sidhomj / DeepTCR

Deep Learning Methods for Parsing T-Cell Receptor Sequencing (TCRSeq) Data
https://sidhomj.github.io/DeepTCR/
MIT License

Reproducible clustering #36

Closed: emm1R closed this issue 3 years ago

emm1R commented 3 years ago

Hi, is there a way to make the training and clustering reproducible? Setting graph_seed and split_seed in Train_VAE does not seem to do the trick.

sidhomj commented 3 years ago

What are you using to do the clustering? Your own algorithm or the clustering method in DeepTCR?

emm1R commented 3 years ago

The clustering method in DeepTCR.

sidhomj commented 3 years ago

The default clustering method is the PhenoGraph algorithm, which I do not believe is deterministic. You can read more about it here: https://github.com/jacoblevine/PhenoGraph
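If the clustering step cannot be made deterministic, one option is to measure how much two runs actually disagree. Cluster IDs are arbitrary, so comparing labels directly is misleading. A permutation-invariant score such as the adjusted Rand index (ARI) is a better fit. The sketch below is a plain-Python illustration and not part of DeepTCR or PhenoGraph. The function name `adjusted_rand_index` and the example label lists are hypothetical, and it assumes you can extract one cluster label per sequence from each run.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Return the adjusted Rand index between two label assignments.

    1.0 means the two partitions are identical up to renaming of clusters;
    values near 0 mean agreement is about what chance would give.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both runs must label the same number of items")

    n_pairs_total = comb(len(labels_a), 2)
    if n_pairs_total == 0:
        return 1.0  # zero or one item: trivially identical

    # Pairs of items that share a cluster in both runs, in run A, and in run B.
    both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    in_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    in_b = sum(comb(c, 2) for c in Counter(labels_b).values())

    expected = in_a * in_b / n_pairs_total
    maximum = (in_a + in_b) / 2
    if maximum == expected:
        return 1.0  # degenerate case, e.g. both runs put everything in one cluster
    return (both - expected) / (maximum - expected)


# Hypothetical labels from two clustering runs over the same six sequences.
run_1 = [0, 0, 1, 1, 2, 2]
run_2 = [2, 2, 0, 0, 1, 1]  # same grouping, different cluster IDs
print(adjusted_rand_index(run_1, run_2))
```

A score close to 1.0 across repeated runs suggests the non-determinism only changes cluster IDs or a few borderline sequences, not the overall structure.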

emm1R commented 3 years ago

Is the training then deterministic if the two seeds are given values?

sidhomj commented 3 years ago

From what I have seen, training on a GPU is never perfectly deterministic. Some of the GPU TensorFlow ops are not deterministic, so there can be minor differences from one training session to the next.
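For readers who want to reduce this run-to-run variation, here is a minimal sketch of the general TensorFlow-level switches. It has not been verified against DeepTCR. The helper name `request_deterministic_tensorflow` is hypothetical, and whether these settings are sufficient depends on the TensorFlow version and the ops in the model. It assumes the helper is called before TensorFlow is first imported, so the environment variables are seen when TensorFlow initializes.

```python
import os


def request_deterministic_tensorflow(force_cpu=False):
    """Ask TensorFlow for deterministic op implementations where available.

    Call this before TensorFlow is first imported. Returns True if TensorFlow
    was importable, False otherwise.
    """
    # Environment switch read by TensorFlow 2.1+ to prefer deterministic GPU kernels.
    os.environ["TF_DETERMINISTIC_OPS"] = "1"

    if force_cpu:
        # Hide all CUDA devices so training falls back to the CPU,
        # trading speed for fewer sources of non-determinism.
        os.environ["CUDA_VISIBLE_DEVICES"] = "-1"

    try:
        import tensorflow as tf
    except ImportError:
        return False

    # Newer TensorFlow releases (2.8+) expose an explicit API for this;
    # look it up defensively because older versions do not have it.
    config = getattr(tf, "config", None)
    experimental = getattr(config, "experimental", None)
    enable = getattr(experimental, "enable_op_determinism", None)
    if callable(enable):
        enable()
    return True


# Example: request determinism and CPU-only execution, then train as usual
# with graph_seed and split_seed set in Train_VAE.
tensorflow_found = request_deterministic_tensorflow(force_cpu=True)
print("TensorFlow available:", tensorflow_found)
```

Deterministic kernels and CPU-only execution are usually slower. Some ops may also raise an error when determinism is requested and no deterministic implementation exists.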