hemberg-lab / SC3

A tool for the unsupervised clustering of cells from single cell RNA-Seq experiments
http://bioconductor.org/packages/SC3
GNU General Public License v3.0

Check random seed #47

Open wikiselev opened 6 years ago

wikiselev commented 6 years ago

Looks like the seed is changed somewhere in the code, so sometimes results cannot be reproduced...
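To illustrate the kind of failure described above, here is a minimal base R sketch. It is not SC3 code: `noisy_step`, `clean_step` and `run_with_seed` are hypothetical helpers, and the assumption is simply that some internal step re-initialises the RNG after the caller has set a seed.

```r
# Hypothetical helper, NOT taken from SC3: it re-initialises the RNG,
# which discards whatever seed the caller set beforehand.
noisy_step <- function(n) {
  set.seed(NULL)  # re-seed as if no seed had been set
  runif(n)
}

# Hypothetical helper that leaves the RNG state alone.
clean_step <- function(n) {
  runif(n)
}

# Set the seed, then run the given step.
run_with_seed <- function(step, seed, n = 5) {
  set.seed(seed)
  step(n)
}

# Same seed, same draws: the caller's seed is respected.
clean_reproducible <- identical(run_with_seed(clean_step, 1),
                                run_with_seed(clean_step, 1))

# Same seed, but the helper overrides it, so the draws are
# expected to differ between the two calls.
noisy_reproducible <- identical(run_with_seed(noisy_step, 1),
                                run_with_seed(noisy_step, 1))

print(c(clean = clean_reproducible, noisy = noisy_reproducible))
```

If something like `noisy_step` sits anywhere in a pipeline, a fixed seed at the top level is no longer enough to make the output repeatable.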

leonfodoulian commented 6 years ago

Hi,

I was about to open an issue related to this problem. I am running the sc3 function several times with the same rand_seed, yet I get very different results. I suspected that it might be related to using several cores (n_cores = 4 in my case), but I still get different results if I set this value to 1. My full code is the following:

  # sc is an existing SingleCellExperiment object
  library(SingleCellExperiment)
  library(SC3)

  # Estimate optimal number of clusters using SC3
  sc <- sc3_estimate_k(object = sc)

  # Check estimated optimal number of k
  # Output is 10
  (k.use <- metadata(sc)$sc3$k_estimation)

  # Add feature_symbol column in rowData(sc)
  rowData(sc)$feature_symbol <- rownames(counts(sc))

  # Perform SC3 clustering
  sc <- sc3(object = sc, ks = k.use, gene_filter = TRUE,
            pct_dropout_min = 10, pct_dropout_max = 90, d_region_min = 0.04,
            d_region_max = 0.07, svm_num_cells = NULL, svm_train_inds = NULL,
            svm_max = 5000, n_cores = 4, kmeans_nstart = 1000,
            kmeans_iter_max = 1e+09, k_estimator = FALSE, biology = TRUE,
            rand_seed = 1)

Any idea when this issue can be fixed?

Best, Leon
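When comparing repeated runs like the ones described above, it helps to separate genuinely different clusterings from ones where only the cluster numbering changed. The sketch below is a base R check under that assumption; `same_partition` is a hypothetical helper, not part of SC3, and the label vectors are toy data standing in for per-cell cluster assignments from two runs.

```r
# Returns TRUE when two label vectors describe the same partition of the
# cells, even if the cluster IDs themselves are numbered differently.
same_partition <- function(a, b) {
  stopifnot(length(a) == length(b))
  tab <- table(a, b)
  # Same partition <=> every row and every column of the contingency
  # table has exactly one non-zero entry.
  all(rowSums(tab > 0) == 1) && all(colSums(tab > 0) == 1)
}

# Toy label vectors (made up for illustration).
run1 <- c(1, 1, 2, 2, 3)
run2 <- c(3, 3, 1, 1, 2)  # same grouping, relabelled
run3 <- c(1, 2, 2, 2, 3)  # one cell moved to another cluster

same_partition(run1, run2)  # TRUE
same_partition(run1, run3)  # FALSE
```

A plain `identical()` on the label vectors would flag `run1` and `run2` as different, so a check along these lines gives a fairer picture of how much two runs really disagree.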

wikiselev commented 6 years ago

Hi @leonfodoulian , thanks! This is a known bug. I started working on it, but progress is very slow at the moment since I moved jobs and am doing it in my free time. Sorry for the inconvenience.

wikiselev commented 6 years ago

Still working on it, but there is now another person who is dramatically improving SC3, so hopefully it will be fixed soon.

alfonsosaera commented 6 years ago

Hi,

I am wondering if k estimates vary between runs. Could k estimates be affected by this issue?

Thanks!

wikiselev commented 6 years ago

Hi, no, the k estimation algorithm is deterministic and should not change between runs.

lucygarner commented 4 years ago

Hi,

I am also having this problem: multiple runs of SC3 produce different results even when rand_seed is set to a fixed value. Is this still a known problem? I have just re-installed SC3 from Bioconductor (https://bioconductor.org/packages/release/bioc/html/SC3.html).

Best wishes, Lucy

wikiselev commented 4 years ago

Hey, yes, it's still a known problem, but, unfortunately, no one is working on it at the moment.