r3fang / SnapATAC

Analysis Pipeline for Single Cell ATAC-seq
GNU General Public License v3.0

runDiffusionMaps gives variable results #162

Open BKLi opened 4 years ago

BKLi commented 4 years ago

I have been able to follow the tutorial for 10X 5k ATAC-seq for my own set of ~6k cells after filtering: https://github.com/r3fang/SnapATAC/tree/master/examples/10X_brain_5k

When I run runDiffusionMaps multiple times, I get variable results.

Sometimes I get noise, but other times I get clear eigenvalue separation.

[Image: 66848947-4EAD-43CD-95CC-F4468A9FDDED] [Image: Screen Shot 2020-02-27 at 3 02 12 PM]

x.sp = runDiffusionMaps( obj=x.sp, input.mat="bmat", num.eigs=50 );

This happens when I rerun the same code without any changes. Is there a parameter I can change to make this more robust?

Thank you.

bc2zb commented 4 years ago

I'm seeing identical behavior with my own dataset. Setting the seed does not appear to alter the behavior.

Shellloman commented 3 years ago

I found something interesting. In the diffusion maps code, trainRegression (utilities.R) picks values at random: "idx.ds <- sort(sample(x = seq(row.covs), size = min(1000, length(row.covs)), prob = sampling_prob));". I changed this to "idx.ds <- c(1:length(row.covs))". After that, the results are very nearly the same from run to run.
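
To illustrate the point above, here is a minimal base R sketch of why a random downsample gives different indices on each call, while using every row does not. It does not call SnapATAC. The values of row.covs and sampling_prob are made up for illustration, and draw_subsample is a hypothetical helper that only wraps the quoted line.

```r
# Stand-ins for the objects in the quoted line; the values are invented.
row.covs <- seq(10, 50000, length.out = 6000)   # hypothetical per-cell coverage
sampling_prob <- rep(1 / length(row.covs), length(row.covs))  # assumed uniform weights

# Hypothetical helper wrapping the sampling line quoted from trainRegression.
draw_subsample <- function() {
  sort(sample(x = seq(row.covs), size = min(1000, length(row.covs)), prob = sampling_prob))
}

# Two calls draw two different sets of 1000 indices, so anything fitted
# on the subsample can change from run to run.
idx.a <- draw_subsample()
idx.b <- draw_subsample()
differs <- !identical(idx.a, idx.b)

# The replacement suggested above uses every row, so it is deterministic.
# seq_along() is the idiomatic equivalent of c(1:length(row.covs)).
idx.all <- seq_along(row.covs)
same.as.suggested <- identical(idx.all, c(1:length(row.covs)))

# In plain R, resetting the seed right before the draw reproduces it.
# This is base R behavior only; the thread reports that seeding did not
# help with runDiffusionMaps itself.
set.seed(1); idx.c <- draw_subsample()
set.seed(1); idx.d <- draw_subsample()
reproducible <- identical(idx.c, idx.d)
```

Using all rows trades the speed of fitting on at most 1000 cells for repeatability, so the fit may be slower on large datasets.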