plger / scDblFinder

Methods for detecting doublets in single-cell sequencing data
https://plger.github.io/scDblFinder/
GNU General Public License v3.0

Running scDblFinder deterministically and serially with the 'samples' parameter #59

Closed (cwoehle closed 2 years ago)

cwoehle commented 2 years ago

Hello,

I was trying to run scDblFinder with the samples parameter and set.seed(), but without BPPARAM, and noticed that the results were not reproducible (repeated runs did not find the same number of doublets). Either removing the samples parameter or adding BPPARAM=MulticoreParam(1, RNGseed=seed) produced reproducible results. However, I was looking for a serial way of running it that works in RStudio (I keep having problems with BiocParallel), and I need to treat the samples individually. After some testing I ended up using BPPARAM=SerialParam(RNGseed = seed), which seems to give the behaviour I was looking for. I did not find any mention of SerialParam() in the documentation. Would this also be your suggested solution in my case, or is there a better alternative?

I'm grateful for any clarification.

Best wishes, Christian

plger commented 2 years ago

Hi Christian, thanks for bringing this up. I never noticed because I always run it with multithreading, but you're probably not the only user who will face this. Yes, I'd use your solution, and I have now added it to the vignette (in the FAQ on reproducibility). Best, plger
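For readers landing on this thread, the approach confirmed above can be sketched roughly as follows. This is only an illustration under assumptions, not the vignette's wording: the seed value, the `sample_id` column and the `run_once` helper are made up for the example, and the toy data comes from `mockDoubletSCE()` rather than a real dataset.

```r
library(scDblFinder)
library(BiocParallel)

seed <- 1234  # arbitrary value chosen for this sketch

# Toy dataset; the sample labels below are hypothetical and only serve
# to exercise the 'samples' argument.
set.seed(seed)
sce <- mockDoubletSCE()
sce$sample_id <- sample(c("A", "B"), ncol(sce), replace = TRUE)

# Hypothetical helper: serial execution, with the seed passed through
# BPPARAM rather than relying on set.seed() alone.
run_once <- function(sce, seed) {
  scDblFinder(sce, samples = "sample_id",
              BPPARAM = SerialParam(RNGseed = seed))
}

res1 <- run_once(sce, seed)
res2 <- run_once(sce, seed)

# If the seed is honoured, both runs should give the same calls.
identical(res1$scDblFinder.class, res2$scDblFinder.class)
```

The same pattern should carry over to `MulticoreParam(n, RNGseed = seed)` when multithreading is wanted, as mentioned in the original question.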

julien-roux commented 1 year ago

Hi Pierre-Luc,

> Yes I'd use your solution, and I now added this in the vignette (in the FAQ on reproducibility).

Not sure what you were referring to here, but if it is this section of the vignette, I still find it confusing...

Could you maybe mention that this way of setting the seed should always be used when the samples argument is used, even when the default BPPARAM=SerialParam() is used?

plger commented 1 year ago

You're right, it wasn't clear. I hope it is now.