heiniglab / scPower

Experimental design framework for scRNAseq population studies (eQTL and DE)

negative binomial parameters and memory issue #8

Closed: Xiaxhan closed this issue 2 years ago.

Xiaxhan commented 2 years ago

Hi Katharina,

Thanks for the great method.

I tried to generate priors following your manual and got stuck at the nbinom.estimation() step. I already set 'poscounts', but it does not seem to help. Maybe my data is too sparse?

[Screenshot: 2022-06-09 at 14:13:09]
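The 'poscounts' option appears to refer to DESeq2's size-factor estimator of that name; that is an assumption, since the thread does not say how scPower's nbinom.estimation() uses it. As a rough illustration of why extreme sparsity can defeat such an estimator, here is a minimal NumPy sketch of that estimator's logic. The function name `poscounts_size_factors` is hypothetical, and this is neither scPower's nor DESeq2's code.

```python
import numpy as np


def poscounts_size_factors(counts):
    """Hypothetical sketch of a DESeq2-style 'poscounts' size-factor estimate.

    counts: features x cells array of non-negative counts.
    Returns one size factor per cell (column).
    """
    counts = np.asarray(counts, dtype=float)
    n_cells = counts.shape[1]
    positive = counts > 0
    with np.errstate(divide="ignore"):
        log_counts = np.log(counts)  # -inf where a count is zero

    # Per-feature "geometric mean" using positive counts only, but divided by
    # the total number of cells (zeros simply contribute nothing to the sum).
    sum_log_pos = np.where(positive, log_counts, 0.0).sum(axis=1)
    has_pos = positive.any(axis=1)
    if not has_pos.any():
        raise ValueError("every feature has only zero counts")
    log_geo = np.full(counts.shape[0], -np.inf)
    log_geo[has_pos] = sum_log_pos[has_pos] / n_cells

    size_factors = np.empty(n_cells)
    for j in range(n_cells):
        # Use only features with a usable geometric mean AND a positive count in this cell.
        usable = np.isfinite(log_geo) & positive[:, j]
        if usable.any():
            size_factors[j] = np.exp(np.median(log_counts[usable, j] - log_geo[usable]))
        else:
            size_factors[j] = np.nan  # this cell shares no positive counts with any usable feature

    # Rescale to a geometric mean of 1. A single NaN propagates to every
    # cell here, much as a single NA would in R.
    return size_factors / np.exp(np.mean(np.log(size_factors)))
```

In this sketch, one cell with no positive counts among the usable features is enough to turn every size factor into NaN, which is one way a very sparse matrix can break the estimation even with 'poscounts'.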

Besides, my dataset is actually scATAC and very large, so as a test I subset it to a single chromosome: about 16,000 peaks, which is comparable to the number of genes in RNA data, though the number of cells is still large. Even so, my program is often killed by the system because it runs out of memory. It seems I would need more than 300 GB of memory to run it on this large count matrix.
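A back-of-the-envelope calculation shows how much the storage format matters at this scale. The sketch below assumes 8-byte double values and, for the sparse case, a compressed sparse column layout with 4-byte integer indices (the layout R's Matrix package uses for a dgCMatrix). The function names and the example cell count and sparsity are hypothetical, since the thread does not state them.

```python
def dense_matrix_bytes(n_features, n_cells, bytes_per_value=8):
    """Memory for a dense features x cells matrix (8 bytes = double precision)."""
    return n_features * n_cells * bytes_per_value


def sparse_csc_bytes(nnz, n_cells, value_bytes=8, index_bytes=4):
    """Approximate memory for a compressed sparse column (CSC) matrix:
    one value plus one row index per non-zero entry, and n_cells + 1 column pointers."""
    return nnz * (value_bytes + index_bytes) + (n_cells + 1) * index_bytes


def gib(n_bytes):
    """Convert bytes to GiB."""
    return n_bytes / 2 ** 30


# Hypothetical example: 16,000 peaks, 100,000 cells, 5% non-zero entries.
n_peaks, n_cells = 16_000, 100_000
nnz = n_peaks * n_cells // 20
print(f"dense:  {gib(dense_matrix_bytes(n_peaks, n_cells)):.2f} GiB")
print(f"sparse: {gib(sparse_csc_bytes(nnz, n_cells)):.2f} GiB")
```

With these hypothetical numbers the dense matrix needs 12.8 GB while the sparse one needs about 0.96 GB; any step that converts the matrix to dense form, or keeps several copies of it, multiplies the dense figure.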

I am wondering if you have any suggestions to resolve these problems.

Thanks, Xia

KatharinaSchmid commented 2 years ago

Hi Xia,

Thanks for your feedback. I am worried that our tool will probably not work for scATACseq data. I don't have much experience with scATACseq, so I am not sure whether our distributional assumptions hold, e.g. that the counts for each peak follow a negative binomial distribution and that the distribution of mean values across all peaks can be modelled by a combination of two gamma distributions. For this reason, I would recommend not using our tool for scATACseq data, only for scRNAseq data. Sorry; we know it would be a valuable extension, but I assume that adaptations would be necessary.
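To make the two distributional assumptions concrete, here is a small plain-Python sketch of (a) a negative binomial probability mass function in the mean/dispersion parametrization and (b) a density for per-feature means built as a weighted mixture of two gamma distributions. Reading "combination" as a two-component weighted mixture is an assumption, and all function names and parameter values are hypothetical; this is not scPower's fitting code.

```python
import math


def nb_pmf(k, mean, size):
    """Negative binomial P(X = k) with the given mean and dispersion ('size'),
    so that variance = mean + mean**2 / size. Requires mean > 0 and size > 0."""
    log_p = (math.lgamma(k + size) - math.lgamma(size) - math.lgamma(k + 1)
             + size * math.log(size / (size + mean))
             + k * math.log(mean / (size + mean)))
    return math.exp(log_p)


def gamma_pdf(x, shape, rate):
    """Gamma density in the shape/rate parametrization (zero for x <= 0)."""
    if x <= 0:
        return 0.0
    return math.exp(shape * math.log(rate) - math.lgamma(shape)
                    + (shape - 1) * math.log(x) - rate * x)


def two_gamma_mixture_pdf(x, weight, shape1, rate1, shape2, rate2):
    """Hypothetical density of per-feature means: weight * Gamma1 + (1 - weight) * Gamma2."""
    return (weight * gamma_pdf(x, shape1, rate1)
            + (1 - weight) * gamma_pdf(x, shape2, rate2))
```

An informal check of the first assumption for a given peak would be to compare the observed frequency of each count value with `nb_pmf` evaluated at that peak's fitted mean and dispersion.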

I am still surprised that you run into memory issues, because my memory requirements for scRNAseq were always very low. Most of the memory should be taken up by the dataset itself; downstream, you only need two numeric parameters per gene. But perhaps the issue is caused by problems during fitting, because the distributions do not fit scATACseq data.
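The point that downstream steps need only two numbers per gene can be illustrated by computing, for each row of a sparse count matrix, a mean and a negative binomial dispersion without ever densifying it. The sketch below uses a method-of-moments estimate, which is a simpler swapped-in technique and not the fitting scPower performs; the CSR input arrays and the function name are hypothetical.

```python
import numpy as np


def row_mean_and_dispersion(indptr, data, n_cols):
    """Per-row mean and method-of-moments NB dispersion ('size') from CSR arrays.

    indptr, data: CSR row pointers and non-zero values (column indices are not needed).
    Beyond the input, memory use is two floats per row.
    """
    n_rows = len(indptr) - 1
    means = np.empty(n_rows)
    sizes = np.empty(n_rows)
    for i in range(n_rows):
        row = np.asarray(data[indptr[i]:indptr[i + 1]], dtype=float)
        mean = row.sum() / n_cols                 # zero entries add nothing to the sums
        second_moment = (row ** 2).sum() / n_cols
        var = second_moment - mean ** 2           # population variance over all n_cols cells
        means[i] = mean
        # var = mean + mean^2 / size  =>  size = mean^2 / (var - mean).
        # Without overdispersion (var <= mean) report an infinite size (Poisson limit).
        sizes[i] = mean ** 2 / (var - mean) if var > mean else np.inf
    return means, sizes


# Two rows over 4 cells: [0, 0, 4, 0] and [1, 1, 1, 1].
means, sizes = row_mean_and_dispersion([0, 1, 5], [4, 1, 1, 1, 1], n_cols=4)
```

If memory still grows far beyond the size of the input matrix in a computation like this, the extra usage is more likely to come from intermediate dense copies or from the fitting itself than from the per-gene parameters.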

Best regards, Katharina