timoast / sinto

Tools for single-cell data processing
https://timoast.github.io/sinto/
MIT License
112 stars, 24 forks

scaling with bam size and parallel processing #47

Closed: rtyags closed this issue 2 years ago

rtyags commented 2 years ago

Hi,

Could you help me speed up my sinto runs? How does it scale with data size? I have noticed that it runs for a very long time when the input is a large BAM file. Part of this may be that at no stage does it seem to use multiple processors, even though I have provided a high number with the -p option. Is it possible that something was missed during installation, so that parallelization is not available to sinto on my system?

Thanks

timoast commented 2 years ago

Which function are you using? Can you show the code you’re running?

rtyags commented 2 years ago

Thanks for your response. The code I was running is:

sinto fragments -b atac_possorted_bam.bam -f chrM.fragments.cells.tsv -c cells.tsv -p 31 --use_chrom chrM --collapse_within

I just realized that it is only using a single processor because we are looking at a single chromosome. Is there a way to hack it to use multiple processors in this case? Currently it is taking just too long to run with this bam file. How does sinto scale with the size of the bam file?

timoast commented 2 years ago

We only parallelize across chromosomes, so if you're running on a single chromosome it will be limited to a single core. I haven't done any benchmarking of how runtime relates to file size.
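To illustrate the point above, here is a minimal sketch (not sinto code) of why one task per chromosome caps the number of busy cores, and how a single chromosome could in principle be cut into sub-intervals to create more tasks. The helper names `effective_workers` and `split_interval` are hypothetical, and the chrM length of 16569 bp is an assumption that depends on the reference genome. sinto does not offer region-level splitting as far as this thread indicates; fragments that overlap a chunk boundary, and any duplicate collapsing, would need care when merging per-chunk results.

```python
from typing import List, Tuple


def effective_workers(task_names: List[str], nproc: int) -> int:
    """With one task per chromosome, at most len(task_names) workers are busy."""
    return min(len(task_names), nproc)


def split_interval(name: str, length: int, n_chunks: int) -> List[Tuple[str, int, int]]:
    """Split [0, length) into near-equal, contiguous, half-open intervals."""
    # Never create more chunks than there are positions, and at least one chunk.
    n_chunks = max(1, min(n_chunks, length))
    base, extra = divmod(length, n_chunks)
    chunks = []
    start = 0
    for i in range(n_chunks):
        # The first `extra` chunks each take one additional position.
        end = start + base + (1 if i < extra else 0)
        chunks.append((name, start, end))
        start = end
    return chunks


if __name__ == "__main__":
    # One chromosome means one task, regardless of how many cores are requested.
    print(effective_workers(["chrM"], 31))
    # Hypothetical workaround: four sub-intervals of chrM (length is an assumption).
    for chunk in split_interval("chrM", 16569, 4):
        print(chunk)
```

Each sub-interval could then be handled by a separate worker, but combining the per-chunk outputs correctly is the hard part and is left out of this sketch.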

rtyags commented 2 years ago

Thanks