Closed: hukai916 closed this issue 2 years ago
Do you mean for the fragments function? In fragments we don't use utils::chunk_bam() for this reason; we only separate reads based on chromosome for multiprocessing.
Good to know, that explains my concern. Thanks.
Hi developers,
I understand that the chunk_bam() function splits the genome into multiple intervals for multiprocessing.
Basically, each parallel task calls pysam.fetch() to retrieve all the reads that map to its interval. My concern: if a read overlaps more than one interval (and is therefore fetched by pysam more than once across the parallel jobs), will it be double counted?
Please let me know if this is a valid concern or not based on your experience. Really appreciate it!
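To make the concern concrete, here is a minimal sketch in plain Python. It does not use pysam or the package's chunk_bam(); the reads, the intervals, and the helper names (fetch_overlapping, count_naive, count_by_start) are all hypothetical and only mimic an overlap-based fetch. It shows how a read spanning an interval boundary would be counted twice, and one common workaround (not necessarily what this package does): count a read only in the interval that contains its start position.

```python
from collections import Counter

# Hypothetical toy data: (read_name, start, end), 0-based half-open coordinates.
reads = [("r1", 10, 60), ("r2", 90, 140), ("r3", 150, 190)]
# Two adjacent intervals on one chromosome; r2 spans the boundary at 100.
intervals = [(0, 100), (100, 200)]


def fetch_overlapping(reads, start, end):
    """Mimic an overlap-based fetch: return reads overlapping [start, end)."""
    return [r for r in reads if r[1] < end and r[2] > start]


def count_naive(reads, intervals):
    """Count every fetched read in every interval (may double count)."""
    counts = Counter()
    for start, end in intervals:
        for name, _, _ in fetch_overlapping(reads, start, end):
            counts[name] += 1
    return counts


def count_by_start(reads, intervals):
    """Count a read only in the interval that contains its start position."""
    counts = Counter()
    for start, end in intervals:
        for name, read_start, _ in fetch_overlapping(reads, start, end):
            if start <= read_start < end:
                counts[name] += 1
    return counts


naive = count_naive(reads, intervals)
dedup = count_by_start(reads, intervals)
print("naive:", dict(naive))
print("by start:", dict(dedup))
```

Splitting work by whole chromosome, as described in the reply, sidesteps the issue in a different way: a read maps to a single reference, so it can only fall into one task.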