smithlabcode / methpipe

A pipeline for analyzing DNA methylation data from bisulfite sequencing.
http://smithlabresearch.org/methpipe

bsrate_sam takes a long time if input has random chrom distribution #156

Closed: guilhermesena1 closed this 4 years ago

guilhermesena1 commented 4 years ago

bsrate should fail if it sees a chromosome that is not contiguous in the input (reads from one chromosome, then another, then the first one again), because it takes a very long time to finish processing when reads are not grouped by chromosome.
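One way to detect this while streaming reads is to remember every chromosome whose run of reads has already ended, and to fail as soon as one of them shows up again. The sketch below is a hypothetical illustration of that idea, not the code from the fixing commit; the class name `ChromGroupingChecker` and its `accept` method are invented for this example.

```cpp
#include <string>
#include <unordered_set>

// Hypothetical helper (not methpipe's implementation): feed it the chromosome
// of each read in input order; it reports when reads are not grouped by chromosome.
class ChromGroupingChecker {
 public:
  // Returns false if `chrom` was already seen earlier and a different
  // chromosome appeared in between, i.e. its reads are not contiguous.
  bool accept(const std::string &chrom) {
    if (started_ && chrom == current_) return true;  // still in the same run
    if (finished_.count(chrom) > 0) return false;    // chromosome reappeared
    if (started_) finished_.insert(current_);        // close the previous run
    current_ = chrom;
    started_ = true;
    return true;
  }

 private:
  std::unordered_set<std::string> finished_;  // chromosomes whose run has ended
  std::string current_;                       // chromosome of the current run
  bool started_ = false;                      // true once any read was seen
};
```

A caller would invoke `accept` on each read's chromosome and abort with an error on the first `false`, so a badly ordered file fails immediately instead of running for hours.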

guilhermesena1 commented 4 years ago

Fixed in commit 37a6f01ea8279c394507c0e24b98718778b886b7.

mengzhou commented 4 years ago

How about checking for an index file? A BAM file can only be indexed after it is sorted, so requiring an index automatically ensures sorting. I know you are working on SAM; just a thought.

andrewdavidsmith commented 4 years ago

I think the original idea is still the easiest, and probably the only one that will always work: just specify that sorted input is required and report a failure if it is not. I don't think any other approach will work unless it places more constraints on users. This is how most versions have functioned.
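Requiring sorted input and reporting a failure otherwise can be checked in a single pass over the reads. The sketch below assumes "sorted" means grouped by chromosome with non-decreasing positions within each chromosome; it does not check that chromosomes follow the order listed in the SAM header, which a full coordinate-sort check would also do. The `ReadPos` struct and `require_sorted` function are hypothetical names for illustration only.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical minimal record: chromosome name and leftmost mapping position.
struct ReadPos {
  std::string chrom;
  long pos;
};

// Hypothetical check: throws std::runtime_error at the first read that breaks
// "grouped by chromosome, non-decreasing position within a chromosome".
void require_sorted(const std::vector<ReadPos> &reads) {
  std::unordered_set<std::string> finished;  // chromosomes already left behind
  for (std::size_t i = 1; i < reads.size(); ++i) {
    const ReadPos &prev = reads[i - 1];
    const ReadPos &cur = reads[i];
    if (cur.chrom == prev.chrom) {
      if (cur.pos < prev.pos)
        throw std::runtime_error("input not sorted: position decreases at read " +
                                 std::to_string(i));
    } else {
      finished.insert(prev.chrom);  // the previous chromosome's run just ended
      if (finished.count(cur.chrom) > 0)
        throw std::runtime_error("input not sorted: chromosome " + cur.chrom +
                                 " is not contiguous");
    }
  }
}
```

Throwing at the first violation keeps the cost at one comparison per read, and the message tells the user exactly why the input was rejected.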