streetslab / dimelo

python package for analysis of dimelo-seq & nanopore modified base data
MIT License
3 stars 5 forks source link

parse_bam progress bar #26

Closed thekugelmeister closed 2 years ago

thekugelmeister commented 2 years ago

Added reasonable progress reporting to parse_bam for initial version.

Notes

The current implementation of progress reporting has two versions:

These should be considered stopgap solutions until more consistent/modular parsing, logging, and progress reporting have been implemented.

thekugelmeister commented 2 years ago

Comments and responses from first review:

For the plot_enrichment function (I didn’t see this issue with any of the other functions), I noticed that after it said 100% complete, it still took about a minute to give the output and end the run.

I agree! I've narrowed it down to something within parse_bam, not actually related to plot_enrichment. Unfortunately, it also appears to be within the parallelized step. I haven't checked, but my best guess right now is that the final window for plot_enrichment is for some reason bigger and taking longer? I decided to forego addressing this problem for now, as I felt it wasn't worth fixing it without thinking about possible redesigns.

when it’s processing reads for plot_browser I think it’s missing a space between the # and the word “reads.”

I know, and it annoys me too. It's something weird about the tqdm package. It felt weird to manually add a space before the word "reads" for the argument I'm passing, so I decided to not touch it. If you think it's worth adding it, I'll go ahead and do so.

this is just out of curiosity but for example for plot_enrichment I ran it twice and I noticed it went up by the same percentages each time, i.e 16%, 24%, 32%, 40%, 56%, 64%, 72%, 88%, 100% do you know why this is the case?

That's as expected! This is a side effect of the fact that the progress bar tracks window completion over the parallelized step. By default this winds up using 8 cores If they all took different amounts of time to run, it wouldn't look so weird, but they all kind of finish at the same time so it looks like the bar is jumping by 8 every time. In the future, if we change how progress is tracked, this wouldn't be the case. If you want, you can verify that changing the number of cores in parse_bam changes the multiple by which it goes up.