owenjm / find_peaks

A simple FDR peak caller

Use of uninitialized values #3

Open TeddyHuang-00 opened 1 year ago

TeddyHuang-00 commented 1 year ago

I came across this problem while translating this program to Python (here is a link to it); the details are described there.

In brief, when the number of random permutations N is too small, or when min_count is set too high, fewer peak lengths are identified, and the regression skips the corresponding peak min values. The following function, however, still uses these uninitialized values to calculate an expected number of occurrences, which causes the problem.
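One way to guard against this can be sketched in Python as follows. This is only a minimal illustration, not the code from find_peaks or from the port: the names `fit_log_linear`, `expected_occurrences` and `min_points` are hypothetical, and the log-linear model of peak counts against peak length is an assumption made for the example. The point is that a skipped regression is recorded explicitly as missing, and the later lookup checks for that instead of computing with undefined values.

```python
import math


def fit_log_linear(length_counts, min_points=2):
    """Fit log10(count) = slope * length + intercept by least squares.

    Returns None when there are too few distinct peak lengths to fit,
    so callers can tell "no regression" apart from a real result.
    (Hypothetical helper; the model itself is an assumption.)
    """
    points = [
        (length, math.log10(count))
        for length, count in sorted(length_counts.items())
        if count > 0
    ]
    if len(points) < max(min_points, 2):
        return None
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    if sxx == 0:
        return None
    slope = sum((x - mean_x) * (y - mean_y) for x, y in points) / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def expected_occurrences(regressions, peak_min, length):
    """Return the expected count, or None if no regression exists for peak_min."""
    params = regressions.get(peak_min)
    if params is None:
        # The regression was skipped for this peak min: report it as
        # missing rather than computing with undefined values.
        return None
    slope, intercept = params
    return 10 ** (slope * length + intercept)


# Illustrative numbers only: the second peak min has a single peak length,
# so its regression is skipped and recorded as None.
regressions = {
    0.5: fit_log_linear({2: 100, 3: 10, 4: 1}),
    1.0: fit_log_linear({2: 3}),
}
```

With this shape, the caller decides what a missing regression means (skip the peak, or treat the FDR as undefined) instead of silently propagating an uninitialized value.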

owenjm commented 1 year ago

Thanks, and some nice modifications there in your port. I may try to incorporate some of these when I next get the chance ... (setting a random seed is definitely good practice).

The usage case you describe above is a little unusual, but I'll fix it.

(I'm impressed that there's something out there that Python does faster than Perl, also! :)

TeddyHuang-00 commented 1 year ago

@owenjm

> Thanks, and some nice modifications there in your port. I may try to incorporate some of these when I next get the chance ... (setting a random seed is definitely good practice).

Sure, no problem. Thanks in advance for your precious time!

> The usage case you describe above is a little unusual, but I'll fix it.

This is definitely an edge case, and I've only encountered it in testing, not in real use.

> I'm impressed that there's something out there that Python does faster than Perl, also! :)

This is achieved by filtering out empty reads (probes with a value of zero) early, in load_gff, instead of later in find_quants and call_peaks_unified_redux. This significantly increases speed when there are many such empty reads. Maybe this could be incorporated along with the fix?
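The early filtering could look roughly like the sketch below. It is an illustration only and not the actual load_gff from either implementation: the function name `load_probes`, the tuple layout, and the assumption that the input is tab-separated GFF with the score in the sixth column are all choices made for this example.

```python
def load_probes(lines):
    """Parse GFF-like lines, dropping zero-valued probes at load time.

    Assumes tab-separated GFF columns with start, end and score in
    columns 4, 5 and 6 (hypothetical loader for illustration).
    """
    probes = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split("\t")
        if len(fields) < 6:
            continue  # not a full GFF record
        try:
            score = float(fields[5])
        except ValueError:
            continue  # score column is not numeric, e.g. "."
        if score == 0:
            continue  # empty read: never stored, so later steps never see it
        probes.append((fields[0], int(fields[3]), int(fields[4]), score))
    return probes


# Illustrative input: the second record has a zero score and is dropped.
example_lines = [
    "# comment line",
    "chr2L\t.\t.\t100\t200\t1.5\t.\t.\t.",
    "chr2L\t.\t.\t201\t300\t0\t.\t.\t.",
    "chr2L\t.\t.\t301\t400\t-0.7\t.\t.\t.",
]
probes = load_probes(example_lines)
```

Because the zero-valued probes are never stored, every later pass over the data iterates over fewer records, which is where the speed-up described above would come from.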