tleonardi / nanocompore

RNA modifications detection from Nanopore dRNA-Seq data
https://nanocompore.rna.rocks
GNU General Public License v3.0
78 stars 12 forks

Nanocompore fatally hangs #108

Closed DrOllyGomez closed 4 years ago

DrOllyGomez commented 4 years ago

Describe the bug: Nanocompore begins to process, loads some data, starts writing output to the results directory, and then hangs. That is, the process never completes; presumably a threading issue or race condition is occurring. The terminal progress indicator starts. The extent of progress before the hang, as measured by (1) the '% done' shown on the terminal and (2) the size of the "out_SampComp.db" file generated in the results directory, depends on and is positively correlated with the number of threads allotted to the application.

Values of 46, 15 and the default were tried for the 'nthreads' parameter. The information in the text files attached to this bug pertains to the run using the default.

This text file shows the call to nanocompore, and the initial stdout chat .....

callAndInitialStdOut.txt

This text file shows the nanocompore log file created ..... out_SampComp.log

Tracebacks: This text file shows the traceback created ..... Traceback1.txt

Versions, dependencies, libraries: Ubuntu 18.04. This text file shows the nanocompore version and libraries/dependencies ..... libsdeps.txt

To Reproduce: If the behaviour is data-dependent, we can arrange to supply the actual input.

Expected behavior: I expected the "out_SampComp.db" file to continue to grow and the application to finish, with the progress indicator incrementing.

DrOllyGomez commented 4 years ago

Actually, two Ctrl-C's are needed to quit the stalled app; the second gives the following additional traceback... Traceback2.txt

tleonardi commented 4 years ago

Hi, thanks for reporting the issue! Can you try running it again with --downsample_high_coverage 500? If it still gives an error, could you try it again with --log_level debug and post here the last few lines of output?

thanks! tom

DrOllyGomez commented 4 years ago

Hi, Here are the last few lines of the output with both the downsample and log_level parameters in the call....


Skipping 2415 positions because not present in all samples with sufficient coverage
Adding ENST00000375980.8|ENSG00000142634.12|OTTHUMG00000002254.1|OTTHUMT00000006433.1|EFHD2-201|EFHD2|2419|protein_coding| to out_q
Worker thread processing new item from in_q: ENST00000375499.7|ENSG00000117118.9|OTTHUMG00000002289.3|OTTHUMT00000006603.1|SDHB-201|SDHB|1153|protein_coding|
Adding ENST00000366922.2|ENSG00000067704.9|OTTHUMG00000037287.3|OTTHUMT00000090761.3|IARS2-201|IARS2|3560|protein_coding| to in_q
Writer thread writing ENST00000375980.8|ENSG00000142634.12|OTTHUMG00000002254.1|OTTHUMT00000006433.1|EFHD2-201|EFHD2|2419|protein_coding|
Error in worker. Kill output queue
An error occured. Killing all processes

I have no idea if this even remotely helps, but here is a portion of a screenshot from htop, showing processes. Does 'Z' stand for 'zombie thread'??

DrOllyGomez commented 4 years ago

Um, the latest is...
1) I inserted some debug markers to see where errors were occurring. Essentially, for a particular transcript (viz. SDHB-201), "gmm_test()", calling "gmm_anova_test()", was raising NanocomporeError("While doing the Anova test we found a sample with within variance = 0. Use --allow_warnings to ignore.")
2) So I've added --allow_warnings to ignore it, and rerun. As I type, Nanocompore seems to be motoring through the work, is at 20% progress so far, and is due to finish, according to the progress bar, in about 30 mins or so.

I will report back success or failure.

I guess the bug sort of still stands, because, whether the user omits the --allow_warnings parameter intentionally or not, I imagine 'hanging + 2 Ctrl C's needed' is not intended behaviour.
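For readers wondering why a within variance of 0 is a problem for an ANOVA at all, here is a minimal, generic sketch of a one-way ANOVA F statistic. It is not Nanocompore's code: the function name `one_way_anova_f` and the explicit `ValueError` are assumptions made for illustration. The point is only that F is a ratio whose denominator is the within-group mean square, so it is undefined when that term is 0.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return the one-way ANOVA F statistic for a list of groups of numbers.

    Hypothetical helper for illustration only; not part of Nanocompore.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = mean(x for g in groups for x in g)

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)

    if ms_within == 0:
        # F = ms_between / ms_within would divide by zero here.
        raise ValueError("within-group variance is 0; F statistic is undefined")
    return ms_between / ms_within
```

With groups such as `[[1.0, 1.0], [2.0, 2.0]]` every observation equals its group mean, so the within-group term is exactly 0 and the sketch raises instead of returning a number.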

DrOllyGomez commented 4 years ago

Run completed, DB and tsvs written out! :)

tleonardi commented 4 years ago

Hi @DrOllyGomez, thanks for all these details, and glad to hear you managed to complete the run. The error for the 0 within variance is expected behaviour, but as you pointed out, it should have been clearer from the output. The actual problem that you ran into is related to issues #12 and #104: essentially, it seems threads are (sometimes) not terminated correctly. We are trying to debug the issue.
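As a rough illustration of the kind of shutdown behaviour being discussed, here is a minimal sketch of a worker/writer pipeline that still terminates when a worker raises. It is not Nanocompore's implementation: it uses `threading` rather than separate processes, and all names (`run`, `worker`, `writer`, `SENTINEL`) are hypothetical. The idea is that every worker always puts a sentinel on the output queue, even after an error, so the writer never waits forever.

```python
import queue
import threading

SENTINEL = object()  # marks "no more items" on either queue


def worker(in_q, out_q, process, errors):
    try:
        while True:
            item = in_q.get()
            if item is SENTINEL:
                break
            out_q.put(process(item))
    except Exception as exc:
        errors.append(exc)  # record the failure for the main thread
    finally:
        out_q.put(SENTINEL)  # always tell the writer this worker is done


def writer(out_q, n_workers, results):
    finished = 0
    while finished < n_workers:
        item = out_q.get()
        if item is SENTINEL:
            finished += 1
        else:
            results.append(item)


def run(items, process, n_workers=2):
    in_q, out_q = queue.Queue(), queue.Queue()
    errors, results = [], []
    threads = [
        threading.Thread(target=worker, args=(in_q, out_q, process, errors))
        for _ in range(n_workers)
    ]
    threads.append(threading.Thread(target=writer, args=(out_q, n_workers, results)))
    for t in threads:
        t.start()
    for item in items:
        in_q.put(item)
    for _ in range(n_workers):
        in_q.put(SENTINEL)  # one stop signal per worker
    for t in threads:
        t.join()
    if errors:
        # Surface the worker error instead of hanging or swallowing it.
        raise RuntimeError("a worker failed") from errors[0]
    return results
```

In this sketch a failing item makes `run` raise a `RuntimeError` after all threads have been joined, rather than leaving the caller stuck until Ctrl-C.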