jessieren / DeepVirFinder

Identifying viruses from metagenomic data by deep learning
Other
116 stars 32 forks source link

Prediction refactoring #29

Open jsgounot opened 2 years ago

jsgounot commented 2 years ago

Potential related issues

All issues related to long runtime: #21, #24, #27

Description

After spending a day figuring out why the software takes hours to finish on some sequences, I found several issues:

Pull request description

Complete rewritting of the main dvf.py script. This should produce exactly the same results than before. The results order might change however.

With one core and using the CRC test file, old version took 102 seconds while new one and in 29 seconds, meaning a 3.5 times improvement. I suspect the improvement to be actually much better than this since half of the time (~ 15 seconds) is used here to load models. New version of the software does not include the multiprocessing step.

I implement a random check that new pvalue equals old value every 1,000 predictions but this has never been trigger for all my tests.

I added a compare_results.py script to compare results regardless of the order.

ShailNair commented 1 year ago

@jsgounot Hi, Thanks for the commit. I tried to run the modified dvf.py file. However, I get a segmentation fault error. Here is my code

python /home/soft/DeepVirFinder/dvf_mod.py \ -i 0.1.assembly/0.3.merged.contigs/merged.contigs.fix.fasta \ -o 0.3.metavirome/0.1.identify_vcontig/0.2.deepvir`

pid 148590's current affinity mask: ffffffffffffffffffffffffff pid 148590's new affinity mask: ff Using Theano backend. WARNING (theano.tensor.blas): Using NumPy C-API based implementation for BLAS functions.

  1. Loading Models. Model directory /home/soft/DeepVirFinder/models
  2. Encoding and Predicting Sequences. Predict for c_000000000001 [Segmentation fault (core dumped)`

Also ther is no -c option for multithreading

jsgounot commented 1 year ago

HI @ShailNair, it's been a while since I did this and to be honest, I don't remember most of it. Regarding your segmentation fault, I guess this can be linked directly to your numpy and keras installation (my guess would be during the prediction part). One way to quickly check this would be to use multiple print statements. Concerning the multi threading, as I said in my original post, the previous implementation leads to error on my server which made me remove this part. I would be happy to answer some questions but please do not expect me to provide full support, I'm not one of the author of this tool and people behind this project completely ignored all the merge requests before.

ShailNair commented 1 year ago

@jsgounot Thanks for the quick reply.

Regarding your segmentation fault, I guess this can be linked directly to your numpy and keras installation (my guess would be during the prediction part). One way to quickly check this would be to use multiple print statements.

Yes. I will check regarding this.

I'm not one of the author of this tool and people behind this project completely ignored all the merge requests before.

That's unfortunate as multiple users have faced the speed issue with DeepVirFinder. I hope the developers @jessieren soon take an action.