jessieren / DeepVirFinder

Identifying viruses from metagenomic data by deep learning

How can DeepVirFinder ensure the accuracy of long contig predictions? #16

Open ZongzhiWu opened 3 years ago

ZongzhiWu commented 3 years ago

Dear author, I want to use DeepVirFinder on my metagenomic samples. Many of my contigs are longer than 3000 bp (some are about 10000-50000 bp). How can DeepVirFinder ensure the accuracy of predictions for long contigs, given that it uses 150-3000 bp contigs as training data? How does DeepVirFinder process contigs over 3000 bp: does it just discard the portion beyond 3000 bp, or handle them in some other way? Thanks for your answer~ Best wishes~

jessieren commented 3 years ago

Thanks for your interest in using DeepVirFinder.

DeepVirFinder includes several models trained for predicting contigs of different lengths. As we stated in the paper, “we use the model trained by 150 bp sequences for predicting any sequences <300 bp. Similarly, we used the model trained by 300 bp sequences for predicting sequences of the length 300–500 bp, the model trained by 500 bp sequences for predicting 500–1000 bp sequences, and the 1000 bp model to predict sequences >1000 bp.” See Figure 2B.
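The length-to-model mapping quoted above can be sketched as a small lookup. This is only an illustration, not DeepVirFinder's actual code: the function name `select_model_length` is hypothetical, and because the quote does not say which model handles a contig of exactly 300, 500, or 1000 bp, the sketch assumes each boundary length goes to the longer model.

```python
def select_model_length(contig_length: int) -> int:
    """Return the training-fragment length (bp) of the model used for a contig.

    Hypothetical helper. Boundary handling (exactly 300, 500, 1000 bp)
    is an assumption; the quoted text leaves it ambiguous.
    """
    if contig_length <= 0:
        raise ValueError("contig_length must be positive")
    if contig_length < 300:
        return 150   # 150 bp model for sequences < 300 bp
    if contig_length < 500:
        return 300   # 300 bp model for 300-500 bp
    if contig_length < 1000:
        return 500   # 500 bp model for 500-1000 bp
    return 1000      # 1000 bp model for everything longer


for length in (200, 400, 800, 3000, 50000):
    print(length, "->", select_model_length(length))
```

Under this mapping a 3000 bp contig and a 50000 bp contig are both scored by the 1000 bp model.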

For contigs > 3000 bp, the sequence will be predicted using the 1000 bp model. The whole sequence is fed into the convolutional and max pooling layers; no part of it is truncated or discarded. As shown in Figure 2A, the longer the sequence, the higher the prediction accuracy.
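To illustrate why a model trained on short fragments can accept a much longer contig without truncation, here is a minimal pure-Python sketch of a convolution followed by max pooling over the whole sequence. It assumes the pooling is global (one maximum per filter across all positions); the random filters, filter width of 10, and filter count of 8 are placeholders, not DeepVirFinder's trained parameters.

```python
import random

BASES = "ACGT"


def one_hot(seq):
    """Encode a DNA string as a list of 4-element one-hot rows."""
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq]


def conv_global_max(seq, filters):
    """Slide each filter along the sequence and keep its maximum activation.

    The output has one value per filter, regardless of sequence length.
    Requires len(seq) >= filter width.
    """
    encoded = one_hot(seq)
    width = len(filters[0])
    features = []
    for filt in filters:
        best = float("-inf")
        for start in range(len(encoded) - width + 1):
            score = sum(
                filt[i][j] * encoded[start + i][j]
                for i in range(width)
                for j in range(4)
            )
            best = max(best, score)
        features.append(best)
    return features


rng = random.Random(0)
# Placeholder filters: 8 filters of width 10 over 4 bases (illustrative only).
filters = [[[rng.uniform(-1, 1) for _ in range(4)] for _ in range(10)]
           for _ in range(8)]
short_contig = "".join(rng.choice(BASES) for _ in range(1000))
long_contig = "".join(rng.choice(BASES) for _ in range(5000))

# Both feature vectors have one entry per filter.
print(len(conv_global_max(short_contig, filters)))
print(len(conv_global_max(long_contig, filters)))
```

Because the pooled feature vector has a fixed size, the layers after pooling see the same input shape for a 1000 bp and a 5000 bp contig; a longer sequence simply offers more positions at which each filter can match.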

Hope that helps.

444thLiao commented 2 years ago

Dear author, I have the same concern. I want to predict phages/prophages from assembled genome sequences, so we generally need to apply the tool to very long contigs, which can be over 10 kbp. How does the software perform in this situation, where contigs are far longer than those shown in Figure 2A?

Should I split each contig into shorter genomic sequences for better prediction? That could also directly give the exact positions on the genome.

Any suggestions for that?

Thanks~
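For readers who want to try the splitting idea raised in this comment, a minimal sketch of cutting a contig into overlapping windows while keeping genome coordinates might look like the following. The function name `split_contig` and the window and step sizes are hypothetical, and nothing in this thread establishes whether scoring fragments gives better predictions than scoring the whole contig.

```python
def split_contig(seq, window=3000, step=1500):
    """Cut a sequence into overlapping fragments with 0-based, end-exclusive
    coordinates. Window and step sizes are illustrative assumptions."""
    if not 0 < step <= window:
        raise ValueError("require 0 < step <= window")
    if not seq:
        return []
    fragments = []
    start = 0
    while True:
        end = min(start + window, len(seq))
        fragments.append((start, end, seq[start:end]))
        if end == len(seq):
            break  # the last fragment reaches the end of the contig
        start += step
    return fragments


contig = "ACGT" * 2500  # a 10000 bp toy sequence
for start, end, fragment in split_contig(contig):
    print(start, end, len(fragment))
```

Each fragment could then be written to a FASTA file with its coordinates in the header, so that per-fragment scores can be mapped back to positions on the original genome.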