jessieren / DeepVirFinder

Identifying viruses from metagenomic data by deep learning

Trained models for RNA viruses #17

Open mlhoggard opened 3 years ago

mlhoggard commented 3 years ago

Hi there,

Thanks for your work on DeepVirFinder. I've been trialing it on some environmental metagenome data, and I'm also interested in whether it would be appropriate for identifying RNA viruses in metatranscriptome data.

From what I can tell, the provided trained models were developed with only prokaryotes in the host database and only DNA viruses in the virus database. Have you experimented with training equivalent models for RNA viruses at all? And/or do you know of any obvious reason why this might be problematic? (Presumably, if the database of known RNA viruses is much smaller than that of DNA viruses, a model trained on it might not be robust enough to generalize across a broad range of putative RNA viruses? And/or are RNA viruses generally considered to contain enough conserved features that the DeepVirFinder approach wouldn't improve on the other available tools?)

I was also curious why eukaryotes appear to have been omitted from the host database. The VirFinder GitHub page mentions an updated trained model including eukaryote data, but it looks like this was pared back again to just prokaryotes for the development of DeepVirFinder? I currently include a subsequent step to filter out contigs identified as eukaryote-derived (as suggested in the VirFinder docs), but having eukaryotes specifically accounted for within the model would be great.

Thanks in advance for any info, it's much appreciated.

Kind regards, Mike.

YiJessePi commented 3 years ago

Hey, did you manage to work out how DeepVirFinder performs on viral RNA data?

mlhoggard commented 2 years ago

Hi @YiJessePi,

Apologies for the slow reply, but no, in the end I simply decided to omit DeepVirFinder from my RNA virus workflow.

In the case of RNA viruses, at this stage I'm not sure how much benefit this approach would add over other available tools like VIBRANT, VirSorter2, and target gene-based (e.g. RdRp) searches anyway, so I'm sticking with those for the time being.

I also gather the DVF maintainer might have moved on to another job now (judging by their profile page), so I suspect this tool might not get much support or development going forward.

Regards, Mike.