jessieren / DeepVirFinder

Identifying viruses from metagenomic data by deep learning

Results processing and extraction of region? #33

Open bananabenana opened 2 years ago

bananabenana commented 2 years ago

Hi, you have made a great tool here.

I have run it on a set of single-isolate assemblies to try to identify phage regions, and I have two questions about processing the output:

  1. Is there any way to extract the FASTA coordinates of the phage regions that have k-mer matches, or do I only get a single score for the entire contig?

  2. What score threshold would you confidently consider phage? Above, say, 0.8?

Example output

name len score pvalue
contig_19:67348-75230 7882 0.999377728 0.003021376
contig_48:1-8398 8397 0.746047735 0.029401768
contig_6:76096-82039 5943 0.733631253 0.030289297
contig_4:106376-112319 5943 0.733631253 0.030289297
contig_2:364020-370963 6943 0.689914644 0.034009366
contig_30:1-6033 6032 0.682190657 0.034689176
contig_7:300801-306685 5884 0.566177189 0.046264824
contig_15:59051-67163 8112 0.552826881 0.048020999
contig_21:82770-90882 8112 0.552826881 0.048020999
contig_30:3621-11733 8112 0.552826881 0.048020999
contig_43:1-5567 5566 0.450131565 0.083427751
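
For reference, here is a rough Python sketch of how output in this format could be filtered and split back into coordinates. It assumes the sequence names follow the `contig:start-end` pattern seen above and that the four columns are whitespace-separated. The function name `parse_hits` and the default cutoffs (score >= 0.8, p-value <= 0.05) are placeholders of mine, not values recommended by the tool.

```python
import re
from typing import List, NamedTuple


class Hit(NamedTuple):
    contig: str
    start: int
    end: int
    length: int
    score: float
    pvalue: float


# Assumed name layout: "<contig>:<start>-<end>", as in the example output.
NAME_RE = re.compile(r"^(?P<contig>.+):(?P<start>\d+)-(?P<end>\d+)$")


def parse_hits(text: str, min_score: float = 0.8, max_pvalue: float = 0.05) -> List[Hit]:
    """Return rows passing both cutoffs (the cutoffs are illustrative only)."""
    hits = []
    for line in text.strip().splitlines():
        fields = line.split()
        # Skip the header row and anything that is not a 4-column record.
        if len(fields) != 4 or fields[0] == "name":
            continue
        name, length, score, pvalue = fields
        match = NAME_RE.match(name)
        if match is None:
            continue  # name does not carry coordinates
        score_f, pvalue_f = float(score), float(pvalue)
        if score_f >= min_score and pvalue_f <= max_pvalue:
            hits.append(
                Hit(match["contig"], int(match["start"]), int(match["end"]),
                    int(length), score_f, pvalue_f)
            )
    return hits


example = """\
name len score pvalue
contig_19:67348-75230 7882 0.999377728 0.003021376
contig_48:1-8398 8397 0.746047735 0.029401768
contig_43:1-5567 5566 0.450131565 0.083427751
"""

for hit in parse_hits(example):
    print(hit.contig, hit.start, hit.end, hit.score)
```

With these placeholder cutoffs only the `contig_19` row is kept. The coordinates come purely from the sequence name, so this gives the region that was scored, not a sub-region within it.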

Thanks