SergeyBaikal opened this issue 1 year ago (status: Open)
Dear Sergey,
Thank you for testing the tool and sharing your valuable feedback! I’d like to address your observations and questions:
The tool employs a k-mer matching strategy, meaning that any random overlap of k-mers between the query sequence and the database could lead to a genus assignment, even if the taxonomy (e.g., RNA viruses) is not as expected. To mitigate this, we’ve introduced a new metric called the "Enrichment Score," which helps reduce the likelihood of random k-mer matches affecting the predictions.
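To make the random-overlap point concrete, here is a minimal sketch of k-mer matching in general. It is not the tool's actual code: the function names (`kmers`, `build_index`, `genus_hits`), the toy sequences, and the small `K = 5` are all assumptions chosen for illustration (the database filename in this thread suggests the real model uses k = 17), and the Enrichment Score is not reproduced here.

```python
from collections import Counter


def kmers(seq, k):
    """Return the set of all overlapping substrings of length k in seq."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_index(references, k):
    """Map each k-mer to the set of genera whose reference contains it."""
    index = {}
    for genus, seq in references.items():
        for kmer in kmers(seq, k):
            index.setdefault(kmer, set()).add(genus)
    return index


def genus_hits(query, index, k):
    """Count, per genus, how many distinct query k-mers occur in the index."""
    hits = Counter()
    for kmer in kmers(query, k):
        for genus in index.get(kmer, ()):
            hits[genus] += 1
    return hits


# Hypothetical toy references; real databases hold many genomes per genus.
references = {
    "GenusA": "ATGCGTACGTTAGC",
    "GenusB": "TTGACCGGATAACG",
}
K = 5  # illustrative value only
index = build_index(references, K)

# A mostly unrelated query that happens to share a short stretch with GenusA.
unrelated_query = "CCCCCGTACGCCCCC"
hits = genus_hits(unrelated_query, index, K)
print(hits.most_common())
```

In this toy case only two of the ten distinct query k-mers match the index, yet a naive "most hits wins" rule would still assign the query to GenusA. That is the kind of chance assignment a score-based cutoff is meant to filter out.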
Additionally, this model is specifically designed for predicting viral sequences. Applying it to non-viral sequences may result in incorrect taxonomic assignments. To provide further clarity, we’ve included a new section in the README titled "Method Limitations and Interpretation" to elaborate on these points.
In future updates, we will add the full taxonomic lineage (e.g., family, order, genus, species) to the output file and provide arguments for choosing cutoffs for both Entropy and Enrichment_Score.
Please let us know if you have further questions!
python3 predict.py --model_path /home/sergey/VirusTaxo/Dataset/vt_db_rna_virus_kmer_17.pkl --seq /home/VirusTaxo/My_Data/contigs.fasta > /home/VirusTaxo/My_Data/Results.txt
Dear authors, could you please clarify what I'm doing wrong?