linxingchen / cobra

A tool to raise the quality of viral genomes assembled from short-read metagenomes via resolving and joining of contigs fragmented during de novo assembly.
MIT License
56 stars, 8 forks

The role of cobra in classic virome pipeline #27

Open xjhzjucas opened 6 months ago

xjhzjucas commented 6 months ago

Hi Linxing, thank you for developing this nice tool! Since COBRA is new software, I am curious about what role it plays in a classic virome pipeline. For example, I used MEGAHIT to assemble contigs from metagenomic reads and then used geNomad to identify the viral contigs among them. If I then run COBRA as the next step (i.e., use the geNomad output <prefix>_summary/<prefix>_virus.fna as the input to COBRA), will COBRA help me bin the identified viral contigs together to get higher completeness? Does it work well before or after geNomad? I read that COBRA can identify more circular viral genomes and huge phages. Can I consider COBRA a binning tool or an identifier of circular/huge phages? Thanks!

linxingchen commented 6 months ago

Hi, thank you for your interest in COBRA. COBRA does not bin contigs/scaffolds; instead, it joins contigs/scaffolds together to get longer sequences (and thus higher completeness). You can use the predicted viral contigs/scaffolds (for example, from geNomad) as the queries for COBRA to work on, but keep in mind that these queries must come from the same sample, and the -f/--fasta input of COBRA must contain all the contigs/scaffolds from the corresponding assembly (no length filtering). Please let me know if you have other concerns.

Cheers, LINXING

xjhzjucas commented 6 months ago

Thank you for your help. If I understand your suggestions correctly, COBRA can take geNomad's viral contigs from a single sample and join them to get higher completeness, so that more huge/circular viral contigs are recovered. Is that right? I am also a little confused about "all the contigs/scaffolds from the corresponding assembly (no length filtering)". Does this mean I should use all of the sequences in geNomad's virus.fna for a sample, without length filtering? Thanks!

linxingchen commented 6 months ago

You misunderstood. -q/--query is for the queries, and -f/--fasta is for all the contigs/scaffolds from the assembly. The queries are the contigs/scaffolds you want COBRA to join. Do not filter the -f/--fasta file by length. Check "Input files" here for details.
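
The relationship between the two inputs (every query must also appear in the full, unfiltered contig file) can be checked before running COBRA. Below is a minimal Python sketch of such a check. The function names `fasta_ids` and `missing_queries` are hypothetical helpers, not part of COBRA, and the sketch assumes that sequence IDs are compared by the first whitespace-delimited token of each FASTA header, which is a common convention but is not confirmed for COBRA in this thread.

```python
def fasta_ids(lines):
    """Return the set of sequence IDs (first token after '>') found in FASTA lines."""
    ids = set()
    for line in lines:
        if line.startswith(">"):
            header = line[1:].strip()
            if header:
                # Assumption: the ID is the first whitespace-delimited token.
                ids.add(header.split()[0])
    return ids


def missing_queries(query_lines, all_contig_lines):
    """Return the query IDs that are absent from the full assembly, sorted."""
    return sorted(fasta_ids(query_lines) - fasta_ids(all_contig_lines))


if __name__ == "__main__":
    # Tiny in-memory example; in practice, pass open file handles instead.
    all_contigs = [">k141_1 flag=1 len=300", "ACGT", ">k141_2 flag=1 len=5000", "GGCC"]
    queries = [">k141_2", "GGCC", ">k141_9||full", "TTAA"]
    print(missing_queries(queries, all_contigs))  # prints ['k141_9||full']
```

An empty result means every query name was found in the full contig file; any name that is listed would need to be fixed before it is passed to -q/--query.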

yan1365 commented 3 months ago

Hi Linxing,

Thank you for your excellent tool. I have a question: do the query sequences need to have the same names as the original contigs in the whole contig file? I am asking because viral identification tools usually rename the contigs, and when I ran COBRA, it reported that none of my query contigs are in the whole contig file. For example, it says "Query k141_148452||full is not in your whole contig fasta file, please check!". However, I checked and found that the contig "k141_148452" is indeed in my whole contig file.

If I need to rename the viral contigs to their original names, what should I do if two or more viral sequences were identified from the same contig? Or should we use the original contig from which the sequences were identified as the input?

It would be nice if you could provide more detailed examples of how COBRA is used in virome analysis.

Thanks, Ming
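
One possible way to handle the renaming described above is to map each renamed ID back to its original contig name and keep each original name only once. This is a rough Python sketch of that idea, not a confirmed recommendation from the COBRA author. It assumes that the identification tool appends its suffix after a `|` character (as in `k141_148452||full`) and that the original assembler names contain no `|`. The helper names `original_contig_name` and `query_names` and the `provirus`-style example IDs are illustrative assumptions. Using the whole original contig as the query when several viral regions come from the same contig is likewise an assumption here.

```python
def original_contig_name(renamed_id):
    """Drop everything from the first '|' onward.

    Example: 'k141_148452||full' -> 'k141_148452'.
    Assumption: original contig names never contain '|'.
    """
    return renamed_id.split("|", 1)[0]


def query_names(renamed_ids):
    """Map renamed IDs to original names, in first-seen order, without duplicates."""
    seen = set()
    names = []
    for renamed_id in renamed_ids:
        name = original_contig_name(renamed_id)
        if name not in seen:
            # Several viral regions from one contig collapse to a single query.
            seen.add(name)
            names.append(name)
    return names


if __name__ == "__main__":
    renamed = [
        "k141_148452||full",
        "k141_7|provirus_10_900",     # hypothetical second-region naming
        "k141_7|provirus_2000_4000",
    ]
    print(query_names(renamed))  # prints ['k141_148452', 'k141_7']
```

The resulting names can then be used to pull the matching sequences out of the whole contig file, so that the query names agree exactly with those in the -f/--fasta input.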