linxingchen / cobra

A tool to raise the quality of viral genomes assembled from short-read metagenomes via resolving and joining of contigs fragmented during de novo assembly.
MIT License

BLAST error #29

Open 1023011930 opened 8 months ago

1023011930 commented 8 months ago

Thank you very much for your software output, which provides a new workflow for viral group research!

Building a new DB, current time: 03/09/2024 10:19:13
New DB name:   /home/zhongpei/hard_disk_sda2/zhongpei/Virome/rawdata/upload_20230812/zhongpei_analyse/fastp/final_assembly/clean_TPM_Decontam_contig/Vir_result/AF11_metaSPAdes_COBRA/blastdb_1.fa
New DB title:  blastdb_1.fa
Sequence type: Nucleotide
Keep MBits: T
Maximum file size: 3000000000B
BLAST options error: File blastdb_1.fa is empty
BLAST Database error: No alias or index file found for nucleotide database [blastdb_1.fa] in search path [/home/zhongpei/hard_disk_sda2/zhongpei/Virome/rawdata/upload_20230812/zhongpei_analyse/fastp/final_assembly/clean_TPM_Decontam_contig/Vir_result/AF11_metaSPAdes_COBRA::]

When I run COBRA, it prints the error shown above, but the process seems to complete. Is this expected? Is it a "BLAST error" because none of my -q sequences were extended? My log file is attached (log.txt). It would be extremely helpful to hear from you!

linxingchen commented 8 months ago

Hi,

Thank you for your interest in COBRA.

You are right: because you had only 9 queries and none of them was extended, the BLAST database is empty. The newest version of COBRA reports this explicitly. Did you use version 1.2.3?

Please let me know if you have any other question. Always happy to discuss more.

Best, LINXING
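
For anyone scripting around this step, a minimal sketch of the kind of guard that avoids the "File blastdb_1.fa is empty" message: count the FASTA records first and only build the makeblastdb command when there is something to index. The helper names `count_fasta_records` and `makeblastdb_command` are hypothetical and are not part of COBRA; only the standard `makeblastdb -in ... -dbtype nucl` options are assumed.

```python
from pathlib import Path


def count_fasta_records(path):
    """Count FASTA records (header lines starting with '>'); 0 if the file is missing."""
    fasta = Path(path)
    if not fasta.is_file():
        return 0
    with fasta.open() as handle:
        return sum(1 for line in handle if line.startswith(">"))


def makeblastdb_command(fasta_path):
    """Return the makeblastdb argument list, or None when there is nothing to index.

    Hypothetical helper for illustration; COBRA's own handling may differ.
    """
    if count_fasta_records(fasta_path) == 0:
        return None
    return ["makeblastdb", "-in", str(fasta_path), "-dbtype", "nucl"]
```

A caller could pass the returned list to `subprocess.run` and skip the BLAST step entirely when the function returns `None`, which is the situation described here (no query was extended).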

1023011930 commented 8 months ago

Hi, I've been following COBRA since the manuscript version, using the original version, and I'm glad to see the article in NM! I've run 2-3 samples so far, with about 5-10 queries each, all without extensions. Should I consider binning? Shouldn't we use cat to pool all the viral contigs identified in the samples and then bin them with a binning tool? It would be extremely helpful to hear from you!

linxingchen commented 8 months ago

You could bin the (viral) contigs from a single assembly, but not from the assemblies of two or more samples. The same holds for COBRA, which cannot handle contigs from different assemblies. To be honest, binning does not work well for viruses, as the comparison between COBRA and binning tools in the paper shows.

Why only 5-10 queries for each sample? You should have more contigs if you are focusing on viruses.

1023011930 commented 8 months ago

These 2-3 samples are low-biomass samples, so I think 5-10 contigs is reasonable. By the way, have you tested VAMB and PHAMB for binning viruses? Thanks for your kindness!

linxingchen commented 8 months ago

Unfortunately I did not test them, but I do not expect them to perform very well, given the genomic features of viruses.

1023011930 commented 8 months ago

Thank you for your helpful reply! I will test COBRA with other samples and also try PHAMB. If I get interesting results, I will post them in this issue!

linxingchen commented 8 months ago

Good to know. Thank you.

1023011930 commented 8 months ago

Unfortunately, after updating to the latest version, when the number of -q sequences is too small, the following is reported as an error: "no query was extended, exit! this is normal if you only provide few queries."

1023011930 commented 8 months ago

When the number of -q sequences is sufficient, some sequences are reported as extended or even circular.

linxingchen commented 8 months ago

Unfortunately, after updating to the latest version, when the number of -q sequences is too small, the following is reported as an error: "no query was extended, exit! this is normal if you only provide few queries."

That's expected.

linxingchen commented 8 months ago

When the number of -q sequences is sufficient, some sequences are reported as extended or even circular.

That's right.

1023011930 commented 8 months ago

Thank you for your response. Next, I will use the sequences from COBRA as the input for CheckV and select contigs of Medium-quality or higher as the final Vir_contig. What should be done with the low-quality contigs? Should they be processed in some way, or discarded directly?

linxingchen commented 8 months ago

It is totally up to you. Some people have reported low-quality genomes in their papers, so you could do that as well.

1023011930 commented 8 months ago

Thank you for your response. To avoid false positives, we chose to discard the Low-quality contigs.

linxingchen commented 8 months ago

Sounds good.
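
The filtering discussed above (keep Medium-quality or higher, discard the rest) can be sketched as follows. This is a rough illustration only, assuming CheckV's `quality_summary.tsv` output with its `contig_id` and `checkv_quality` columns; the function name `select_contigs` and the `KEEP` set are hypothetical choices, not part of CheckV or COBRA.

```python
import csv

# Quality tiers treated as "Medium-quality or higher" (an assumption of this sketch).
KEEP = {"Complete", "High-quality", "Medium-quality"}


def select_contigs(quality_summary_tsv, keep=KEEP):
    """Return contig IDs whose checkv_quality value is in `keep`, in file order."""
    with open(quality_summary_tsv, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        return [row["contig_id"] for row in reader if row["checkv_quality"] in keep]
```

The returned IDs can then be used to subset the COBRA output FASTA. Contigs labelled Low-quality or Not-determined are simply left out, so changing the `keep` set is enough to report them instead.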

biohater commented 2 months ago

Maybe this error happens when coverage.txt is empty. I ran into the same error.

linxingchen commented 2 months ago

I believe you would run into other types of errors if the coverage.txt file were empty.