tseemann / shovill

⚡♠️ Assemble bacterial isolate genomes from Illumina paired-end reads
GNU General Public License v3.0

Spades warning messages? #150

Open maesaar opened 3 years ago

maesaar commented 3 years ago

Hello @andersgs, I have run SPAdes 3.14.1 as included in shovill, and I constantly get three types of warnings, which are also mentioned in #19. Are they benign, or how should they be addressed?

Thanks

=== Error correction and assembling warnings:

0:00:35.313 92M / 2G WARN General (kmer_coverage_model.cpp : 327) Valley value was estimated improperly, reset to 2
0:00:35.313 92M / 2G WARN General (kmer_coverage_model.cpp : 366) Failed to determine erroneous kmer threshold. Threshold set to: 2
0:00:28.189 124M / 4G WARN General (kmer_coverage_model.cpp : 327) Valley value was estimated improperly, reset to 12
0:00:28.190 124M / 4G WARN General (kmer_coverage_model.cpp : 366) Failed to determine erroneous kmer threshold. Threshold set to: 12
0:00:22.490 115M / 4G WARN General (kmer_coverage_model.cpp : 327) Valley value was estimated improperly, reset to 6
0:00:22.490 115M / 4G WARN General (kmer_coverage_model.cpp : 366) Failed to determine erroneous kmer threshold. Threshold set to: 6
0:00:18.378 138M / 4G WARN General (kmer_coverage_model.cpp : 218) Too many erroneous kmers, the estimates might be unreliable
======= Warnings saved to /home/bioinf/Desktop/CJ_21122020/shovill/CAMP3H_S101/spades/warnings.lo

andersgs commented 3 years ago

@maesaar we have not dug too deeply into that yet. But, it does suggest some issue with the underlying FASTQ data. Is this warning associated with a particular sample? Are you able to share it?

maesaar commented 3 years ago

@andersgs I can share the FASTQs after the holiday. Is that OK with you?

maesaar commented 3 years ago

@andersgs Can I email the link directly? For now I have chosen to use the SKESA assembler with shovill. Do you think it is a good alternative?

Just for background: the FASTQs are 4x2 files, concatenated as described in #144.

andersgs commented 3 years ago

Did you concatenate in the same order?

L1, L2, L3, L4 > R1
L1, L2, L3, L4 > R2

If the order differed, that may explain the issue you are observing.

And emailing a link directly is fine.

maesaar commented 3 years ago

The cat commands are as follows:

cat CAMP01-08H_S34_L001_R1_001.fastq.gz CAMP01-08H_S34_L002_R1_001.fastq.gz CAMP01-08H_S34_L003_R1_001.fastq.gz CAMP01-08H_S34_L004_R1_001.fastq.gz > R1.fastq.gz

cat CAMP01-08H_S34_L001_R2_001.fastq.gz CAMP01-08H_S34_L002_R2_001.fastq.gz CAMP01-08H_S34_L003_R2_001.fastq.gz CAMP01-08H_S34_L004_R2_001.fastq.gz > R2.fastq.gz

shovill --R1 R1.fastq.gz --R2 R2.fastq.gz --outdir CAMP01-08H_S34 --keepfiles --minlen 200 --ram 58 --trim

And the warning:

[spades] * 0:00:18.097 65M / 3G WARN General (kmer_coverage_model.cpp : 218) Too many erroneous kmers, the estimates might be unreliable
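One way to check the lane-order question raised above is to compare the read IDs of the two concatenated files directly. The following is a minimal sketch, not part of shovill: the function names `read_ids` and `check_pairing` are made up for illustration, and it assumes standard four-line FASTQ records in which the first whitespace-delimited header field is identical for both mates (optionally followed by `/1` or `/2`).

```shell
# Hypothetical helpers for illustration; not part of shovill or SPAdes.

# Print one read ID per record: the first header field, with any
# trailing /1 or /2 mate suffix removed. Assumes 4-line FASTQ records.
read_ids() {
    gzip -dc "$1" | awk 'NR % 4 == 1 { id = $1; sub(/\/[12]$/, "", id); print id }'
}

# Return 0 only if both files list the same read IDs in the same order.
check_pairing() {
    ids_r1=$(mktemp) || return 2
    ids_r2=$(mktemp) || { rm -f "$ids_r1"; return 2; }
    read_ids "$1" > "$ids_r1"
    read_ids "$2" > "$ids_r2"
    if cmp -s "$ids_r1" "$ids_r2"; then
        pairing_status=0
        echo "OK: read IDs match in order"
    else
        pairing_status=1
        echo "MISMATCH: read IDs differ between $1 and $2"
    fi
    rm -f "$ids_r1" "$ids_r2"
    return "$pairing_status"
}
```

With the files from the commands above, this would be called as `check_pairing R1.fastq.gz R2.fastq.gz`. A mismatch would suggest that the lanes were concatenated in a different order for R1 and R2, or that reads were dropped from one file only.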

maesaar commented 3 years ago

@andersgs I have shared, via e-mail, a link to download the reads that produce the one warning message mentioned above.

If you need reads that produce different warning messages, please let me know.

maesaar commented 3 years ago

@andersgs please see SPAdes issue #630, where the logs are posted for additional information. Could you check why logs 2) and 3) in section a) give different SPAdes warnings? For the first (log "2)"), the FASTQs were concatenated into R1 and R2, then only trimmed in shovill, followed by a separate SPAdes run. For the second (log "3)"), the 4 pairs were only trimmed with shovill separately, then the trimmed R1s and R2s were concatenated and used in a SPAdes run.

asl commented 3 years ago

These warnings are usually caused by uneven coverage, likely due to read subsampling / read correction. The results might be suboptimal, and misassemblies can be expected.
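When comparing runs, such as the two logs discussed above, it can help to tally the warnings by type instead of reading them line by line. The sketch below is one possible approach, assuming warning lines have the `... WARN General (file.cpp : line) message` layout shown earlier in this thread. The function name `summarize_warnings` is hypothetical, and trailing numbers are replaced with `N` so that "reset to 2" and "reset to 12" are counted as the same kind of warning.

```shell
# Hypothetical helper for illustration; not part of shovill or SPAdes.
# Counts WARN lines by message text, most frequent first.
summarize_warnings() {
    grep 'WARN' "$1" \
        | sed -e 's/^.*WARN[^)]*) *//' -e 's/[0-9][0-9]*$/N/' \
        | sort | uniq -c | sort -rn
}
```

It takes the path to a saved warnings file or a shovill log as its only argument, for example `summarize_warnings "$WARNINGS_FILE"`, where `WARNINGS_FILE` is a placeholder for your own path.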