wwood / CoverM

Read coverage calculator for metagenomics
GNU General Public License v3.0

coverm filter #40

Closed 473021677 closed 3 years ago

473021677 commented 3 years ago

Hi, I am using "coverm filter" to remove alignments with insufficient identity. The input bam file is 143374586594 bytes, but the output file is only 4129795 bytes. I then ran "coverm contig --methods trimmed_mean" on the small output file to calculate the mean coverage for each contig. I am not sure whether something has gone wrong. Could you give some suggestions? Thanks

Best regards

wwood commented 3 years ago

Sorry, I'm not sure what your question is, specifically.

If you only aim to get coverage with some alignment thresholding, why not specify the thresholds directly when you run coverm contig?

Thanks.

473021677 commented 3 years ago

Sorry, I hadn't pasted the complete commands. The command for coverm filter was "coverm filter -b LYT19_1_bowtie2_final_freephages_prophages_reformat_95-80.sorted.bam -o LYT19_1_bowtie2_final_freephages_prophages_reformat_95-80.sorted.filtered.bam --min-read-percent-identity 0.95 --min-read-aligned-percent 0.9 -t 20". The command for coverm contig was "coverm contig --methods trimmed_mean --bam-files LYT19_1_bowtie2_final_freephages_prophages_reformat_95-80.sorted.filtered.bam -t 20". What I meant was that the size of the bam file should not change this much when I filter at 95% nucleic acid identity and a 90% alignment fraction to remove alignments with insufficient identity. Thanks.

wwood commented 3 years ago

Were there many unmapped reads in the original file? Do you have a specific read mapping that you think should be in the final bam but isn't?



473021677 commented 3 years ago

I used bowtie2 with default parameters to map the metagenomic reads to the 273 prokaryotic genomes and generate the sam file. Then I used the commands "samtools view -bS LYT19_1_bowtie2_final_freephages_prophages_reformat_95-80.sam > LYT19_1_bowtie2_final_freephages_prophages_reformat_95-80.bam" and "samtools sort LYT19_1_bowtie2_final_freephages_prophages_reformat_95-80.bam -o LYT19_1_bowtie2_final_freephages_prophages_reformat_95-80.sorted.bam" to generate the sorted bam file. I am not sure whether there were many unmapped reads in the original file. But when I ran "coverm contig --methods trimmed_mean" on the filtered and unfiltered bam files, the results were almost the same: the mean coverage per contig for the filtered bam file was only slightly lower than for the unfiltered file. Thanks
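One way to put a number on "almost the same" is to join the two coverage tables contig by contig. The sketch below assumes the output of each "coverm contig" run was redirected to a tab-separated file with one header row, the contig name in column 1 and the trimmed mean in column 2. The function name `compare_coverage` and the file names in the usage comment are hypothetical.

```shell
# Hypothetical helper: compare per-contig coverage from two TSV files.
# $1: coverage table from the unfiltered bam
# $2: coverage table from the filtered bam
# Prints: contig, unfiltered value, filtered value, difference (tab-separated).
compare_coverage() {
  awk -F'\t' -v OFS='\t' '
    FNR == 1  { next }                        # skip the header row of each file
    NR == FNR { unfiltered[$1] = $2; next }   # first file: remember each contig
    ($1 in unfiltered) {                      # second file: report matching contigs
      print $1, unfiltered[$1], $2, unfiltered[$1] - $2
    }
  ' "$1" "$2"
}

# Usage (file names are placeholders):
#   coverm contig --methods trimmed_mean --bam-files unfiltered.sorted.bam -t 20 > unfiltered.tsv
#   coverm contig --methods trimmed_mean --bam-files filtered.sorted.bam -t 20 > filtered.tsv
#   compare_coverage unfiltered.tsv filtered.tsv
```

Small positive differences in the last column would be consistent with the filter removing only a minority of the mapped alignments.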

wwood commented 3 years ago

Sorry, I'm still confused about where you think the bug is in CoverM. It seems likely that there were unmapped reads in the unfiltered bam file, which is causing the size discrepancy. You can check with samtools flagstat, for instance.
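A minimal sketch of that check: the hypothetical helper `flagstat_summary` reads "samtools flagstat" text from stdin and reports how many records were unmapped. It assumes the usual flagstat line layout ("N + M in total ...", "N + M mapped ...") and uses only the QC-passed count, the first number on each line.

```shell
# Hypothetical helper: summarize `samtools flagstat` output read from stdin.
# Note that flagstat counts alignment records, so secondary and supplementary
# records are included in the total.
flagstat_summary() {
  awk '
    $4 == "in" && $5 == "total" { total = $1 }    # "N + M in total (...)"
    $4 == "mapped"              { mapped = $1 }   # "N + M mapped (...)", not "primary mapped"
    END {
      if (total == "") { print "no flagstat totals found" > "/dev/stderr"; exit 1 }
      printf "total=%.0f mapped=%.0f unmapped=%.0f\n", total, mapped, total - mapped
    }'
}

# Usage (file name is a placeholder):
#   samtools flagstat unfiltered.sorted.bam | flagstat_summary
```

If most records turn out to be unmapped, a large drop in file size after filtering is expected, since the unmapped reads take up space without contributing to coverage.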

473021677 commented 3 years ago

I think there were unmapped reads in the unfiltered bam file. Thanks for your help.
