Closed Thexiyang closed 5 years ago
Hi, possible that trimmed mean > mean, because both the most covered and least covered positions are removed before recalculating the mean from the middle.
That does seem like a big difference though. What do you see when you run it with -m coverage_histogram
?
Here it is. Any idea here after you look at it? Thanks! Bin1_coverage_histogram.txt
Hi again, Thanks for that - calculating the mean and trimmed mean manually in R gives me a different answer, suggesting there might be a bug. Are you able to send the BAM file that you used here please?
Also, how exactly did you run CoverM i.e. what command line arguments? To make reproducing easier.
Thanks.
The file is too large to upload. I plan to send you an email with google-drive link to this email address: b.woodcroft near uq.edu.au. Is the address right? Thanks for checking!
Yes, that address will work. Thanks.
Hi, this turned out to be an unsigned integer overflow bug - I'll push a fix in a second to the master branch. For reference, the output of your command for me is now
$ cargo run --release -- genome --bam-files combined.S3_1stq_filtered.bam --genome-fasta-directory ./ -x fa -t 12 -m mean relative_abundance trimmed_mean > S3_coverm
Compiling coverm v0.3.0 (/home/ben/git/coverm-ben2)
Finished release [optimized] target(s) in 12.04s
Running `/home/ben/git/coverm-ben2/target/release/coverm genome --bam-files combined.S3_1stq_filtered.bam --genome-fasta-directory ./ -x fa -t 12 -m mean relative_abundance trimmed_mean`
[2019-11-14T23:28:56Z INFO coverm] CoverM version 0.3.0
[2019-11-14T23:28:56Z INFO coverm] Using min-covered-fraction 10%
[2019-11-14T23:28:56Z INFO coverm] Not using directory entry './small.bam' as a genome FASTA file, as it does not end with the extension 'fa'
[2019-11-14T23:28:56Z INFO coverm] Not using directory entry './combined.S3_1stq_filtered.bam.bai' as a genome FASTA file, as it does not end with the extension 'fa'
[2019-11-14T23:28:56Z INFO coverm] Not using directory entry './combined.S3_1stq_filtered.bam' as a genome FASTA file, as it does not end with the extension 'fa'
[2019-11-14T23:28:56Z INFO coverm] Not using directory entry './S3_coverm' as a genome FASTA file
[2019-11-14T23:28:56Z INFO coverm] Not using directory entry './old.bai' as a genome FASTA file, as it does not end with the extension 'fa'
[2019-11-14T23:28:56Z INFO coverm] Reading contig names for 1 genomes ..
[2019-11-14T23:28:56Z INFO coverm::genome] Of 151703 reference IDs, 117 were assigned to a genome and 151586 were not
[2019-11-14T23:30:32Z INFO coverm::genome] In sample 'combined.S3_1stq_filtered', found 31465837 reads mapped out of 79089524 total (39.79%)
Genome combined.S3_1stq_filtered Mean combined.S3_1stq_filtered Relative Abundance (%) combined.S3_1stq_filtered Trimmed Mean
unmapped NA 60.214912 NA
S3_bin4 2026.2925 39.78509 2030.0072
Thanks for the clear bug report.
Thanks a lot! Would it possible to also update the bioconda version to 0.3.1?
Thanks for this very nice tool. I have a question regarding the output. In my results, there is one bin showing quite different values between the Mean and Trimmed Mean (see below). In my understanding, Tirmmed Mean values should be less than Mean values. Most other bin follow this rule with only a few exceptions. Is it right? Of these two parameters, which one more accurately represenst the relative abundance? Thanks!
Genome | Mean | Relative Abundance (%) | Trimmed Mean Bin1 | 106.6692 | 2.686203 | 2030.007