Closed ucassee closed 4 years ago
Hi,
The relative_abundance method is documented in the README - I hope that clears things up? It is based on mean coverage, not rpkg (or rpkm). RPKM isn't calculated by the current release of CoverM (0.2.0-alpha7) but I just pushed a commit to the master branch that calculates rpkm directly, using -m rpkm
- it will be in the next version or you can compile it from source now. See also #11.
@wwood Thanks for your reply. After see README, I still have two questions? Why it need to minus 75 bp2 in the end of the contig? Is the relative_abundance of genome-A calculated by ` (genome-A mean coverage/all genomes mean coverage)(genome-A mapped reads/all metagenome reads)`
I want to calculate the abundance of 500 metagenome-assembled genomes, and the ANI analysis show some of the genomes are vary similar(97% identity). So I not sure whether the alignment of bwa can distinguish the best hit. I see --sharded
can choose the best hit. What reads alignment software do you use?
Hope you add rpkg calculation in next version! hah~
Hi,
1) The minus 75bp is to account for the difficulty of mapping to ends of contigs - this follows from the metabat coverage calculation tool.
2) No its (genome-A mean coverage/sum(all genomes mean coverage))*(all mapped reads/all metagenome reads)
3) --sharded
isn't for closely related genomes, its for when you have a very large database e.g. all Bacteria and Archaea so the BWA index doesn't fit in RAM.
Thanks for your patience. In your instruction, you mention --sharded
can choose the best hit for each read pair.
Sharding i.e. multiple reference sets (optional): --sharded (experimental) If -b/--bam-files was used: Input BAM files are read-sorted alignments of a set of reads mapped to multiple reference contig sets. Choose the best hit for each read pair.
Otherwise if mapping was carried out:
Map reads to each reference, choosing the
best hit for each pair.
If some genomes are very similar, will it occur mismatch ? Have your ever test this?
The sharded option does not help with differentiating closely related strains - it is for reducing RAM requirements of mapping when databases are large.
If genomes are similar, unfortunately reads may be incorrectly mapped. This is really a limitation of the entire mapping-based approach. I happen to be working on a solution for this, but it isn't ready for public consumption and will be a separate piece of software (called cockatoo).
Thanks! Hope you finish the cockatoo
development soon~
I want to try -m rpkm
. So I use git clone
to download coverm source code and use rust to compile. But it went error. Could you help me?
cargo install coverm --root /data2017/zhouyl/software/coverM-TEST --path /data2017/zhouyl/software/CoverM Installing coverm v0.2.0-alpha7 (/data2017/zhouyl/software/CoverM) Updating crates.io index Compiling rust-htslib v0.26.0 error: failed to compile
coverm v0.2.0-alpha7 (/data2017/zhouyl/software/CoverM)
, intermediate artifacts can be found at/data2017/zhouyl/software/CoverM/target
Caused by:
failed to run custom build command for rust-htslib v0.26.0
Caused by:
process didn't exit successfully: /data2017/zhouyl/software/CoverM/target/release/build/rust-htslib-c675301b4fd304f2/build-script-build
(exit code: 101)
Hi, you can just use cargo run --release -- genome .....
rather than coverm install if you want, but that rust-htslib error you'll need to talk with the rust-htslib people about that. It would be helpful to look at the output of cargo build
as that will give a fuller description of the error.
Thanks for your reply~ But running cargo run --release -- genome
is also with rust-htslib error
. Maybe I should install rust-htslib
Hi,
Since you haven't given the error I'm not sure how to help. I don't imagine reinstalling rust will work, but maybe.
I envisage releasing 0.3.0 this week though if you wanted to wait.
From: Yingli Zhou notifications@github.com Sent: Tuesday, October 8, 2019 3:04:16 PM To: wwood/CoverM CoverM@noreply.github.com Cc: Ben J Woodcroft donttrustben@gmail.com; Mention mention@noreply.github.com Subject: Re: [wwood/CoverM] relative_abundance (#12)
Thanks for your reply~ But running cargo run --release -- genome is also with rust-htslib error. Maybe I should reinstall rust.
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHubhttps://github.com/wwood/CoverM/issues/12?email_source=notifications&email_token=AAADX5HP7FS2ZV3LYIKRI4TQNQIFBA5CNFSM4I3MNXAKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEASXJOY#issuecomment-539325627, or mute the threadhttps://github.com/notifications/unsubscribe-auth/AAADX5CYI7KLIQIAQGFZVRTQNQIFBANCNFSM4I3MNXAA.
I don't have root privilege, so it is difficult for me to debug rust-htslib error
. I will wait releasing 0.3.0 (●'◡'●)
Alrighty, well, 0.3.0 is now out. Good luck.
Hi @wwood ,
I used genome -m relative_abundance
and contig -m rpkm
parameter to calculate MAG and gene abundance. I am preparing my manuscript and writing methods. Could you please help me to write a sentence for description of these calculation?
Was the rpkm of gene-A calculated by (gene-A mapped reads 10^9/(all metagenome reads 1000bp))
Thanks in advance .
Sorry, I will not be able to do that, as I am on leave.
From: Yingli Zhou notifications@github.com Sent: Tuesday, May 26, 2020 11:50:33 AM To: wwood/CoverM CoverM@noreply.github.com Cc: Ben J Woodcroft donttrustben@gmail.com; Mention mention@noreply.github.com Subject: Re: [wwood/CoverM] relative_abundance (#12)
Reopened #12https://github.com/wwood/CoverM/issues/12.
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHubhttps://github.com/wwood/CoverM/issues/12#event-3371602058, or unsubscribehttps://github.com/notifications/unsubscribe-auth/AAADX5A4KMEFHO5DZEPENULRTMN6TANCNFSM4I3MNXAA.
Hi, I will try to write by myself first. But I don't know if I understand these methods correctly. Hope to get your help, when you back to work. Have a nice vacation!
Deer developer, I still don't understand the relative abundance calculation in CoverM. Whether the result of CoverM is the percentage of mapped reads of each genome ? I just want to get
the number of reads recruited per kilobase genome per gigabase metagenome (RPKG)
. Can I use CoverM to calculate ? The following is my command:coverm genome -d /backup/zhouyl/Genome -x fasta -b TS01sorted -t 20 --min-read-aligned-length 50 --min-read-percent-identity 0.99 >TS01sorted .coverage
The following is the result:Hope to your reply Thanks in advance~