wwood / CoverM

Read coverage calculator for metagenomics
GNU General Public License v3.0
303 stars 31 forks source link

relative_abundance #12

Closed ucassee closed 4 years ago

ucassee commented 5 years ago

Deer developer, I still don't understand the relative abundance calculation in CoverM. Whether the result of CoverM is the percentage of mapped reads of each genome ? I just want to get the number of reads recruited per kilobase genome per gigabase metagenome (RPKG). Can I use CoverM to calculate ? The following is my command: coverm genome -d /backup/zhouyl/Genome -x fasta -b TS01sorted -t 20 --min-read-aligned-length 50 --min-read-percent-identity 0.99 >TS01sorted .coverage The following is the result:

Genome TS01sorted Relative Abundance (%) unmapped 75.77681 B100T1B8 0 B100T1L10 0.45533308 B100T3L11 0.024251277 B101T1B8 0.03663064 B101T1L10 0.047809314 B101T3L11 0.014094652 B102T1B8 0 B102T1L10 0.07564686 B102T3L11 0.03380026 B103T1B8 0 B103T1L10 0.078486376 B104T1B8 0 B104T3L11 0.0088422205

Hope to your reply Thanks in advance~

wwood commented 5 years ago

Hi,

The relative_abundance method is documented in the README - I hope that clears things up? It is based on mean coverage, not rpkg (or rpkm). RPKM isn't calculated by the current release of CoverM (0.2.0-alpha7) but I just pushed a commit to the master branch that calculates rpkm directly, using -m rpkm - it will be in the next version or you can compile it from source now. See also #11.

ucassee commented 5 years ago

@wwood Thanks for your reply. After see README, I still have two questions? Why it need to minus 75 bp2 in the end of the contig? Is the relative_abundance of genome-A calculated by ` (genome-A mean coverage/all genomes mean coverage)(genome-A mapped reads/all metagenome reads)`

I want to calculate the abundance of 500 metagenome-assembled genomes, and the ANI analysis show some of the genomes are vary similar(97% identity). So I not sure whether the alignment of bwa can distinguish the best hit. I see --sharded can choose the best hit. What reads alignment software do you use?

Hope you add rpkg calculation in next version! hah~

wwood commented 5 years ago

Hi,

1) The minus 75bp is to account for the difficulty of mapping to ends of contigs - this follows from the metabat coverage calculation tool.

2) No its (genome-A mean coverage/sum(all genomes mean coverage))*(all mapped reads/all metagenome reads)

3) --sharded isn't for closely related genomes, its for when you have a very large database e.g. all Bacteria and Archaea so the BWA index doesn't fit in RAM.

ucassee commented 5 years ago

Thanks for your patience. In your instruction, you mention --sharded can choose the best hit for each read pair.

Sharding i.e. multiple reference sets (optional): --sharded (experimental) If -b/--bam-files was used: Input BAM files are read-sorted alignments of a set of reads mapped to multiple reference contig sets. Choose the best hit for each read pair.

                                     Otherwise if mapping was carried out:
                                       Map reads to each reference, choosing the
                                       best hit for each pair.

If some genomes are very similar, will it occur mismatch ? Have your ever test this?

wwood commented 5 years ago

The sharded option does not help with differentiating closely related strains - it is for reducing RAM requirements of mapping when databases are large.

If genomes are similar, unfortunately reads may be incorrectly mapped. This is really a limitation of the entire mapping-based approach. I happen to be working on a solution for this, but it isn't ready for public consumption and will be a separate piece of software (called cockatoo).

ucassee commented 5 years ago

Thanks! Hope you finish the cockatoo development soon~

ucassee commented 5 years ago

I want to try -m rpkm . So I use git clone to download coverm source code and use rust to compile. But it went error. Could you help me?

cargo install coverm --root /data2017/zhouyl/software/coverM-TEST --path /data2017/zhouyl/software/CoverM Installing coverm v0.2.0-alpha7 (/data2017/zhouyl/software/CoverM) Updating crates.io index Compiling rust-htslib v0.26.0 error: failed to compile coverm v0.2.0-alpha7 (/data2017/zhouyl/software/CoverM), intermediate artifacts can be found at /data2017/zhouyl/software/CoverM/target

Caused by: failed to run custom build command for rust-htslib v0.26.0

Caused by: process didn't exit successfully: /data2017/zhouyl/software/CoverM/target/release/build/rust-htslib-c675301b4fd304f2/build-script-build (exit code: 101)

wwood commented 5 years ago

Hi, you can just use cargo run --release -- genome ..... rather than coverm install if you want, but that rust-htslib error you'll need to talk with the rust-htslib people about that. It would be helpful to look at the output of cargo build as that will give a fuller description of the error.

ucassee commented 5 years ago

Thanks for your reply~ But running cargo run --release -- genome is also with rust-htslib error. Maybe I should install rust-htslib

wwood commented 5 years ago

Hi,

Since you haven't given the error I'm not sure how to help. I don't imagine reinstalling rust will work, but maybe.

I envisage releasing 0.3.0 this week though if you wanted to wait.


From: Yingli Zhou notifications@github.com Sent: Tuesday, October 8, 2019 3:04:16 PM To: wwood/CoverM CoverM@noreply.github.com Cc: Ben J Woodcroft donttrustben@gmail.com; Mention mention@noreply.github.com Subject: Re: [wwood/CoverM] relative_abundance (#12)

Thanks for your reply~ But running cargo run --release -- genome is also with rust-htslib error. Maybe I should reinstall rust.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHubhttps://github.com/wwood/CoverM/issues/12?email_source=notifications&email_token=AAADX5HP7FS2ZV3LYIKRI4TQNQIFBA5CNFSM4I3MNXAKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEASXJOY#issuecomment-539325627, or mute the threadhttps://github.com/notifications/unsubscribe-auth/AAADX5CYI7KLIQIAQGFZVRTQNQIFBANCNFSM4I3MNXAA.

ucassee commented 5 years ago

I don't have root privilege, so it is difficult for me to debug rust-htslib error. I will wait releasing 0.3.0 (●'◡'●)

wwood commented 5 years ago

Alrighty, well, 0.3.0 is now out. Good luck.

ucassee commented 4 years ago

Hi @wwood , I used genome -m relative_abundance and contig -m rpkm parameter to calculate MAG and gene abundance. I am preparing my manuscript and writing methods. Could you please help me to write a sentence for description of these calculation?

Was the rpkm of gene-A calculated by (gene-A mapped reads 10^9/(all metagenome reads 1000bp))

Thanks in advance .

wwood commented 4 years ago

Sorry, I will not be able to do that, as I am on leave.


From: Yingli Zhou notifications@github.com Sent: Tuesday, May 26, 2020 11:50:33 AM To: wwood/CoverM CoverM@noreply.github.com Cc: Ben J Woodcroft donttrustben@gmail.com; Mention mention@noreply.github.com Subject: Re: [wwood/CoverM] relative_abundance (#12)

Reopened #12https://github.com/wwood/CoverM/issues/12.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHubhttps://github.com/wwood/CoverM/issues/12#event-3371602058, or unsubscribehttps://github.com/notifications/unsubscribe-auth/AAADX5A4KMEFHO5DZEPENULRTMN6TANCNFSM4I3MNXAA.

ucassee commented 4 years ago

Hi, I will try to write by myself first. But I don't know if I understand these methods correctly. Hope to get your help, when you back to work. Have a nice vacation!