Estimating the abundances of multiple viral genomes

wwood / CoverM

Read coverage calculator for metagenomics

GNU General Public License v3.0


Estimating the abundances of multiple viral genomes #210

Open asierFernandezP opened 7 months ago

asierFernandezP commented 7 months ago

Hi,

I am currently running coverm genome to estimate the abundance of multiple viral genomes in my samples. However, I am not sure what the best way to do this is:

Is it correct to pass --genome-fasta-files a single FASTA file containing all the viral genomes, or should I split this FASTA into separate files with one viral genome each? Or do these two options make no difference at all?
Should I use the --reference option instead?

Thank you, Asier

wwood commented 7 months ago

I think it is probably easiest to use contig mode instead of genome mode. The only downside is that you cannot output relative abundance. However that is readily calculated from the ratio of the means, perhaps taking into account the number of reads that map.
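One way to read "the ratio of the means" is sketched below: divide each contig's mean coverage by the sum of all mean coverages, and optionally scale by the fraction of reads that mapped. This is only an illustration of that interpretation, not CoverM's own code. The function name `relative_abundance`, the contig names, and the numbers are all made up for the example.

```python
def relative_abundance(mean_coverage, mapped_reads=None, total_reads=None):
    """Percent relative abundance per contig, from mean coverages.

    mean_coverage: dict of contig name -> mean coverage (hypothetical input,
    e.g. parsed from the "mean" column of a contig-mode table).
    mapped_reads / total_reads: optional counts used to scale the result by
    the fraction of reads that mapped.
    """
    total_cov = sum(mean_coverage.values())
    if total_cov == 0:
        return {name: 0.0 for name in mean_coverage}

    # Without read counts, percentages are relative to the mapped community only.
    mapped_fraction = 1.0
    if mapped_reads is not None and total_reads:
        mapped_fraction = mapped_reads / total_reads

    return {
        name: 100.0 * cov / total_cov * mapped_fraction
        for name, cov in mean_coverage.items()
    }


# Illustrative values only: two contigs, 10 of 100 reads mapped.
means = {"contig_a": 2.0, "contig_b": 4.0}
print(relative_abundance(means))
print(relative_abundance(means, mapped_reads=10, total_reads=100))
```

Without the read counts the percentages sum to 100 across the contigs. With them, the percentages sum to the mapped percentage, and the remainder corresponds to unmapped reads.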

asierFernandezP commented 7 months ago

Thank you for the quick response!

And regarding the output: as I am currently using both the --coupled (with paired FASTQs) and --single (with unpaired FASTQ) options, I get 2 columns of abundances (one for the paired files and one for the unpaired). What would be the best way to combine these into a single column? I am just interested in the total abundance of each contig in my sample, considering both paired and unpaired reads.

wwood commented 7 months ago

If you are just using the mean output, I think the easiest approach is to add the results of the two columns. It is more complicated for other outputs.
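Adding the columns works for mean coverage because depth at each position is a count of overlapping reads, and the paired and unpaired read sets are disjoint. A minimal sketch of that sum follows, assuming a tab-separated table with the contig name in the first column and one mean column per read set. The header names and values here are hypothetical, not copied from real CoverM output.

```python
import csv
import io

# Hypothetical table: one mean-coverage column per read set.
tsv = (
    "Contig\tpaired Mean\tunpaired Mean\n"
    "contig_a\t1.5\t0.5\n"
    "contig_b\t3.0\t1.0\n"
)


def sum_mean_columns(text):
    """Return contig name -> sum of all mean-coverage columns in the table."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    next(reader)  # skip the header row
    return {row[0]: sum(float(value) for value in row[1:]) for row in reader}


combined = sum_mean_columns(tsv)
print(combined)
```

This assumes both columns were computed against the same reference and that no per-column filter has set a value to zero.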

asierFernandezP commented 7 months ago

In this case I am using RPKM.
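RPKM values cannot simply be added, because each column is normalised by its own read total. Under the usual definition (reads on the contig, divided by contig length in kilobases and by the read total in millions), the combined value is a read-weighted average of the two columns. The sketch below assumes that definition and assumes you know the read total used as the denominator for each column. Whether that total is mapped reads or all reads should be checked against the CoverM documentation. The function name `combine_rpkm` is hypothetical.

```python
def combine_rpkm(rpkm_a, reads_a, rpkm_b, reads_b):
    """Combine two RPKM values for the same contig into one.

    reads_a / reads_b are the read totals that were used as the "per million"
    denominator for each column (an assumption: confirm which total the tool
    uses). The contig length cancels out, so it is not needed.
    """
    total = reads_a + reads_b
    if total == 0:
        return 0.0
    return (rpkm_a * reads_a + rpkm_b * reads_b) / total


# Illustrative numbers for a 2 kb contig:
#   paired:   30 reads on the contig out of 3,000,000 -> RPKM 5.0
#   unpaired:  4 reads on the contig out of 1,000,000 -> RPKM 2.0
combined = combine_rpkm(5.0, 3_000_000, 2.0, 1_000_000)

# Direct calculation from the pooled counts, for comparison.
direct = (30 + 4) / (2.0 * (3_000_000 + 1_000_000) / 1_000_000)
print(combined, direct)
```

Both routes give the same number, which is the point: pooling the reads and recomputing is equivalent to weighting each column's RPKM by its read total.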

SebasSaenz commented 7 months ago

Hi,

Thank you for the amazing tool, which has saved me a lot of time in my analysis!

I was following this question and I don't fully understand this: "However that is readily calculated from the ratio of the means, perhaps taking into account the number of reads that map."

Does it mean this?

Total mapped reads: 10 out of 100 reads

            mean  reads    %
contig_a       2    3.3  3.3
contig_b       4    6.7  6.7

I am sorry if this is nonsense.

Best,

Johan Sebastián