Closed luzhang321 closed 3 years ago
Hi @luzhang321, all gene_families data are in CPM units, see here. As for regrouping to other functional categories, you would have to experiment on your own. The curatedMetagenomicData R/Bioconductor package provides data that is highly processed in a specific way, and seeks to be ultra-consistent so users don't have to worry about the minutiae of data processing. You are always free to experiment with the Nextflow pipeline or the R pipeline on your own.
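For anyone unsure what the CPM units mean in practice, here is a minimal Python sketch of the idea behind copies-per-million scaling: each sample's values are divided by the sample total and multiplied by one million. This is not HUMAnN's own code. The function name `rpk_to_cpm` and the feature names are made up for illustration, and the sketch ignores stratified (per-species) rows, which a real gene families table may also contain.

```python
def rpk_to_cpm(rpk_by_feature):
    """Rescale one sample's RPK values so that they sum to one million.

    Hypothetical helper for illustration only; it assumes a flat,
    unstratified table with one value per feature.
    """
    total = sum(rpk_by_feature.values())
    if total <= 0:
        raise ValueError("sample has no abundance to normalize")
    return {
        feature: value / total * 1_000_000
        for feature, value in rpk_by_feature.items()
    }


# Made-up RPK values for a single sample (feature names are placeholders).
sample_rpk = {"UniRef90_A": 30.0, "UniRef90_B": 10.0, "UNMAPPED": 60.0}
sample_cpm = rpk_to_cpm(sample_rpk)
```

Under these assumptions, a quick sanity check on a table is whether the unstratified rows of each sample column add up to roughly one million.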
Hi :) I have a question about the gene_families files. For example, for "2021-03-31.AsnicarF_2017.gene_families", is this the file with each gene family in the community in reads per kilobase (RPK) units, or the file in "relative abundance" or "copies per million" (CPM) units [the one produced by the command humann_renorm_table --input demo_fastq/demo_genefamilies.tsv --output demo_fastq/demo_genefamilies-cpm.tsv --units cpm --update-snames]? I am also wondering whether I can use the gene_families file from cMD to regroup to other functional categories (e.g., ECs) [by using the humann_regroup_table function?].
And is there a specific reason that you use relative abundance rather than CPM in your result?
Thanks in advance!