Open Rob-murphys opened 2 years ago
I saw that your script matches the gene IDs in id2gene.map.2019Jul to those in the DIAMOND output. Out of interest, why id2gene.map.2019Jul instead of id2gene.map, which has more entries?
However, when I use this method I get a different count than you do, so I am missing something, but what?
It seems that you want to identify the N cycling genes from ORFs. I suggest you search the ORFs against NCycDB and calculate the abundance using Salmon, CoverM, etc.
I assume you parse the DIAMOND output to generate the count tables? I am wondering how you do this, as I would like to know which contigs the genes were found on so I can generate a pseudo-abundance for each gene based on mapping depth to that contig.
I wish to do this because a count only tells me how many contigs carry that gene, and hides the potentially biologically significant information of how abundant each contig is (i.e. how abundant the bacterium it came from is).
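For illustration, here is a minimal Python sketch of the contig-depth idea described above. It is not the NCycDB script's own method, and it rests on several assumptions: the DIAMOND output is in the default 12-column tabular format (query ID first, subject ID second, bit score last), the id2gene map is two tab-separated columns (database sequence ID, gene name), ORF names follow the Prodigal-style `<contig>_<n>` pattern, and per-contig mean depth has already been computed elsewhere (for example with CoverM). The function names `load_id2gene`, `best_hits`, and `gene_abundance` are hypothetical.

```python
import csv
from collections import defaultdict


def load_id2gene(lines):
    """Map database sequence ID -> gene name.

    Assumes two tab-separated columns per line (ID, gene name).
    """
    mapping = {}
    for row in csv.reader(lines, delimiter="\t"):
        if len(row) >= 2:
            mapping[row[0]] = row[1]
    return mapping


def best_hits(diamond_lines):
    """Keep the highest-bit-score hit per query ORF.

    Assumes DIAMOND's default tabular layout: qseqid in column 1,
    sseqid in column 2, bitscore in column 12.
    """
    best = {}
    for row in csv.reader(diamond_lines, delimiter="\t"):
        if len(row) < 12:
            continue  # skip blank or malformed lines
        query, subject, bitscore = row[0], row[1], float(row[11])
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {query: subject for query, (subject, _) in best.items()}


def gene_abundance(hits, id2gene, contig_depth):
    """Return (ORF count per gene, summed contig depth per gene).

    Assumes ORF names look like '<contig>_<n>', so the contig name is
    everything before the last underscore.
    """
    counts = defaultdict(int)
    depth = defaultdict(float)
    for orf, subject in hits.items():
        gene = id2gene.get(subject)
        if gene is None:
            continue  # hit is not in the map (e.g. filtered-out entry)
        contig = orf.rsplit("_", 1)[0]
        counts[gene] += 1
        depth[gene] += contig_depth.get(contig, 0.0)
    return dict(counts), dict(depth)
```

With real files, `load_id2gene(open("id2gene.map.2019Jul"))` and `best_hits(open(path_to_diamond_output))` would feed `gene_abundance`, together with a dict of contig name to mean depth. Summing depth per gene is only one possible choice; normalising by gene length or total mapped reads may suit cross-sample comparisons better.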