qichao1984 / NCyc


How do you generate the count tables? #32

Open Rob-murphys opened 2 years ago

Rob-murphys commented 2 years ago

I assume you parse the DIAMOND output to generate the count tables? I would like to know how you do this, because I want to know which contigs the genes were found on so that I can derive a pseudo-abundance for each gene from the mapping depth of its contig.

I want to do this because a count only tells me how many contigs carry that gene; it hides the potentially biologically significant information of how abundant each contig is (i.e. how abundant the bacterium it came from is).
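To make the idea concrete, here is a minimal sketch of the depth-weighted approach described above. It rests on several assumptions that are not confirmed in this thread: the DIAMOND output is in the default tabular format (query ID in column 1, subject ID in column 2), the query IDs are contig names, the ID-to-gene map is a two-column table, and the per-contig depths come from a separate read-mapping step. The function name `gene_pseudo_abundance` is hypothetical.

```python
from collections import defaultdict


def gene_pseudo_abundance(diamond_rows, id2gene, contig_depth):
    """Sum mapping depth per gene, counting each (gene, contig) pair once.

    diamond_rows: iterable of rows (lists of strings); column 0 is assumed
                  to be the contig (query) ID, column 1 the database
                  (subject) ID.
    id2gene:      dict mapping database sequence ID -> gene name (assumed
                  to be built from a two-column map file).
    contig_depth: dict mapping contig ID -> mean mapping depth.
    """
    contigs_by_gene = defaultdict(set)
    for row in diamond_rows:
        contig, subject = row[0], row[1]
        gene = id2gene.get(subject)
        if gene is not None:  # skip hits whose subject ID is not in the map
            contigs_by_gene[gene].add(contig)

    # Contigs without a depth value contribute 0.0 rather than raising.
    return {
        gene: sum(contig_depth.get(contig, 0.0) for contig in contigs)
        for gene, contigs in contigs_by_gene.items()
    }
```

Using a set per gene means a contig with several hits to the same gene contributes its depth only once; whether that is the right choice depends on whether multiple copies on one contig should be counted separately.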

Rob-murphys commented 2 years ago

So I saw in your script that you match the gene IDs in id2gene.map.2019Jul to those in the DIAMOND output. Out of interest, why id2gene.map.2019Jul instead of id2gene.map, which has more entries?

However, when I use this method I get a different count than you do, so I am missing something, but what?
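For reference, a minimal sketch of the matching step as I understand it, not the repository's actual script. It assumes the default DIAMOND tabular format with 12 columns (bitscore in column 12) and a dict built from the ID-to-gene map; `count_genes` is a hypothetical name. One general thing that can change the totals is whether a query with several hits is counted once or once per hit, so this version keeps only the best-scoring hit per query.

```python
from collections import Counter


def count_genes(diamond_rows, id2gene):
    """Count genes, using only the highest-bitscore hit of each query.

    diamond_rows: iterable of 12-column rows (lists of strings) in the
                  assumed default tabular layout: query ID in column 0,
                  subject ID in column 1, bitscore in column 11.
    id2gene:      dict mapping database sequence ID -> gene name.
    """
    best = {}  # query ID -> (subject ID, bitscore)
    for row in diamond_rows:
        query, subject, bitscore = row[0], row[1], float(row[11])
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)

    counts = Counter()
    for subject, _ in best.values():
        gene = id2gene.get(subject)
        if gene is not None:  # subject IDs missing from the map are ignored
            counts[gene] += 1
    return counts
```

Subject IDs that are absent from the chosen map file are silently dropped here, so using a map with fewer entries would lower the counts for that reason alone.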

ZengJiaxiong commented 2 years ago

It seems that you want to identify the N cycling genes from ORFs. I suggest searching the ORFs against NCycDB and calculating the abundance using Salmon, CoverM, etc.