qichao1984 / NCyc


Question about NCyc_95.faa.gz vs NCyc_100.faa.gz #40

Open · pthieringer opened this issue 7 months ago

pthieringer commented 7 months ago

Hello, and thank you for this very useful tool for finding N-cycling-related genes!

I have a question about the output from the NCyc_95 versus the NCyc_100(_2019Jul) database: how can the 95 database produce more hits for a gene than the 100 database? For background, I am running the program on a set of MAGs and wanted to see how the output differs between the two databases. In several MAGs, some genes get more hits with the 95 database than with the 100 database. Why does this happen?

I ask because it seems that every representative sequence in the 95 database should also be present in the 100 database. Am I misunderstanding how these two databases are created?
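To make that expectation concrete, here is a toy sketch of it. It assumes both databases were built by greedy incremental clustering (the approach CD-HIT uses), with sequences processed in the same order for both thresholds, and it uses a simplified identity measure on equal-length sequences. Neither assumption is confirmed for how NCyc_95 and NCyc_100 were actually generated, and the helper names and sequences are hypothetical.

```python
def identity(a: str, b: str) -> float:
    """Toy identity: fraction of matching positions in two equal-length sequences."""
    if len(a) != len(b):
        return 0.0  # simplification: sequences of different length never cluster
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_cluster(seqs: list[str], threshold: float) -> list[str]:
    """Hypothetical greedy clustering: a sequence becomes a new representative
    unless it matches an existing representative at >= threshold identity.
    Input order is used as given (CD-HIT would sort by length first)."""
    reps: list[str] = []
    for s in seqs:
        if not any(identity(s, r) >= threshold for r in reps):
            reps.append(s)
    return reps


# Hypothetical 20-residue sequences: b differs from a at one position (95% identity),
# c is an exact duplicate of a, and d is unrelated.
a = "MKTAYIAKQRQISFVKSHFS"
b = "MKTAYIAKQRQISFVKSHFA"
c = "MKTAYIAKQRQISFVKSHFS"
d = "GSHMLEDPVDAFQEAKRLLA"
seqs = [a, b, c, d]

reps100 = greedy_cluster(seqs, 1.00)  # a, b, d
reps95 = greedy_cluster(seqs, 0.95)   # a, d (b now joins a's cluster)

print(len(reps100), len(reps95))
# Every 95% representative is also a 100% representative under these assumptions.
print(set(reps95) <= set(reps100))
```

If the real databases were built this way, the 95% set would contain no representative that the 100% set lacks.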

I am running the tool with DIAMOND and am happy to share any code or output you may need. Thank you for your time and advice!