qichao1984 / NCyc


DNA sequence #37

Open TKsh6 opened 1 year ago

TKsh6 commented 1 year ago

Because I need the DNA sequences of the N-related proteins, could you publish the DNA sequence corresponding to each protein sequence?

qichao1984 commented 1 year ago

Sorry, we did not collect the DNA sequences when building the database. They can be recovered by looking up the IDs in multiple databases, which is a big job…


jianshu93 commented 9 months ago

Hello Both,

I have the same question. After reads are extracted from metagenomes with NCyc, the nucleotide (nt) reads can be phylogenetically placed onto a reference gene tree (for any N gene), so that it is clear which gene subtype each read belongs to (e.g., there are 2 types of nosZ, type I and type II; and 3 types of NarG: OP1, Gamma, or others). We cannot do this with amino acid (AA) sequences, because the short amino acid sequences translated from reads lose accuracy. Placement based on nt would be very useful and accurate.
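To illustrate why translated reads are so short, here is a minimal Python sketch of a six-frame translation using the standard genetic code. The function names and the 150-nt read length are assumptions made for illustration only; none of this is part of NCyc.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate complete codons only; unknown codons (e.g. with N) become X."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translations(read):
    """Translate a read in three forward and three reverse-complement frames."""
    read = read.upper()
    frames = []
    for strand in (read, reverse_complement(read)):
        for offset in range(3):
            frames.append(translate(strand[offset:]))
    return frames


# Hypothetical 150-nt read, used only to show the resulting lengths.
read = "ATGGCC" * 25
frames = six_frame_translations(read)
print(len(read), [len(f) for f in frames])
```

Under these assumptions, a 150-nt read yields at most 50 residues in any one frame, and the correct frame still has to be chosen among six, whereas nt-based placement can use all 150 positions directly.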

Thanks,

Jianshu