Accession IDs of reference database

rprops commented 5 years ago

Hi, is it possible to provide the accession IDs of the sequences used in NCyc_100.faa.gz? Thanks, Ruben

qichao1984 commented 5 years ago

NCBI accession numbers are not available because these sequences come from UniProt and multiple orthologous databases (e.g. COG, eggNOG, KEGG, and m5nr), not from the NCBI nr/nt databases. However, the sequence IDs from the original databases are retained in the NCyc_100.faa.gz file.
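For anyone who wants to inspect those retained IDs, here is a minimal Python sketch that lists the header of every record in a (optionally gzipped) FASTA file. The function name `read_fasta_headers` is made up for illustration, and the sketch assumes only the generic FASTA convention that the ID is the first whitespace-delimited token after `>`; the actual header layout in NCyc_100.faa.gz may differ and should be checked by eye first.

```python
import gzip


def read_fasta_headers(path):
    """Yield (sequence_id, description) for each record in a FASTA file.

    Assumption: the ID is the first whitespace-delimited token of the
    header line; everything after it is treated as free-text description.
    """
    # Transparently handle gzip-compressed files such as *.faa.gz.
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        for line in handle:
            if not line.startswith(">"):
                continue  # skip sequence lines
            parts = line[1:].strip().split(None, 1)
            seq_id = parts[0] if parts else ""
            description = parts[1] if len(parts) > 1 else ""
            yield seq_id, description
```

Calling `list(read_fasta_headers("NCyc_100.faa.gz"))` should then give the ID/description pairs, which can be grouped by whatever prefix or pattern the source databases use.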

rprops commented 5 years ago

Understood, thanks for the clarification. I was just wondering whether there is a way to use these identifiers to trace back the taxonomy/organism from which these genes were retrieved. This would allow a more detailed inspection of the homology between query and reference sequences.

qichao1984 / NCyc

Accession IDs of reference database #6