kblin / ncbi-acc-download

Download files from NCBI Entrez by accession
Apache License 2.0

Automated Retrieval of Neighboring Gene Sequences from NCBI Protein Accessions #28

Closed by chingciripit 3 months ago

chingciripit commented 3 months ago

Hi Kai,

I hope you're well.

I'm looking for a way to retrieve the sequences of neighboring genes based on a set of NCBI protein accession numbers. For instance, I have the following protein accessions:

MEM9139364.1 WP_285051739.1 AAF15369.1 WP_285065334.1 WP_105536047.1

Is there an automated method to obtain the sequences of the genes neighboring these proteins?

Thank you for your help!

Best regards, Ching

kblin commented 3 months ago

Some of the protein IDs you posted are from the non-redundant protein set and might map to one or more actual proteins; see https://www.ncbi.nlm.nih.gov/ipg/WP_285051739.1 for an example of the first one. This means that WP_285051739.1 doesn't map directly to a single genomic locus, and thus it's a bit unclear what a "neighbouring gene" would be.

How to handle these cases correctly depends on your scientific question, so there's no automated way to do this, I'm afraid.
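
To illustrate the ambiguity, here is a minimal Python sketch that lists every genomic locus behind one protein accession and computes a flanking window around each. It is not part of ncbi-acc-download. It assumes that the NCBI E-utilities `efetch` endpoint, called with `rettype=ipg`, returns a tab-separated identical protein groups report with columns named `Nucleotide Accession`, `Start`, `Stop` and `Strand`; check the header of a real report before relying on those names. The function names and the 5000 bp flank are made up for the example.

```python
import csv
import io
import urllib.parse
import urllib.request

EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def fetch_ipg_report(accession):
    """Download the identical protein groups report for one protein accession.

    Assumption: rettype=ipg with retmode=text yields a tab-separated table.
    """
    query = urllib.parse.urlencode(
        {"db": "protein", "id": accession, "rettype": "ipg", "retmode": "text"}
    )
    with urllib.request.urlopen(f"{EFETCH_URL}?{query}") as response:
        return response.read().decode("utf-8")


def parse_ipg_report(text):
    """Return one dictionary per genomic locus listed in the report."""
    loci = []
    for row in csv.DictReader(io.StringIO(text), delimiter="\t"):
        # Assumed column names; rows without coordinates are skipped.
        if not (row.get("Nucleotide Accession") and row.get("Start") and row.get("Stop")):
            continue
        loci.append(
            {
                "nucleotide": row["Nucleotide Accession"],
                "start": int(row["Start"]),
                "stop": int(row["Stop"]),
                "strand": row.get("Strand", ""),
            }
        )
    return loci


def flanking_window(locus, flank=5000):
    """Return (nucleotide accession, window start, window stop).

    The start is clamped to 1; the stop is not clamped and may run past
    the end of the nucleotide record.
    """
    low = min(locus["start"], locus["stop"])
    high = max(locus["start"], locus["stop"])
    return locus["nucleotide"], max(1, low - flank), high + flank
```

Calling `parse_ipg_report(fetch_ipg_report("WP_285051739.1"))` would give one entry per locus. If the list has more than one entry, you still have to decide which genome's neighbourhood matters for your question; the sketch only makes the candidates visible.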