kshefchek closed 9 years ago
Worth noting: the RefSeq URI in the Dipper YAML file only works for nucleotide sequences. It does not resolve protein accessions (NP_ prefix), for example: www.ncbi.nlm.nih.gov/refseq/?term=NP_005148.2
I'll see if I can find a URI that works for both. Alternatively, we could link these to NCBI protein. cc @nlwashington @bryanlaraway
Maybe it is correct to link these to NCBI protein regardless. When searching for proteins in RefSeq you are forwarded to the protein database, see http://www.ncbi.nlm.nih.gov/protein?term=srcdb_refseq[prop]
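A rough sketch of what routing by accession type could look like, in case we go this way. The helper name, the prefix list, and the /protein/ and /nuccore/ URL bases are my assumptions here, not anything currently in Dipper:

```python
# Hypothetical helper, not part of Dipper.
# RefSeq protein accessions use these prefixes; anything else is
# treated as a nucleotide accession in this sketch.
PROTEIN_PREFIXES = ("NP_", "XP_", "YP_", "AP_", "WP_")

# Assumed URL bases for the NCBI protein and nucleotide databases.
PROTEIN_BASE = "http://www.ncbi.nlm.nih.gov/protein/"
NUCLEOTIDE_BASE = "http://www.ncbi.nlm.nih.gov/nuccore/"


def refseq_url(accession):
    """Return a link for a RefSeq accession, chosen by its prefix."""
    if accession.startswith(PROTEIN_PREFIXES):
        return PROTEIN_BASE + accession
    return NUCLEOTIDE_BASE + accession
```

With this, `refseq_url("NP_005148.2")` would point at the protein database, while an NM_ transcript accession would fall through to the nucleotide one.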
@nlwashington @bryanlaraway I can't seem to get UniProtKB URIs to resolve using the UniProtKB mapping in our yaml configuration, for example:
http://identifiers.org/UniProt:P43489
Alternatively, this works: http://identifiers.org/uniprot/P43489
Can this be switched or will this break other resources?
Please update to the correct CURIE mapping.
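For reference, a minimal sketch of how the corrected mapping would expand, assuming a simple prefix-to-base-URI dictionary. The `CURIE_MAP` and `expand_curie` names are illustrative and not the actual Dipper configuration or API:

```python
# Illustrative prefix map; the real one lives in the YAML configuration.
CURIE_MAP = {
    "UniProtKB": "http://identifiers.org/uniprot/",
}


def expand_curie(curie, curie_map=CURIE_MAP):
    """Expand a CURIE such as 'UniProtKB:P43489' into a full URI."""
    prefix, sep, local_id = curie.partition(":")
    if not sep or prefix not in curie_map:
        raise ValueError("No mapping for CURIE: " + curie)
    return curie_map[prefix] + local_id


# Expands to the form reported as working above:
# http://identifiers.org/uniprot/P43489
uri = expand_curie("UniProtKB:P43489")
```

Only the base URI changes, so identifiers already written as UniProtKB:... should not need touching.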
We need a protein accession to link amino acid coordinates to their reference sequence. Will use the CCDS transcript ID and this mapping file from their FTP:
ftp://ftp.ncbi.nlm.nih.gov/pub/CCDS/current_human/CCDS2UniProtKB.current.txt
We could make CCDS its own source class, or just add a function that converts this file to a Python dictionary.
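A sketch of the function option, assuming the mapping file is tab-delimited with the CCDS ID in the first column, the UniProtKB accession in the second, and a header line starting with "#". I have not verified the layout against the current file, and the function name is hypothetical:

```python
from collections import defaultdict


def ccds_to_uniprot(lines):
    """Build a {ccds_id: [uniprot_accession, ...]} dictionary.

    `lines` is any iterable of text lines, e.g. an open file handle.
    Assumes two tab-separated columns and '#'-prefixed header lines.
    """
    mapping = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the header
        fields = line.split("\t")
        if len(fields) < 2:
            continue  # skip malformed rows
        ccds_id, uniprot_id = fields[0], fields[1]
        # One CCDS ID may map to several accessions; keep each once.
        if uniprot_id not in mapping[ccds_id]:
            mapping[ccds_id].append(uniprot_id)
    return dict(mapping)


# Usage with made-up placeholder rows (not real identifiers):
sample = ["#ccds\tUniProtKB", "CCDS0000.1\tX00001", "CCDS0000.1\tX00002"]
lookup = ccds_to_uniprot(sample)
```

Values are lists because I would not assume the mapping is one-to-one. If it turns out to be, this collapses to a plain string-to-string dictionary.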