zwdzwd / transvar

TransVar - multiway annotator for precision medicine

Problems with UniProt support #24

Closed (grayfall closed this issue 6 years ago)

grayfall commented 6 years ago

I'm having problems converting amino-acid substitutions to genomic coordinates using UniProt identifiers. Some queries work as expected, e.g.

$ transvar panno -i 'Q9BXW4:p.G126A' --uniprot --ccds
input   transcript  gene    strand  coordinates(gDNA/cDNA/protein)  region  info
Q9BXW4:p.G126A  CCDS31074 (protein_coding)  MAP1LC3C    -   chr1:g.242159532C>G/c.377G>C/p.G126A    inside_[cds_in_exon_4]  CSQN=Missense;reference_codon=GGC;candidate_codons=GCA,GCC,GCG,GCT;candidate_mnv_variants=chr1:g.242159531_242159532delGCinsTG,chr1:g.242159531_242159532delGCinsCG,chr1:g.242159531_242159532delGCinsAG;source=CCDS

For quite a few other proteins, however, I get a "seek out of range" error:

$ transvar panno -i 'Q99418:p.E156D' --uniprot --ccds
input   transcript  gene    strand  coordinates(gDNA/cDNA/protein)  region  info
[wrap_exception] warning: seek out of range
Q99418:p.E156D  .   .   .   ././.   .   Error_seek out of range
[wrap_exception] warning: seek out of range
Q99418:p.E156D  .   .   .   ././.   .   Error_seek out of range

I've tried different positions and substitutions (e.g. -i 'Q99418:1') to no avail. Other affected proteins include P04180, P45381, Q14289 and P49916. Am I doing something wrong?
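When checking many proteins at once, it can help to pick out the failing queries automatically. The following is a minimal sketch, not part of TransVar: it assumes the panno output is tab-separated with the seven columns shown above, and that failed rows carry an info column starting with "Error_". The function name find_failed_rows is hypothetical.

```python
def find_failed_rows(panno_output):
    """Return the input identifiers whose info column reports an error.

    Assumes tab-separated rows with seven columns:
    input, transcript, gene, strand, coordinates, region, info.
    """
    failed = []
    for line in panno_output.splitlines():
        # Skip blank lines, the header row, and "[wrap_exception]" warnings.
        if not line.strip() or line.startswith("input\t") or line.startswith("["):
            continue
        fields = line.split("\t")
        if len(fields) < 7:
            continue  # not a complete annotation row
        query, info = fields[0], fields[6]
        # Each failing query is reported once, even if the row is repeated.
        if info.startswith("Error_") and query not in failed:
            failed.append(query)
    return failed


# Rows copied from the output in this thread, re-joined with tabs.
header = "\t".join(["input", "transcript", "gene", "strand",
                    "coordinates(gDNA/cDNA/protein)", "region", "info"])
good_row = "\t".join([
    "Q9BXW4:p.G126A", "CCDS31074 (protein_coding)", "MAP1LC3C", "-",
    "chr1:g.242159532C>G/c.377G>C/p.G126A", "inside_[cds_in_exon_4]",
    "CSQN=Missense;reference_codon=GGC;candidate_codons=GCA,GCC,GCG,GCT;"
    "candidate_mnv_variants=chr1:g.242159531_242159532delGCinsTG,"
    "chr1:g.242159531_242159532delGCinsCG,"
    "chr1:g.242159531_242159532delGCinsAG;source=CCDS",
])
bad_row = "\t".join(["Q99418:p.E156D", ".", ".", ".", "././.", ".",
                     "Error_seek out of range"])
warning = "[wrap_exception] warning: seek out of range"

sample = "\n".join([header, good_row, warning, bad_row, warning, bad_row])
print(find_failed_rows(sample))
```

In practice the text would come from capturing the stdout of a transvar panno run rather than from a hard-coded string.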

zwdzwd commented 6 years ago

Thanks for reporting. Here is what I get:

$ transvar panno -i 'Q99418:p.E156D' --uniprot --ccds
input   transcript      gene    strand  coordinates(gDNA/cDNA/protein)  region  info
Q99418:p.E156D  CCDS12722 (protein_coding)      CYTH2   +       chr19:g.48977195G>C/c.468G>C/p.E156D    inside_[cds_in_exon_6]  CSQN=Missense;reference_codon=GAG;candidate_codons=GAC,GAT;candidate_snv_variants=chr19:g.48977195G>T;source=CCDS
Q99418:p.E156D  CCDS12722 (protein_coding)      CYTH2   +       chr19:g.48977195G>C/c.468G>C/p.E156D    inside_[cds_in_exon_6]  CSQN=Missense;reference_codon=GAG;candidate_codons=GAC,GAT;candidate_snv_variants=chr19:g.48977195G>T;source=CCDS

Have you run

transvar config --download_idmap

to download the UniProt ID maps?

Thanks,
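For readers who want to consume a successful row programmatically, here is a rough sketch of how the coordinates and info columns in the output above could be unpacked. It assumes the coordinates column always holds three "/"-separated levels (gDNA/cDNA/protein, as the header suggests) and that info is a ";"-separated list of key=value pairs. The helper name parse_annotation is hypothetical and not part of TransVar.

```python
def parse_annotation(coordinates, info):
    """Split the coordinates and info columns of one panno row.

    Assumes coordinates looks like 'gDNA/cDNA/protein' and info is a
    ';'-separated list of 'key=value' pairs, as in the rows shown above.
    """
    gdna, cdna, protein = coordinates.split("/")
    info_fields = {}
    for item in info.split(";"):
        # partition keeps everything after the first '=' as the value.
        key, _, value = item.partition("=")
        info_fields[key] = value
    return {"gdna": gdna, "cdna": cdna, "protein": protein, "info": info_fields}


# Columns taken from the Q99418 row above.
row = parse_annotation(
    "chr19:g.48977195G>C/c.468G>C/p.E156D",
    "CSQN=Missense;reference_codon=GAG;candidate_codons=GAC,GAT;"
    "candidate_snv_variants=chr19:g.48977195G>T;source=CCDS",
)
print(row["gdna"])
print(row["info"]["candidate_codons"].split(","))
```

Multi-valued entries such as candidate_codons are left as comma-separated strings so the caller can decide whether to split them.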

grayfall commented 6 years ago

@zwdzwd thanks for replying so quickly. Yes, I've downloaded the ID mapping, which is why some proteins do work.

(mutagenesis) $ transvar config --download_idmap
[downloading] ~/.cenvs/envs/mutagenesis/lib/python3.6/site-packages/transvar/transvar.download/uniprot.idmapping.txt.gz.idx ..Done (69.7 MB).

(mutagenesis) $ transvar panno -i 'Q99418:p.E156D' --uniprot --ccds
input   transcript  gene    strand  coordinates(gDNA/cDNA/protein)  region  info
[wrap_exception] warning: seek out of range
Q99418:p.E156D  .   .   .   ././.   .   Error_seek out of range
[wrap_exception] warning: seek out of range
Q99418:p.E156D  .   .   .   ././.   .   Error_seek out of range

(mutagenesis) $ transvar panno -i 'Q9BXW4:p.G126A' --uniprot --ccds
input   transcript  gene    strand  coordinates(gDNA/cDNA/protein)  region  info
Q9BXW4:p.G126A  CCDS31074 (protein_coding)  MAP1LC3C    -   chr1:g.242159532C>G/c.377G>C/p.G126A    inside_[cds_in_exon_4]  CSQN=Missense;reference_codon=GGC;candidate_codons=GCA,GCC,GCG,GCT;candidate_mnv_variants=chr1:g.242159531_242159532delGCinsTG,chr1:g.242159531_242159532delGCinsCG,chr1:g.242159531_242159532delGCinsAG;source=CCDS

(mutagenesis) $ transvar --version
TransVar Version 2.3.4.20161215

(mutagenesis) $ python --version
Python 3.6.6 :: Anaconda, Inc.

zwdzwd commented 6 years ago

Hi @grayfall,

Could you try the latest version? I ran your Q99418 example and it works for me:

$ transvar panno -i 'Q99418:p.E156D' --uniprot --ccds
input   transcript      gene    strand  coordinates(gDNA/cDNA/protein)  region  info
Q99418:p.E156D  CCDS12722 (protein_coding)      CYTH2   +       chr19:g.48977195G>C/c.468G>C/p.E156D    inside_[cds_in_exon_6]  CSQN=Missense;reference_codon=GAG;candidate_codons=GAC,GAT;candidate_snv_variants=chr19:g.48977195G>T;source=CCDS
Q99418:p.E156D  CCDS12722 (protein_coding)      CYTH2   +       chr19:g.48977195G>C/c.468G>C/p.E156D    inside_[cds_in_exon_6]  CSQN=Missense;reference_codon=GAG;candidate_codons=GAC,GAT;candidate_snv_variants=chr19:g.48977195G>T;source=CCDS
$ transvar config
Reference version: hg19
$ transvar --version
TransVar Version 2.4.0.20180701
$ python --version
Python 3.6.3 :: Anaconda, Inc.

grayfall commented 6 years ago

@zwdzwd yes, the new release seems to work fine. Thank you.
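Anyone hitting the same error may want to check their installed version first. Below is a small sketch that compares the string printed by transvar --version against the version that worked in this thread (2.4.0.20180701). The thread does not say which release actually contains the fix, so the threshold is only the known-working version, not a confirmed minimum. The function names are hypothetical.

```python
def parse_version(version_output):
    """Turn 'TransVar Version 2.4.0.20180701' into (2, 4, 0, 20180701)."""
    number = version_output.strip().split()[-1]
    return tuple(int(part) for part in number.split("."))


# Version reported as working in this thread (assumption: not necessarily
# the first release with the fix).
KNOWN_WORKING = parse_version("TransVar Version 2.4.0.20180701")


def is_at_least_known_working(version_output):
    """True if the installed version is the known-working one or newer."""
    return parse_version(version_output) >= KNOWN_WORKING


print(is_at_least_known_working("TransVar Version 2.3.4.20161215"))
print(is_at_least_known_working("TransVar Version 2.4.0.20180701"))
```

Tuples compare element by element, so the date-like last component only matters when the first three numbers are equal.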