git-jemiller opened this issue 5 years ago
Hi,
Sorry for the late response. TransVar has been using the ID mapping from UniProt. More specifically, it comes from this file: ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/idmapping/by_organism/HUMAN_9606_idmapping.dat.gz
Therefore, if your identifier isn't linked to any transcript ID in this file, TransVar can't locate a transcript definition for it. That's what happened with W5XKT8 and Q6N069. In addition, the transcript ID in the ID mapping file has to match a transcript in the transcript definitions you are using. You can also supply a customized ID mapping if you know how to project UniProt IDs to transcript IDs (Ensembl, RefSeq, etc.). This is done with
transvar index --idmap <idmapping file> -o <output_idx>
The idmapping file has two columns: the first is the UniProt ID and the second is the transcript ID. Once that's done, you can use something like
transvar panno --idmap <output_idx>
as usual.
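If you want to build the two-column idmapping file from the same UniProt download, a minimal Python sketch is below. It assumes the UniProt .dat.gz has three tab-separated columns (accession, ID type, mapped ID), that Ensembl_TRS (or RefSeq_NT) is the ID type matching your transcript definitions, and that a tab-separated two-column file is acceptable input for transvar index --idmap. The function name build_two_column_idmap and the script name are hypothetical. Whether your transcript definitions use versioned IDs (e.g. a trailing .9) is also an assumption, so version stripping is optional.

```python
import gzip
import sys


def build_two_column_idmap(dat_gz_path, out_path, id_type="Ensembl_TRS", strip_version=False):
    """Write 'UniProt accession<TAB>transcript ID' lines for a single ID type.

    Assumes (hypothetically) the UniProt *_idmapping.dat.gz layout:
    accession, ID type, mapped ID, separated by tabs.
    Returns the number of lines written.
    """
    seen = set()
    written = 0
    with gzip.open(dat_gz_path, "rt") as src, open(out_path, "w") as dst:
        for line in src:
            fields = line.rstrip("\n").split("\t")
            # Skip malformed lines and lines of any other ID type.
            if len(fields) != 3 or fields[1] != id_type:
                continue
            accession, _, transcript = fields
            if strip_version:
                # e.g. ENST00000269305.9 -> ENST00000269305
                transcript = transcript.split(".", 1)[0]
            pair = (accession, transcript)
            if pair in seen:  # avoid writing duplicate pairs
                continue
            seen.add(pair)
            dst.write(f"{accession}\t{transcript}\n")
            written += 1
    return written


if __name__ == "__main__" and len(sys.argv) >= 3:
    # usage: python build_idmap.py HUMAN_9606_idmapping.dat.gz my_idmap.txt [ID_TYPE]
    chosen_type = sys.argv[3] if len(sys.argv) > 3 else "Ensembl_TRS"
    n = build_two_column_idmap(sys.argv[1], sys.argv[2], chosen_type)
    print(f"wrote {n} mappings", file=sys.stderr)
```

The resulting my_idmap.txt would then be the <idmapping file> passed to transvar index --idmap. Pick the ID type that matches the annotation TransVar is using (Ensembl vs. RefSeq); a mismatch there gives the same empty result as a missing mapping.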
Let me know if you know a better way to map these IDs. Thanks!
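To check up front whether an accession has any transcript link in the UniProt mapping file (the situation described for W5XKT8 and Q6N069), a small Python sketch can scan the file for the accessions you care about. It assumes the same three-column layout as above and treats only Ensembl_TRS and RefSeq_NT lines as transcript links; both the ID-type choice and the function name transcript_links are assumptions for illustration.

```python
import gzip
from collections import defaultdict

# ID types treated as transcript links (an assumption; adjust to the
# annotation you use with TransVar).
TRANSCRIPT_ID_TYPES = ("Ensembl_TRS", "RefSeq_NT")


def transcript_links(dat_gz_path, accessions, id_types=TRANSCRIPT_ID_TYPES):
    """Return {accession: [transcript IDs]} for the requested accessions.

    An accession with no transcript-type line in the file maps to an
    empty list, i.e. TransVar would have nothing to project it onto.
    """
    wanted = set(accessions)
    links = defaultdict(list)
    with gzip.open(dat_gz_path, "rt") as src:
        for line in src:
            fields = line.rstrip("\n").split("\t")
            if len(fields) == 3 and fields[0] in wanted and fields[1] in id_types:
                links[fields[0]].append(fields[2])
    return {acc: links.get(acc, []) for acc in accessions}
```

For example, transcript_links("HUMAN_9606_idmapping.dat.gz", ["W5XKT8", "Q6N069"]) returns an empty list for any accession that has no transcript link in the release you downloaded. An empty list suggests a missing mapping; a non-empty list that still gives no output points to an ID mismatch with the transcript definitions instead.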
I'm using TransVar to annotate proteins with their genomic coordinates. For most proteins it works fine, but sometimes nothing is returned except the output header. How should I interpret this result? Or am I doing something wrong?
Also, why do some proteins need their isoform specified to return any output, while others do not?
Here's an example:
Thanks!