kimrutherford closed this 7 months ago
@kimrutherford I've added code to update the gene name in the allele name, and to update the gene name (`rangeDisplayName`) and primary identifier (`rangeValue`) in annotation extensions.
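For illustration, the extension update could look roughly like the sketch below. It assumes an extension is stored as an array of arrays of part hashes (OR groups of AND parts), each part carrying `rangeValue` and `rangeDisplayName` keys. That layout, the helper name `update_extension_gene`, and the fallback to the identifier when the new gene has no name are all assumptions, not the code in this PR.

```perl
use strict;
use warnings;

# Hypothetical helper: point every extension part whose rangeValue is
# $old_id at $new_id and refresh its rangeDisplayName. Assumes the
# extension is an array of arrays of part hashes (OR groups of AND parts).
sub update_extension_gene {
  my ($extension, $old_id, $new_id, $new_name) = @_;
  for my $and_group (@{ $extension // [] }) {
    for my $part (@$and_group) {
      # Only touch parts that refer to the old gene identifier.
      next unless defined $part->{rangeValue} && $part->{rangeValue} eq $old_id;
      $part->{rangeValue} = $new_id;
      # Assumed fallback: show the identifier if the new gene has no name.
      $part->{rangeDisplayName} = $new_name // $new_id;
    }
  }
  return $extension;
}
```

Parts that refer to other entities (for example a GO term in `rangeValue`) are left unchanged because only exact matches on the old identifier are rewritten.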
I've tested this with UniProtKB accession numbers in PHI-Canto and it seems to work fine, but you'll have to test it with PomBase / Chado as the gene source because I don't know whether my changes are safe to use with those.
Specifically, I'm assuming there will only ever be one result returned from UniProtKB in `$from_id_lookup_result` and `$to_id_lookup_result`, which allows me to simplify the code for getting the old and new gene names to this:
```perl
my $old_name = $from_id_lookup_result->{found}->[0]->{primary_name};
my $new_name = $to_id_lookup_result->{found}->[0]->{primary_name};
```
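If the single-result assumption is kept, one option is to make it explicit so that a violation fails loudly instead of silently using the wrong gene. This is a minimal sketch that assumes `found` holds an array reference of hashes with a `primary_name` key, as in the two lines above. `sole_primary_name` is a hypothetical helper name.

```perl
use strict;
use warnings;

# Hypothetical helper: return the primary_name of the only entry in
# $lookup_result->{found}. Dies if the lookup did not return exactly
# one result, instead of silently taking the first one.
sub sole_primary_name {
  my ($lookup_result, $id) = @_;
  my @found = @{ $lookup_result->{found} // [] };
  die sprintf("expected exactly one lookup result for %s, got %d\n",
              $id, scalar @found)
    unless @found == 1;
  return $found[0]{primary_name};
}
```

With this helper, the first line above would become `my $old_name = sole_primary_name($from_id_lookup_result, $from_id);`.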
But I don't know whether that assumption holds for PomBase, or if it even holds for UniProtKB. I guess a safer solution would be to iterate through the results, find the first result whose primary identifier matches `$from_id` (then do the same for `$to_id`), and take the gene name from each matching result. I couldn't figure out how to do this at the time, though.
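The safer approach described above can be sketched with `first` from the core `List::Util` module. This sketch assumes each entry in `found` stores its identifier under a `primary_identifier` key. That key name and the helper name `primary_name_for_id` are assumptions for illustration.

```perl
use strict;
use warnings;
use List::Util qw(first);

# Hypothetical helper: return the primary_name of the first entry in
# $lookup_result->{found} whose primary_identifier equals $id, or undef
# if no entry matches. The 'primary_identifier' key is an assumption.
sub primary_name_for_id {
  my ($lookup_result, $id) = @_;
  my $found = $lookup_result->{found} // [];
  my $match = first {
    defined $_->{primary_identifier} && $_->{primary_identifier} eq $id
  } @$found;
  return $match ? $match->{primary_name} : undef;
}
```

Usage would be `primary_name_for_id($from_id_lookup_result, $from_id)` for the old name and `primary_name_for_id($to_id_lookup_result, $to_id)` for the new one. Because the helper returns `undef` when nothing matches, the caller can skip the rename rather than write a wrong name.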
Hi James.
Thanks very much for those changes. It all looks good to me. I'm going to merge the PR. We can add any fixes to the main branch.
> But I don't know whether that assumption holds for PomBase, or if it even holds for UniProtKB.
It's not going to be a problem for PomBase because our lookup code will only return a single gene for a systematic ID.
I think it's OK for UniProt too since we're looking up accessions.
The web service used for the UniProt lookup is configured in `canto.yaml`:

```yaml
webservices:
  uniprot_batch_lookup_url: 'https://rest.uniprot.org/uniprotkb/search?format=xml&query='
```
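Because the configured URL ends in `query=`, a lookup appends a URL-encoded UniProt query to it. The sketch below combines several accessions using UniProt's `accession:` field and `OR` operator. This is an assumption about the query shape and not necessarily how Canto builds it. `batch_lookup_url` and `uri_encode` are hypothetical helper names.

```perl
use strict;
use warnings;

# Percent-encode every byte outside the RFC 3986 unreserved set.
sub uri_encode {
  my ($s) = @_;
  $s =~ s/([^A-Za-z0-9\-._~])/sprintf('%%%02X', ord $1)/ge;
  return $s;
}

# Hypothetical helper: append an "accession:X OR accession:Y" query
# to the configured uniprot_batch_lookup_url.
sub batch_lookup_url {
  my ($base_url, @accessions) = @_;
  my $query = join ' OR ', map { "accession:$_" } @accessions;
  return $base_url . uri_encode($query);
}
```

Querying by accession like this is also why a single result per identifier is expected for UniProt.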
> It all looks good to me. I'm going to merge the PR.
Great, thanks again for your help.
Refs pombase/canto#2677