AlanSimmons closed 7 months ago
Currently, UBKG contains HGNC->UNIPROTKB mappings only for proteins whose entries have been reviewed (i.e., Swiss-Prot). Unreviewed TrEMBL entries are not ingested.
@computationdoc
The current UNIPROTKB ETL maps as follows:
| UniProt field | UBKG entity:property |
|---|---|
| Entry | Code node with CodeID=UNIPROTKB:Entry |
| Entry Name | Term node with name=Entry Name and relationship PT |
| Protein Names | Definition node with DEF=Protein Names, linked to the Code's associated Concept node |

The UniProtKB Function field is really a kind of definition; however, the Definition node is already being used for the Protein Names field.
The UNIPROTKB ETL currently does not assign synonyms--i.e., Term nodes with relationship SYN to the code, based on the values in the node_synonyms field in the node_metadata file. The script could export the function field to the node_synonyms field.
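To make the proposal concrete, here is a minimal sketch of how a script could write the function text into the `node_synonyms` field. It is not the actual ETL code. The function name `to_node_metadata_row`, the other column names (`node_id`, `node_definition`), the pipe delimiter, and the `FUNCTION: ` prefix on the downloaded text are all assumptions for illustration; only `node_synonyms`, the `UNIPROTKB:Entry` code format, and the Protein Names definition come from the description above. The sample values are placeholders.

```python
# Hypothetical sketch: build one node_metadata row for a UniProtKB entry,
# exporting the Function text as a synonym. Column names other than
# node_synonyms are assumed, as is the "|" delimiter.

FUNCTION_PREFIX = "FUNCTION: "  # assumed prefix on the downloaded Function text


def to_node_metadata_row(entry, protein_names, function, synonyms=(), delimiter="|"):
    """Return a dict representing one row of the node_metadata file."""
    # Keep any existing synonyms, dropping empty values.
    terms = [s.strip() for s in synonyms if s and s.strip()]

    # Normalize the function text and strip the assumed prefix, if present.
    text = (function or "").strip()
    if text.startswith(FUNCTION_PREFIX):
        text = text[len(FUNCTION_PREFIX):].strip()

    # Avoid breaking the delimited list if the text contains the delimiter.
    if text:
        terms.append(text.replace(delimiter, " "))

    return {
        "node_id": f"UNIPROTKB:{entry}",
        "node_definition": protein_names,
        "node_synonyms": delimiter.join(terms),
    }


# Placeholder values, not real curated data.
row = to_node_metadata_row("P12345", "Example protein", "FUNCTION: Example function text.")
print(row["node_synonyms"])
```

If the function text turns out to be long or multi-sentence, it may be worth checking how downstream consumers display SYN terms before adopting this approach.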
Option 1. Currently, only UNIPROTKB codes have both definitions and functions in UBKG.
Example. The function is the term of type SY
The image is an annotated screen capture of a UniProtKB detail page, describing the changes to make in UBKG.
UniProtKB stores names and synonyms for proteins in the Protein Names field of the downloaded file. The field is delimited with parentheses--e.g., Approved Name (1st synonym)(2nd synonym) etc.
The script:
In addition, the script uses the value of the Entry Name field from the download as a synonym.
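The parsing described above could look roughly like the following. This is a sketch under assumptions, not the script itself: the function names `split_protein_names` and `collect_synonyms` are hypothetical, and the parser only handles the "Approved Name (synonym)(synonym)" pattern. Real Protein Names values may also carry parenthesized EC numbers or bracketed sections, which this sketch does not treat specially.

```python
# Hypothetical sketch: split a UniProtKB "Protein Names" value into the
# approved name and its parenthesized synonyms, then add the Entry Name
# as one more synonym.


def split_protein_names(field):
    """Return (approved_name, synonyms) from a Protein Names string."""
    approved_chars = []
    synonyms = []
    current = []
    depth = 0  # nesting depth of parentheses

    for ch in field:
        if ch == "(":
            if depth > 0:
                current.append(ch)  # nested "(" belongs to the synonym text
            depth += 1
        elif ch == ")" and depth > 0:
            depth -= 1
            if depth == 0:
                synonyms.append("".join(current).strip())
                current = []
            else:
                current.append(ch)  # nested ")" belongs to the synonym text
        elif depth > 0:
            current.append(ch)
        else:
            approved_chars.append(ch)

    # Collapse whitespace left behind between parenthesized groups.
    approved = " ".join("".join(approved_chars).split())
    return approved, [s for s in synonyms if s]


def collect_synonyms(protein_names, entry_name):
    """Return (approved_name, synonyms), with Entry Name appended as a synonym."""
    approved, synonyms = split_protein_names(protein_names)
    if entry_name and entry_name not in synonyms:
        synonyms.append(entry_name)
    return approved, synonyms


# Placeholder input mirroring the pattern described in this issue.
print(collect_synonyms("Approved Name (1st synonym)(2nd synonym)", "NAME_HUMAN"))
```

Tracking parenthesis depth rather than splitting on "(" keeps a synonym that itself contains parentheses in one piece.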
Example showing the approved name, all synonyms, and the definition.
Request:
Add the "Function" field to the data downloaded from UniProtKB.
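One way to request the extra column, assuming the download goes through the UniProt REST API (this issue does not say how the file is obtained): add `cc_function` to the requested return fields. The helper name `build_download_url`, the default organism filter, and the exact field list are assumptions for illustration and should be checked against the real download step.

```python
# Hypothetical sketch: build a UniProtKB REST download URL that includes
# the Function column. Assumes the ETL pulls TSV from the stream endpoint.
from urllib.parse import urlencode

BASE_URL = "https://rest.uniprot.org/uniprotkb/stream"

# Entry, Entry Name, Protein names, and Function [CC] in UniProt's field naming.
DEFAULT_FIELDS = ("accession", "id", "protein_name", "cc_function")


def build_download_url(fields=DEFAULT_FIELDS, organism_id="9606", reviewed_only=True):
    """Return a URL for a TSV download with the requested return fields."""
    query = f"organism_id:{organism_id}"
    if reviewed_only:
        # Matches the current behavior of ingesting reviewed entries only.
        query += " AND reviewed:true"
    params = {"query": query, "fields": ",".join(fields), "format": "tsv"}
    return f"{BASE_URL}?{urlencode(params)}"


print(build_download_url())
```

Setting `reviewed_only=False` in this sketch would drop the Swiss-Prot filter, which is how unreviewed TrEMBL entries could be included if that is ever wanted.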