Closed andrewhercules closed 2 years ago
Thanks @andrewhercules, I've moved the mouse phenotype section to a new ticket as a) it's a separate index (internally), and b) those issues existed on the current dataset. I think that @d0choa had mentioned that we need to rewrite MP, so that can maybe be a starting point for those.
Please keep adding issues here and I'll triage as they come in. :)
@andrewhercules pathway was removed from the schema and is no longer available. There was a comment that we could potentially get this from 'reactome pathways' if necessary (comment on target refactor spreadsheet iteration1), but it hasn't been followed up as yet. I think the broad plan was to introduce a new 'geneSets' index of some sort using Reactome data as a base. @d0choa will have a better idea.
my memory might fail but I think the Reactome pathways (R-HSA-XXX) in the new implementation were coming as xrefs directly from Ensembl? Could you check that field?
Resolving the ids into labels was done by @mkarmona for the facets.
I have checked the dbXrefs field and it only returns the Pathway ID and source. Currently, we display the pathway name and top-level parent pathway.
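To illustrate the gap between what dbXrefs returns (ID and source) and what is displayed (pathway name), here is a minimal sketch of resolving IDs to labels with a lookup table. The record shape (`id`, `source` keys), the function name `resolve_pathways`, and the placeholder IDs are all assumptions for illustration; this is not how the facets code does it.

```python
from typing import Dict, List, Optional

def resolve_pathways(
    xrefs: List[Dict[str, str]],
    labels: Dict[str, str],
) -> List[Dict[str, Optional[str]]]:
    """Attach a pathway name to every Reactome xref.

    `xrefs` is assumed to look like [{"id": ..., "source": ...}];
    `labels` is a hypothetical ID -> name lookup built elsewhere.
    Unknown IDs keep a name of None so they can be spotted easily.
    """
    return [
        {"id": x["id"], "name": labels.get(x["id"])}
        for x in xrefs
        if x["source"] == "Reactome"
    ]

# Placeholder IDs only; these are not real Reactome identifiers.
example_xrefs = [
    {"id": "R-HSA-XXX", "source": "Reactome"},
    {"id": "R-HSA-YYY", "source": "Reactome"},
    {"id": "P00000", "source": "SomethingElse"},
]
example_labels = {"R-HSA-XXX": "example pathway name"}
resolved = resolve_pathways(example_xrefs, example_labels)
```

The top-level parent pathway would need the pathway hierarchy as well, which this sketch does not cover.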
Regarding comparative genomics:
I think we're still missing a couple of entries that we should have though, so I'll keep digging.
Ticking off the missing 'rat' entry on TNF as the second entry is low confidence so we're excluding it intentionally:
+---------------+-----------------+-----------------------+------------------+
| gene_stable_id| homology_species|homology_gene_stable_id|is_high_confidence|
+---------------+-----------------+-----------------------+------------------+
|ENSG00000232810|rattus_norvegicus| ENSRNOG00000055156| 0|
|ENSG00000232810|rattus_norvegicus| ENSRNOG00000000837| 1|
+---------------+-----------------+-----------------------+------------------+
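The exclusion amounts to a filter on is_high_confidence. A minimal sketch in plain Python, using the two rows from the table; the function name `keep_high_confidence` is hypothetical and the ETL itself may implement this differently.

```python
from typing import Dict, List, Union

Row = Dict[str, Union[str, int]]

# The two TNF / rat rows from the table above.
TNF_RAT_HOMOLOGUES: List[Row] = [
    {
        "gene_stable_id": "ENSG00000232810",
        "homology_species": "rattus_norvegicus",
        "homology_gene_stable_id": "ENSRNOG00000055156",
        "is_high_confidence": 0,
    },
    {
        "gene_stable_id": "ENSG00000232810",
        "homology_species": "rattus_norvegicus",
        "homology_gene_stable_id": "ENSRNOG00000000837",
        "is_high_confidence": 1,
    },
]

def keep_high_confidence(rows: List[Row]) -> List[Row]:
    """Drop homologues that are not flagged as high confidence."""
    return [r for r in rows if r["is_high_confidence"] == 1]

kept_ids = [r["homology_gene_stable_id"] for r in keep_high_confidence(TNF_RAT_HOMOLOGUES)]
```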
Regarding ESR1:
+--------------------+
| homology_species|
+--------------------+
| mus_caroli|
|mustela_putorius_...|
| mus_spretus|
| mus_pahari|
| mus_spicilegus|
+--------------------+
whereas we're only looking for mus_musculus as specified in opentargets/platform#1047 (10090 is mouse). @d0choa is this something that we need to investigate further or do we assume the new data is actually the data we want?
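For reference, restricting homologues to the wanted species can be sketched as an exact-match filter on the species name. The whitelist below contains only mus_musculus for illustration (the full list is whatever opentargets/platform#1047 specifies), and `filter_species` is a hypothetical helper, not existing ETL code.

```python
from typing import Iterable, List, Set

# Assumption: illustrative whitelist with a single entry.
WANTED_SPECIES: Set[str] = {"mus_musculus"}

def filter_species(species: Iterable[str], wanted: Set[str] = WANTED_SPECIES) -> List[str]:
    """Keep only species whose name matches a whitelist entry exactly.

    Exact matching means other names starting with "mus", such as
    mus_caroli or mus_spretus, are not kept.
    """
    return [s for s in species if s in wanted]

observed = ["mus_caroli", "mus_spretus", "mus_pahari", "mus_spicilegus"]
kept = filter_species(observed)
```

With the species seen for ESR1 the result is empty, which matches the observation that none of them is mus_musculus.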
Regarding protein information (subcellular location), taking the example of APP (ENSG00000142192)
+---------------+---------------------------------------------------+
|id |location |
+---------------+---------------------------------------------------+
|ENSG00000142192|Cell membrane ; Single-pass type I membrane protein|
|ENSG00000142192|Cell projection, growth cone |
|ENSG00000142192|Cytoplasm |
|ENSG00000142192|Cytoplasmic vesicle |
|ENSG00000142192|Early endosome |
|ENSG00000142192|Golgi apparatus |
|ENSG00000142192|Membrane ; Single-pass type I membrane protein |
|ENSG00000142192|Membrane, clathrin-coated pit |
|ENSG00000142192|Perikaryon |
|ENSG00000142192|Vesicles |
|ENSG00000142192|[Amyloid-beta protein 42]: Cell surface |
|ENSG00000142192|[Gamma-secretase C-terminal fragment 59]: Nucleus |
|ENSG00000142192|[Soluble APP-beta]: Secreted |
+---------------+---------------------------------------------------+
All of these entries come from Uniprot flat files which we parse in the ETL. The documentation says that:
location ; topology ; orientation
This is formally defined as:
The format of SUBCELLULAR LOCATION is:
CC -!- SUBCELLULAR LOCATION:(( Molecule:)?( Location\.)+)?( Note=Free_text( Flag)?\.)?
Where:
Molecule: Isoform, chain or peptide name
Location = Subcellular_location( Flag)?(; Topology( Flag)?)?(; Orientation( Flag)?)?
Subcellular_location: SL-line of subcell.txt ID-record
Topology: SL-line of subcell.txt IT-record
Orientation: SL-line of subcell.txt IO-record
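To make the "location ; topology ; orientation" structure concrete, here is a rough sketch that splits one of the location strings from the table into its parts. It assumes the molecule prefix appears as "[Name]: " as in the rows above, and it ignores flags and notes; `Location` and `parse_location` are hypothetical names, not part of the ETL.

```python
import re
from typing import List, NamedTuple, Optional

class Location(NamedTuple):
    molecule: Optional[str]
    location: str
    topology: Optional[str]
    orientation: Optional[str]

# Optional "[Molecule]: " prefix, as seen in the APP rows.
_MOLECULE = re.compile(r"^\[(?P<molecule>[^\]]+)\]:\s*(?P<rest>.*)$")

def parse_location(raw: str) -> Location:
    """Split 'location ; topology ; orientation' into named fields."""
    rest = raw.strip()
    molecule = None
    match = _MOLECULE.match(rest)
    if match:
        molecule = match.group("molecule")
        rest = match.group("rest")
    parts: List[Optional[str]] = [p.strip() for p in rest.split(";")]
    # Pad so that missing topology / orientation become None.
    parts += [None] * (3 - len(parts))
    location, topology, orientation = parts[:3]
    return Location(molecule, location or "", topology or None, orientation or None)

membrane = parse_location("Cell membrane ; Single-pass type I membrane protein")
amyloid = parse_location("[Amyloid-beta protein 42]: Cell surface")
```

Keeping the parts separate would let the API expose either the full UniProt string or only the location term, whichever is decided.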
In the current Platform, "Membrane, clathrin-coated pit" becomes "clathrin-coated pit".
In short, we've got a bit of a mix of both the current Platform and Uniprot. Any thoughts @andrewhercules and @d0choa on which way we want to go?
ChemicalProbes is yet to be implemented, see (opentargets/platform#1389). It's with @ireneisdoomed at present. Once she's back from leave she can provide an update.
Regarding Protein Information: the schema has changed so that evidence is a simple string. I've tried adding in ECO codes as described in opentargets/platform#1037 but they are quite sparse. I'll revisit it again later. Unless I hear otherwise I'll assume that they are still a low priority (as mentioned in the linked ticket).
If nothing else the endpoint should be working!
I've temporarily removed targetSafety from the API as it's broken and unstable. This way other errors that are found will be more likely to be genuine errors.
mousePhenotypes schema updates (https://github.com/opentargets/platform/issues/1471)
Comparison between the old and new schema is available in this spreadsheet (first four columns). Summary:
targetInModelEnsemblId and biologicalModelId.
Also CCing @ireneisdoomed @DSuveges to keep them in the loop.
The cancer biomarkers dataset describes associations between a target and a disease when a cancer biomarker is found.
Because the association depends on the presence of the biomarker, it was initially scoped as part of the target annotations and not as part of the evidence.
We have modelled and parsed the table so that the cancer biomarkers are a new data source of evidence for the upcoming release (PR #89)
The schema that this source follows is tracked here: https://docs.google.com/spreadsheets/d/1Mowq7KsGTMtEg3wZpJBNK_UbawHKJeM9d0syT9F9AMc/edit?usp=sharing
Therefore:
Back-end actions (CC @JarrodBaker)
cancerBiomarkers API endpoint
Front-end actions (CC @andrewhercules)
@ireneisdoomed, I have updated #1645 and have asked the front-end team to remove the Cancer Biomarkers summary widget and detail view from the target profile page.
When evidence details are ready, including the relevant fields and datasource name (e.g. Cancer Genome Interpreter), please let me know and I can create design specifications for a summary widget and detail view for the evidence page. The front-end team will also make sure the filters on the association page are updated, and we can adjust the documentation and include the new source on the evidence page.
I'm also CCing @HelenaCornu as we will want to explain this in our release comms if the change is ready for 21.09 :-)
Until now, chemical probes were generated by parsing the spreadsheet that the data team maintained and curated manually. From 21.09 on, Chemical Probes and information about the ProbeMiner score will come from probes-drugs.org, the integration of this resource being described in #1536.
The proposed model with all the different endpoints can be seen here (current version is iter3): https://docs.google.com/spreadsheets/d/1AqC6aqKgyf_s-R1LculocodpjcHRMw6t-pUoVsvryxs/edit?usp=sharing
The latest version of the output dataset is uploaded to the path we were using for ProbeMiner data (gs://otar001-core/ProbeMiner/annotation). I propose renaming the parent directory to accommodate
Actions:
@ireneisdoomed just confirming relating to cancer biomarkers and your comment on 9 August: you don't want any cancer biomarker information to be available via the API?
Ticket closed as target part of the 21.09 release
This ticket captures preliminary findings after reviewing the development API with the new target ETL data.
19 July 2021: ticket still a work-in-progress as review ongoing
Mouse Phenotypes
Issues with MP should be compiled on a specific MP ticket.
Pathways
Comparative Genomics
queryPercentageIdentity) and "Target %id" (targetPercentageIdentity) are different compared to the UI data table that relies on a direct call to the Ensembl API (e.g. CFTR)
Protein Information
Target Safety
tissue.label values of "placenta", "Intestine", "breast", while the tissue.modelName is hepatocyte)
TEPs
Cancer Hallmarks
Target tractability
Cancer Biomarkers
Gene Ontology
Chemical Probes
Baseline expression
Target classes