Closed ValWood closed 7 months ago
It looks like these are separate lines in the HPOA genes_to_disease file based on the disease_id column:
ncbi_gene_id  gene_symbol  hpo_id      hpo_name             frequency  disease_id
164           AP1G1        HP:0000718  Aggressive behavior  2/3        OMIM:619548
164           AP1G1        HP:0000718  Aggressive behavior  4/8        OMIM:619467
We should find a way to represent that in the UI. Maybe there's an appropriate qualifier or evidence field.
Issue #430 is meant to generally address the idea of collapsing/grouping edges, but even if that were implemented, this particular example would still end up looking odd.
Apart from disease_id, the only difference between the two rows in the incoming file is the frequency count, so we want to be able to show that.
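To make the grouping idea concrete, here is a minimal sketch of how the two AP1G1 rows could be collapsed into one gene-to-phenotype entry while keeping the per-disease frequency visible. It assumes the incoming file is tab-separated with the header shown above; `group_frequencies` and `SAMPLE` are hypothetical names for illustration, not part of the actual ingest.

```python
import csv
import io
from collections import defaultdict

# The two rows quoted in this thread, assumed to be tab-separated.
SAMPLE = (
    "ncbi_gene_id\tgene_symbol\thpo_id\thpo_name\tfrequency\tdisease_id\n"
    "164\tAP1G1\tHP:0000718\tAggressive behavior\t2/3\tOMIM:619548\n"
    "164\tAP1G1\tHP:0000718\tAggressive behavior\t4/8\tOMIM:619467\n"
)


def group_frequencies(tsv_text):
    """Group rows by (gene_symbol, hpo_id); keep (disease_id, frequency) per row."""
    grouped = defaultdict(list)
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        key = (row["gene_symbol"], row["hpo_id"])
        grouped[key].append((row["disease_id"], row["frequency"]))
    return dict(grouped)


groups = group_frequencies(SAMPLE)
for (gene, hpo_id), per_disease in groups.items():
    # One display line per gene/phenotype pair, with the disease context attached.
    details = ", ".join(f"{freq} in {disease}" for disease, freq in per_disease)
    print(f"{gene} {hpo_id}: {details}")
```

With something like this, the UI could show a single "Aggressive behavior" entry with two frequency values, each qualified by its disease, instead of two rows that look identical.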
I'm going to move this forward to November and assign it to myself. Another related bug/new feature that I'm seeing is that we aren't including frequency when it's represented by an HP term, like:
ncbi_gene_id  gene_symbol  hpo_id      hpo_name             frequency   disease_id
113179        ADAT3        HP:0000718  Aggressive behavior  HP:0040284  OMIM:615286
113179        ADAT3        HP:0000718  Aggressive behavior  HP:0040283  ORPHA:363528
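For the HP-term case, a small lookup is probably enough to turn the term into something displayable. The sketch below is an assumption about how that could look, not the current ingest code. The labels are the HPO Frequency sub-ontology terms as I recall them and should be checked against the ontology before use; `frequency_label` is a hypothetical helper.

```python
# Labels for terms under the HPO "Frequency" branch.
# Assumption: written from memory; verify against the ontology release in use.
HPO_FREQUENCY_LABELS = {
    "HP:0040280": "Obligate",
    "HP:0040281": "Very frequent",
    "HP:0040282": "Frequent",
    "HP:0040283": "Occasional",
    "HP:0040284": "Very rare",
    "HP:0040285": "Excluded",
}


def frequency_label(value):
    """Return a readable label for an HP frequency term, or the raw value otherwise."""
    return HPO_FREQUENCY_LABELS.get(value, value)


# The two ADAT3 rows above would display as:
print(frequency_label("HP:0040284"))  # OMIM:615286 row
print(frequency_label("HP:0040283"))  # ORPHA:363528 row
```

Count values such as 2/3 pass through unchanged, so the same helper can be applied to every row.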
Personally, I think it could be a bit misleading to log the mentions in different resources as separate pieces of evidence, because quite often they will come from the same source.
Here the two sources are equivalent: this phenotype is associated with the disease description. https://rarediseases.oscar.ncsu.edu/disease/intellectual-disability-strabismus-syndrome/about/
The current method will end up representing the number of resources, rather than the support for the phenotypic observation.
I probably didn't explain this very well. But I'm not sure this is what I would expect "frequency" to mean.
CC @cmungall
Sorry, by frequency I mean as used in the HPO Annotation files
https://hpo.jax.org/app/data/annotation-format
Frequency: There are three allowed options for this field.
1. A term-id from the HPO-sub-ontology below the term Frequency.
2. A count of patients affected within a cohort. For instance, 7/13 would indicate that 7 of the 13 patients with the specified disease were found to have the phenotypic abnormality referred to by the HPO term in question in the study referred to by the DB_Reference.
3. A percentage value such as 17%, again referring to the percentage of patients found to have the phenotypic abnormality referred to by the HPO term in question in the study referred to by the DB_Reference.
If possible, the 7/13 format is preferred over the percentage format if the exact data is available.
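A minimal sketch of how the three allowed forms could be told apart when ingesting the column. The `Frequency` dataclass and `parse_frequency` are hypothetical names, and treating an empty value or "-" as "no frequency given" is an assumption about the input file rather than something stated in the format description.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Frequency:
    kind: str                           # "hp_term", "count", or "percentage"
    hp_term: Optional[str] = None       # e.g. "HP:0040284"
    count: Optional[int] = None         # numerator of "7/13"
    total: Optional[int] = None         # denominator of "7/13"
    percentage: Optional[float] = None  # given directly, or derived from count/total


_HP_TERM = re.compile(r"^HP:\d{7}$")
_COUNT = re.compile(r"^(\d+)/(\d+)$")
_PERCENT = re.compile(r"^(\d+(?:\.\d+)?)%$")


def parse_frequency(value: str) -> Optional[Frequency]:
    """Classify a raw frequency string as an HP term, a count, or a percentage."""
    value = value.strip()
    if not value or value == "-":  # assumption: placeholder for a missing value
        return None
    if _HP_TERM.match(value):
        return Frequency(kind="hp_term", hp_term=value)
    match = _COUNT.match(value)
    if match:
        count, total = int(match.group(1)), int(match.group(2))
        pct = 100.0 * count / total if total else None
        return Frequency(kind="count", count=count, total=total, percentage=pct)
    match = _PERCENT.match(value)
    if match:
        return Frequency(kind="percentage", percentage=float(match.group(1)))
    raise ValueError(f"Unrecognized frequency value: {value!r}")


for raw in ["HP:0040284", "7/13", "17%"]:
    print(raw, "->", parse_frequency(raw))
```

Keeping the count and total alongside the derived percentage preserves the preferred 7/13 form, so nothing is lost if the UI later wants to show the cohort size.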
I agree though, one of the reasons that we have fewer sources now is to avoid just that problem of getting the same original annotations echoing in from multiple sources.
The frequency property is included in the graph now, and successfully showing for phenotype associations on genes and diseases, but I just checked and realized that on https://beta.monarchinitiative.org/HP:0000718 we're not seeing frequency.
We need to expand the check so that it applies any time "Phenotypes" is selected for the association OR this_node.category == biolink:PhenotypicFeature.
I think we should make this fix before releasing.
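Roughly, the expanded check would look like the sketch below. It is written in Python only to illustrate the logic; the real check lives in the UI code, and `should_show_frequency` and its parameter names are hypothetical.

```python
PHENOTYPIC_FEATURE = "biolink:PhenotypicFeature"


def should_show_frequency(selected_association: str, this_node_category: str) -> bool:
    """Show frequency when phenotype associations are selected, or when the page
    itself is for a phenotype node (e.g. HP:0000718)."""
    return (
        selected_association == "Phenotypes"
        or this_node_category == PHENOTYPIC_FEATURE
    )


# Gene page with "Phenotypes" selected: already worked before the fix.
print(should_show_frequency("Phenotypes", "biolink:Gene"))
# Phenotype page listing genes or diseases: the case that was missing.
print(should_show_frequency("Genes", PHENOTYPIC_FEATURE))
```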
Ok, closing now, since I see frequency on https://beta.monarchinitiative.org/HP:0000718
I didn't see it for PomBase phenotypes (e.g. cdc2). Should I?
@ValWood Oh! I was focused on representing the phenotype.hpoa frequency column. I'll make a new issue for capturing the phaf penetrance column as biolink frequency qualifier / quantifier fields.
Some of the "top 5 phenotypes" are duplicates:
That's a lot of space to say: "[Aggressive behavior (HPO)] evidence (2)".
Is it necessary to repeat the gene every time when you are on the gene page?