monarch-initiative / monarch-app

Monarch Initiative website and API
https://monarchinitiative.org/
BSD 3-Clause "New" or "Revised" License

Some of the "top 5 phenotypes" are duplicates: plus repeating data type every time. #328

Closed ValWood closed 7 months ago

ValWood commented 1 year ago

Some of the "top 5 phenotypes" are duplicates:

[Screenshot 2023-09-11 at 19 08 33]

That's a lot of space to say: "[Aggressive behavior (HPO)] evidence (2)"

Is it necessary to repeat the gene every time when you are on the gene page?

glass-ships commented 1 year ago

https://monarchinitiative.org/HGNC:555#associations

kevinschaper commented 1 year ago

It looks like these are separate lines in the HPOA genes_to_phenotype file, distinguished by the disease_id column:

ncbi_gene_id  gene_symbol  hpo_id      hpo_name             frequency  disease_id
164           AP1G1        HP:0000718  Aggressive behavior  2/3        OMIM:619548
164           AP1G1        HP:0000718  Aggressive behavior  4/8        OMIM:619467

We should find a way to represent that in the UI. Maybe there's an appropriate qualifier or evidence field.
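
As a rough illustration of one way to represent this, the two rows above could be collapsed into a single phenotype entry that keeps the frequency for each disease. This is only a sketch: the tab-separated layout and the `group_by_phenotype` helper are assumptions made for the example, not code from monarch-app or its ingests.

```python
import csv
import io
from collections import defaultdict

# The two rows quoted above; the tab-separated layout is an assumption.
SAMPLE = (
    "ncbi_gene_id\tgene_symbol\thpo_id\thpo_name\tfrequency\tdisease_id\n"
    "164\tAP1G1\tHP:0000718\tAggressive behavior\t2/3\tOMIM:619548\n"
    "164\tAP1G1\tHP:0000718\tAggressive behavior\t4/8\tOMIM:619467\n"
)


def group_by_phenotype(text):
    """Hypothetical helper: collapse rows that share gene and phenotype,
    keeping one (disease_id, frequency) pair per source row."""
    grouped = defaultdict(list)
    for row in csv.DictReader(io.StringIO(text), delimiter="\t"):
        key = (row["gene_symbol"], row["hpo_id"], row["hpo_name"])
        grouped[key].append((row["disease_id"], row["frequency"]))
    return dict(grouped)


groups = group_by_phenotype(SAMPLE)
for (gene, hpo_id, name), entries in groups.items():
    details = ", ".join(f"{freq} in {disease}" for disease, freq in entries)
    print(f"{gene}: {name} ({hpo_id}): {details}")
```

Grouping this way would show one line per phenotype while still exposing the per-disease counts that distinguish the rows.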

kevinschaper commented 12 months ago

Issue #430 is meant to generally address the idea of collapsing/grouping edges, but even if that were implemented, this particular example would still end up looking odd.

The only difference between the two rows in the incoming file is the frequency count, so we want to be able to show that.

kevinschaper commented 12 months ago

I'm going to move this forward to November and assign it to myself. Another related bug/new feature that I'm seeing is that we aren't including frequency when it's represented by an HP term, like:

ncbi_gene_id  gene_symbol  hpo_id      hpo_name             frequency   disease_id
113179        ADAT3        HP:0000718  Aggressive behavior  HP:0040284  OMIM:615286
113179        ADAT3        HP:0000718  Aggressive behavior  HP:0040283  ORPHA:363528

ValWood commented 12 months ago

Personally, I think it could be a bit misleading to log the mentions in different resources as separate pieces of evidence, because quite often they will come from the same source.

Here the two sources are equivalent: this phenotype is associated with the disease description. https://rarediseases.oscar.ncsu.edu/disease/intellectual-disability-strabismus-syndrome/about/

Using the current method will begin to represent the number of resources, rather than the support for the phenotypic observation.

I probably didn't explain this very well, but I'm not sure this is what I would expect "frequency" to mean.

CC @cmungall

kevinschaper commented 12 months ago

Sorry, by frequency I mean the term as used in the HPO Annotation files:

https://hpo.jax.org/app/data/annotation-format

Frequency: There are three allowed options for this field.

1. A term-id from the HPO-sub-ontology below the term Frequency.
2. A count of patients affected within a cohort. For instance, 7/13 would indicate that 7 of the 13 patients with the specified disease were found to have the phenotypic abnormality referred to by the HPO term in question in the study referred to by the DB_Reference.
3. A percentage value such as 17%, again referring to the percentage of patients found to have the phenotypic abnormality referred to by the HPO term in question in the study referred to by the DB_Reference.

If possible, the 7/13 format is preferred over the percentage format if the exact data is available.
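
To make the three options concrete, here is a minimal sketch of how a frequency value might be classified. The `parse_frequency` function, its regular expressions, and the returned dictionary shape are assumptions for illustration; they are not taken from the HPO tooling or from the Monarch ingest code.

```python
import re

# Patterns for the three documented formats (assumed shapes, for illustration).
TERM_RE = re.compile(r"^HP:\d{7}$")             # e.g. HP:0040284
COUNT_RE = re.compile(r"^(\d+)/(\d+)$")         # e.g. 7/13
PERCENT_RE = re.compile(r"^(\d+(?:\.\d+)?)%$")  # e.g. 17%


def parse_frequency(value):
    """Hypothetical helper: classify a frequency value as an HP term,
    a patient count, or a percentage."""
    value = value.strip()
    if TERM_RE.match(value):
        return {"kind": "term", "term": value}
    match = COUNT_RE.match(value)
    if match:
        count, total = int(match.group(1)), int(match.group(2))
        # Derive a percentage only when the cohort size is non-zero.
        percentage = 100.0 * count / total if total else None
        return {"kind": "count", "count": count, "total": total,
                "percentage": percentage}
    match = PERCENT_RE.match(value)
    if match:
        return {"kind": "percentage", "percentage": float(match.group(1))}
    # Anything else (including an empty field) is left uninterpreted.
    return {"kind": "unknown", "raw": value}


for example in ("HP:0040284", "7/13", "17%"):
    print(example, "->", parse_frequency(example))
```

Keeping the original count and total alongside any derived percentage matches the stated preference for the 7/13 format when exact data is available.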

kevinschaper commented 12 months ago

I agree though, one of the reasons that we have fewer sources now is to avoid just that problem of getting the same original annotations echoing in from multiple sources.

kevinschaper commented 7 months ago

The frequency property is included in the graph now and is showing successfully for phenotype associations on genes and diseases, but I just checked and realized that on https://beta.monarchinitiative.org/HP:0000718 we're not seeing frequency.

We need to expand the check so that it applies any time "Phenotypes" is selected for the association OR this_node.category == biolink:PhenotypicFeature.
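
The expanded condition could be sketched as a simple predicate. The function name, its parameters, and the exact "Phenotypes" label comparison are hypothetical and only illustrate the logic; the real check lives in the app's own code and may look different.

```python
PHENOTYPIC_FEATURE = "biolink:PhenotypicFeature"


def should_show_frequency(selected_association_label, this_node_category):
    """Hypothetical predicate: show the frequency column when the selected
    association is "Phenotypes" or when the page's own node is a phenotype."""
    return (
        selected_association_label == "Phenotypes"
        or this_node_category == PHENOTYPIC_FEATURE
    )


# Gene page with the phenotype associations selected.
print(should_show_frequency("Phenotypes", "biolink:Gene"))
# Phenotype page (such as HP:0000718) with another association selected.
print(should_show_frequency("Diseases", PHENOTYPIC_FEATURE))
```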

I think we should make this fix before releasing.

kevinschaper commented 7 months ago

Ok, closing now, since I see frequency on https://beta.monarchinitiative.org/HP:0000718

ValWood commented 7 months ago

I didn't see it for PomBase phenotypes (e.g. cdc2). Should I?

kevinschaper commented 7 months ago

@ValWood Oh! I was focused on representing the phenotype.hpoa frequency column. I'll make a new issue for capturing the phaf penetrance column as biolink frequency qualifier / quantifier fields.