monarch-initiative / monarch-app

Monarch Initiative website and API
https://monarchinitiative.org/
BSD 3-Clause "New" or "Revised" License
18 stars 6 forks source link

Sorting by frequency in disease to phenotype table not working correctly #910

Open ptgolden opened 9 hours ago

ptgolden commented 9 hours ago

Sorting by frequency in a disease to phenotype table can occasionally cause confusing and incorrect results. For an example, show 100 results at a time in the association table for MONDO:0007523 and try sorting by frequency. Neither ascending nor descending sort seems to do anything.

For an example with more associations with frequency values attached, try MONDO:0958235. By default, it is sorted by frequency. But after explicitly sorting by frequency by click on the table header, things get wonky.

ptgolden commented 8 hours ago

The issue is that the Solr query sorts by frequency_qualifier, while the frequency bar is populated by has_count and has_total. This makes sense-- the former is an HPO qualifier, the latter HMIM-- but visually, it is not intuitive.

ptgolden commented 7 hours ago

After a conversation with @amc-corey-cox, I see what's going on. Disease-phenotype associations from HPOA have either a categorical frequency_qualifier OR discrete frequency values (either count/total or percentage%). See https://pmc.ncbi.nlm.nih.gov/articles/PMC6324074/table/tbl2/). When sorting associations by frequency in the API, only the categorical frequency is used. More precise measurements are not taken into account.

...Additionally, there is a bug when rendering the frequencies in the association table that was introduced in 400ab725226fe31b3f317b2c131941cf32289fe4. There's code to deal with all the different cases of frequency measurements, but this is how it checks them:

https://github.com/monarch-initiative/monarch-app/blob/af5ab7bcd74e5bc18b4a0929d797b6758141a801/frontend/src/pages/node/AssociationsTable.vue#L388-L390

In the row object, the absence is a value is marked by null. All of those checks should be changed either to if (row.has_count !== null) or if (row.has count != null) (note this is using loose equality which is probably only ever properly used if you expect a value to be either undefined or null).

Since null !== undefined is true, the first branch is taken in all of the comparisons. Hence a bunch of tools tips saying that frequency is "null of null cases".

ptgolden commented 6 hours ago

Aaaand that derived frequency is already added to the index: https://github.com/monarch-initiative/monarch-ingest/blob/a4894f496633ed800b779f1b2df669b41838bd50/scripts/frequency_update_processor.js! The bug then is that the table should call a sort to the derived frequency rather than frequency_qualifier.

ptgolden commented 6 hours ago

LAST BUG and then this will be working fine. @kevinschaper, it seems that frequency_qualifier_label is always null, any idea why this is? See https://monarchinitiative.org/v3/api/entity/MONDO:0007523/biolink:DiseaseToPhenotypicFeatureAssociation?direct=false&limit=100&offset=0&query=&sort=frequency_computed_sortable_float+asc&traverse_orthologs=false for an example.

ptgolden commented 6 hours ago

I imagine it's something to be added in the HPO ingest

ptgolden commented 6 hours ago

Or, maybe not, since frequency_qualifier_label is not a part of biolink:DiseaseToPhenotypicFeatureAssociation