Open ptgolden opened 9 hours ago
The issue is that the Solr query sorts by frequency_qualifier
, while the frequency bar is populated by has_count
and has_total
. This makes sense-- the former is an HPO qualifier, the latter HMIM-- but visually, it is not intuitive.
After a conversation with @amc-corey-cox, I see what's going on. Disease-phenotype associations from HPOA have either a categorical frequency_qualifier
OR discrete frequency values (either count/total
or percentage%
). See https://pmc.ncbi.nlm.nih.gov/articles/PMC6324074/table/tbl2/). When sorting associations by frequency in the API, only the categorical frequency is used. More precise measurements are not taken into account.
...Additionally, there is a bug when rendering the frequencies in the association table that was introduced in 400ab725226fe31b3f317b2c131941cf32289fe4. There's code to deal with all the different cases of frequency measurements, but this is how it checks them:
In the row
object, the absence is a value is marked by null
. All of those checks should be changed either to if (row.has_count !== null)
or if (row.has count != null)
(note this is using loose equality which is probably only ever properly used if you expect a value to be either undefined
or null
).
Since null !== undefined
is true, the first branch is taken in all of the comparisons. Hence a bunch of tools tips saying that frequency is "null of null cases".
Aaaand that derived frequency is already added to the index: https://github.com/monarch-initiative/monarch-ingest/blob/a4894f496633ed800b779f1b2df669b41838bd50/scripts/frequency_update_processor.js! The bug then is that the table should call a sort to the derived frequency rather than frequency_qualifier
.
LAST BUG and then this will be working fine. @kevinschaper, it seems that frequency_qualifier_label
is always null
, any idea why this is? See https://monarchinitiative.org/v3/api/entity/MONDO:0007523/biolink:DiseaseToPhenotypicFeatureAssociation?direct=false&limit=100&offset=0&query=&sort=frequency_computed_sortable_float+asc&traverse_orthologs=false for an example.
I imagine it's something to be added in the HPO ingest
Or, maybe not, since frequency_qualifier_label
is not a part of biolink:DiseaseToPhenotypicFeatureAssociation
Sorting by frequency in a disease to phenotype table can occasionally cause confusing and incorrect results. For an example, show 100 results at a time in the association table for MONDO:0007523 and try sorting by frequency. Neither ascending nor descending sort seems to do anything.
For an example with more associations with frequency values attached, try MONDO:0958235. By default, it is sorted by frequency. But after explicitly sorting by frequency by click on the table header, things get wonky.