nih-cfde / cfde-deriva

Collaboration point for miscellaneous CFDE-deriva scripts

Update search box to match Genes and PubChem terms #323

Closed: jrchudy closed this issue 2 years ago

jrchudy commented 2 years ago

Once the changes to support multiple sources are merged into ermrestJS, the search box definition needs to be updated to include searching Genes and PubChem terms.

Related ermrestJS issue: https://github.com/informatics-isi-edu/ermrestjs/issues/915

Relatedly, these two data sets are not yet in the level 1 stats, so they can't easily be included in the static pages' plots. Nor are they in the core facts.

karlcz commented 2 years ago

Marking this "in progress" just because I think the Chaise parts are under way...

karlcz commented 2 years ago

It turns out that the elegant approach supported by this new Chaise feature puts us back into very unstable query-plan performance from ERMrest + PostgreSQL. So I'm only using it for the collection table, which is small.

For the others, I'm reintroducing another layer of materialized, indexed keywords between the main entities and the narrower fact tables (core, gene, pubchem, and protein). This yields a simpler query plan over a single indexed keyword source, with fewer joins.
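A minimal sketch of the idea, using Python's built-in sqlite3 as a stand-in for PostgreSQL. Keywords from the four narrow fact tables are materialized once into an indexed table keyed by combined fact, so a search becomes one indexed lookup plus a single join back to the entity table. All table and column names here (nid, term, file, combined_fact_keyword) are hypothetical, and the toy matches exact lower-cased keywords rather than doing real text search.

```python
import sqlite3

# Hypothetical schema: narrow fact tables, a combined layer of fkeys only,
# an entity table, and a materialized, indexed keyword table.
SCHEMA = """
CREATE TABLE core_fact    (nid INTEGER PRIMARY KEY, term TEXT);
CREATE TABLE gene_fact    (nid INTEGER PRIMARY KEY, term TEXT);
CREATE TABLE pubchem_fact (nid INTEGER PRIMARY KEY, term TEXT);
CREATE TABLE protein_fact (nid INTEGER PRIMARY KEY, term TEXT);
CREATE TABLE combined_fact (
    nid          INTEGER PRIMARY KEY,
    core_fact    INTEGER REFERENCES core_fact(nid),
    gene_fact    INTEGER REFERENCES gene_fact(nid),
    pubchem_fact INTEGER REFERENCES pubchem_fact(nid),
    protein_fact INTEGER REFERENCES protein_fact(nid)
);
CREATE TABLE file (id TEXT PRIMARY KEY,
                   combined_fact INTEGER REFERENCES combined_fact(nid));
CREATE TABLE combined_fact_keyword (combined_fact INTEGER, keyword TEXT);
CREATE INDEX combined_fact_keyword_idx ON combined_fact_keyword(keyword);
"""

# Pull each fact table's terms up to the combined fact that references it.
MATERIALIZE = """
INSERT INTO combined_fact_keyword (combined_fact, keyword)
SELECT c.nid, lower(f.term) FROM combined_fact c JOIN core_fact f ON f.nid = c.core_fact
UNION
SELECT c.nid, lower(f.term) FROM combined_fact c JOIN gene_fact f ON f.nid = c.gene_fact
UNION
SELECT c.nid, lower(f.term) FROM combined_fact c JOIN pubchem_fact f ON f.nid = c.pubchem_fact
UNION
SELECT c.nid, lower(f.term) FROM combined_fact c JOIN protein_fact f ON f.nid = c.protein_fact;
"""


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory toy database and materialize its keyword layer."""
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    db.execute("INSERT INTO core_fact VALUES (1, 'liver')")
    db.execute("INSERT INTO gene_fact VALUES (1, 'TP53')")
    db.execute("INSERT INTO pubchem_fact VALUES (1, 'aspirin')")
    db.execute("INSERT INTO protein_fact VALUES (1, 'P04637')")
    # NULL fkeys mean "no fact of that kind" for this combination.
    db.executemany("INSERT INTO combined_fact VALUES (?, ?, ?, ?, ?)",
                   [(1, 1, 1, None, None), (2, 1, None, 1, 1)])
    db.executemany("INSERT INTO file VALUES (?, ?)",
                   [("f1", 1), ("f2", 2), ("f3", 2)])
    # Refresh the keyword layer (here once, after loading the facts).
    db.executescript(MATERIALIZE)
    return db


def search(db: sqlite3.Connection, term: str) -> list[str]:
    """One indexed keyword lookup plus a single join to the entity table."""
    rows = db.execute(
        """SELECT DISTINCT f.id
           FROM combined_fact_keyword k
           JOIN file f ON f.combined_fact = k.combined_fact
           WHERE k.keyword = ?
           ORDER BY f.id""",
        (term.lower(),),
    )
    return [r[0] for r in rows]
```

For example, `search(build_demo_db(), "aspirin")` returns `['f2', 'f3']`, since only combined fact 2 references the PubChem fact. The point of the extra layer is that the search query never has to join through all four fact tables at query time; that work is done when the keywords are materialized.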

This new layer, combined_fact, represents combinations of (core, gene, pubchem, protein) facts, so it is somewhat bloated, like the original core facts before we refactored them. However, it stores only the four numeric fkeys to those other facts rather than all of their verbose fact-coordinate content (no term arrays, no denormalized term-record JSON blobs). So it is more compact, apart from the unavoidable keyword material itself. These combined facts also carry their own C2M2 entity statistics, which will matter in the future if we want to chart along dimensions that cross the different fact tables, e.g., taxonomy X gene or anatomy X protein.
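A rough illustration of why per-combination statistics enable cross-table charts. This is a sketch under stated assumptions, not the actual combined_fact schema: each entity is assumed to map to exactly one combined fact, a combined fact is modeled as just its four integer fkeys, and the fact-table contents and the helper names `combine` and `cross_tab` are hypothetical.

```python
from collections import Counter
from typing import NamedTuple, Optional


class CombinedFact(NamedTuple):
    """Only the four numeric fkeys; no term arrays or JSON blobs (hypothetical)."""
    core: Optional[int]
    gene: Optional[int]
    pubchem: Optional[int]
    protein: Optional[int]


# Hypothetical narrow fact tables, keyed by numeric id.
CORE_FACTS = {1: {"anatomy": "liver", "taxonomy": "Homo sapiens"},
              2: {"anatomy": "heart", "taxonomy": "Mus musculus"}}
GENE_FACTS = {10: {"gene": "TP53"}, 11: {"gene": "BRCA1"}}

# Hypothetical entities, each pointing at one combined fact.
ENTITIES = {
    "f1": CombinedFact(1, 10, None, None),
    "f2": CombinedFact(1, 10, None, None),
    "f3": CombinedFact(2, 11, None, None),
    "f4": CombinedFact(2, None, None, None),
}


def combine(entities: dict) -> Counter:
    """Distinct combinations that occur, each with its own entity count."""
    return Counter(entities.values())


def cross_tab(stats: Counter, core_dim: str, gene_dim: str = "gene") -> Counter:
    """Chart counts across fact tables (e.g. anatomy X gene) by resolving fkeys."""
    out: Counter = Counter()
    for fact, n in stats.items():
        if fact.core is None or fact.gene is None:
            continue  # this combination has no value on one of the dimensions
        key = (CORE_FACTS[fact.core][core_dim], GENE_FACTS[fact.gene][gene_dim])
        out[key] += n
    return out
```

Here `cross_tab(combine(ENTITIES), "anatomy")` yields `Counter({('liver', 'TP53'): 2, ('heart', 'BRCA1'): 1})`. Only combinations that actually occur are stored, which is the source of the bloat relative to separate fact tables, but the counts never have to be recomputed by joining entities against two fact tables at chart time.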

This is deployed as a new interim "public" release in app-dev to allow some testing of the revised search. Note that other aspects of the UI may show regressions while this work is in progress.