Closed karlcz closed 2 years ago
A revision to split out `gene_fact` and `pubchem_fact` tables has been pushed. It has been manually tested on a few submissions in dev, but needs full end-to-end testing via submission pipeline and browser.
A test build on app-dev shows more reasonable fact table sizes, with around 10k facts for 3M files. The pubchem facts are most numerous, due to the LINCS assays with many distinct compounds.
As dimensions have been added to core fact, a new submission can generate too many permutations and erase the performance improvements gained by pre-aggregating c2m2 entities into equivalence classes.
It seems that (`substances`, `compounds`) and (`genes`) may benefit from being split into separate fact tables.
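To illustrate the permutation concern, here is a toy count in Python. It is only a sketch: the column names (`assay`, `anatomy`, `compounds`, `genes`) and the data are hypothetical, not the actual c2m2 schema, and it assumes the worst case where every combination of dimension values occurs in some file.

```python
from itertools import product


def count_distinct(rows, keys):
    """Count distinct value combinations (equivalence classes) over the given keys."""
    return len({tuple(row[k] for k in keys) for row in rows})


# Hypothetical toy dimensions, not real c2m2 vocabulary.
core_values = [("assay_a", "liver"), ("assay_b", "lung")]
compound_sets = [frozenset({f"cid{i}"}) for i in range(50)]
gene_sets = [frozenset({f"gene{i}"}) for i in range(20)]

# Worst case: every combination of core, compound, and gene values appears.
rows = [
    {"assay": a, "anatomy": t, "compounds": c, "genes": g}
    for (a, t), c, g in product(core_values, compound_sets, gene_sets)
]

# One combined fact table: classes multiply across all dimensions.
combined = count_distinct(rows, ["assay", "anatomy", "compounds", "genes"])

# Split fact tables: each table only tracks its own dimensions.
core = count_distinct(rows, ["assay", "anatomy"])
pubchem = count_distinct(rows, ["compounds"])
gene = count_distinct(rows, ["genes"])

print(combined, core + pubchem + gene)
```

In this toy case the combined table needs the product of the per-dimension counts, while the split tables need only their sum. Real submissions will not hit every combination, so the actual savings depend on the data.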