probcomp / cgpm

Library of composable generative population models which serve as the modeling and inference backend of BayesDB.
Apache License 2.0
25 stars 11 forks source link

patch to loomcat #247

Open Schaechtle opened 6 years ago

Schaechtle commented 6 years ago

This patch is needed since the column names I am getting from loom while calling _update_state (and _retrieve_featureid_to_cgpm, specifically) don't follow the assumptions made in the code. In fact, the column names I am getting from loom are the actual natural language column names which seems correct (thus patching cgpm).

I have a test case for this patch which depends on loom and bql -- I initially planned to include the test case here (i.e the cgpm repo). Looking at the test suite however, I am not sure anymore if this makes sense (not much loom and BQL around in cgpm/test); I am now planning to include this test in another repo focussed only on exporting models from loom. Open for suggestions here.

Tested using the probcomp dev image as folllows:

 2018-08-13 14:47:24 ⌚  ulli-notebook in ~/cgpm
○ → pytest tests/

Result:

454 passed, 8 skipped, 2 xfailed in 363.28 seconds
fsaad commented 6 years ago

The loomcat conversion is assuming that the loom project on disk was created using loomcat.initialize(state), in which case the loom column names are of the form c%05d. This encoding scheme is needed since cgpm does not have a notion of natural language column names, only column numbers.

When the loom project on disk was created using BayesDB's loom_backend class instead of cgpm's loomcat.initialize method, the loom column names are the natural language column names since these are known to BayesDB.

The assumptions in the code should be further loosened by: