probcomp / hierarchical-irm

Probabilistic structure discovery for rich relational systems
Apache License 2.0
2 stars 2 forks source link

Move incorporate_observations and encode_observations out of util_io #109

Open ThomasColthurst opened 1 month ago

ThomasColthurst commented 1 month ago

They have nothing to do with IO, and should really be methods of the IRM/HIRM classes.

And while we are refactoring util_io, we may want to turn T_schema into a proper class, with add_clean_relation, add_noisy_relation, and verify_noisy_relation_domains methods.

ThomasColthurst commented 3 weeks ago

Two further thoughts on this:

1) Once the IRM and HIRM classes are merged (as suggested in https://github.com/probcomp/hierarchical-irm/issues/97), we could have an EncodedHIRM class that is the current HIRM class that specifies domain entities using ints (aka T_items) and a new HIRM class that specifies domain entities using strings and handles the encoding / decoding but passes all the other work to an EncodedHIRM object.

2) Especially if we are going to have a pybind11 python interface to the code, there is no reason for the C++ not to provide a scikit-learn type interface, where models have constructors and .fit, .predict, and .score methods.

In fact, one nice thing we could do is have our HIRM class provide those methods (with X's = vectors of pair<relation name, list of string domain values>, and Y's = strings or ObservationVariants) but also support indexing, so that if my_hirm is of type HIRM, my_hirm['swims'] is also a model with a scikit-learn interface (just with the relation name dropped from the X's).