Open ThomasColthurst opened 1 month ago
Two further thoughts on this:
1) Once the IRM and HIRM classes are merged (as suggested in https://github.com/probcomp/hierarchical-irm/issues/97), we could have an EncodedHIRM class that is the current HIRM class that specifies domain entities using ints (aka T_items) and a new HIRM class that specifies domain entities using strings and handles the encoding / decoding but passes all the other work to an EncodedHIRM object.
2) Especially if we are going to have a pybind11 python interface to the code, there is no reason for the C++ not to provide a scikit-learn type interface, where models have constructors and .fit, .predict, and .score methods.
In fact, one nice thing we could do is have our HIRM class provide those methods (with X's = vectors of pair<relation name, list of string domain values>, and Y's = strings or ObservationVariants) but also support indexing, so that if my_hirm is of type HIRM, my_hirm['swims'] is also a model with a scikit-learn interface (just with the relation name dropped from the X's).
They have nothing to do with IO, and should really be methods of the IRM/HIRM classes.
And while we are refactoring util_io, we may want to turn T_schema into a proper class, with add_clean_relation, add_noisy_relation, and verify_noisy_relation_domains methods.