Closed emilyfertig closed 1 month ago
Adding Alex to answer the question of whether we want to make the old behavior optional with a flag.
I ask because it seems like currently the GenDB doc treats clean and noisy relations a bit asymmetrically in terms of what domains they can condition on -- the clean relations get the maximal set of available domains, but the noisy relations get the minimal set. I was curious as to whether this asymmetry was intentional, and if so whether it was motivated by (a) computational concerns [i.e., there are typically a lot more noisy relations than clean relations], (b) world modelling concerns [i.e., it makes sense that Physician names might vary from state to state, but the typo function applied to the name of the Physician's med school probably doesn't vary much from state to state], (c) compatibility with how PClean did things, or (d) other.
For example, in the following block from
schema_helper_test
,test_make_hirm_schema
expected_domains
should be{"School", "Physician", "Record"}
since the DAG isSchool -> Physician -> Record <- Practice <- City
(and School, Physician are the input classes of the base relation, and Record is the only class that's the destination of a path starting at Physician)