lnccbrown / HSSM

Development of HSSM package

[Bug] Check if `_pre_check_data_sanity` permits n-choice scenarios #427

Closed AlexanderFengler closed 1 month ago

AlexanderFengler commented 1 month ago

While running models that allow choice options of [0, 1, 2, 3], I ran into an error raised by `_pre_check_data_sanity()`, which reads:

ValueError: The response column must contain only -1 and 1 when there are two responses.

This is likely a corner case since it does not always happen.

digicosmos86 commented 1 month ago

This is indeed a corner case - this check is only performed when there are two choices in the data (`len(np.unique(data["response"])) == 2`), which indicates that even though there are supposed to be more than two choices, only two unique values were found in the data.
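
To make the condition concrete, here is a minimal sketch of how it can arise. The response values are made up for illustration, and the two checks only mirror the condition and the error message quoted above; they are not the actual body of `_pre_check_data_sanity`.

```python
import numpy as np

# Made-up responses from a 4-choice task with options [0, 1, 2, 3],
# where only options 0 and 3 happen to appear in the data.
responses = np.array([0, 3, 3, 0, 3])

unique_values = np.unique(responses)

# Same condition as quoted above: two unique values look like a binary task.
looks_binary = len(unique_values) == 2

# The binary check then expects the values to be coded as -1 and 1.
valid_binary_coding = set(unique_values.tolist()) <= {-1, 1}

print(looks_binary, valid_binary_coding)
```

Here the data are treated as binary but fail the -1/1 coding requirement, which matches the situation in which the `ValueError` was raised.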

This example shows that inferring the number of choices from the data is unreliable. In multiple-choice cases, certain choices can legitimately be missing. The most robust solution is to require users to provide the number of choices (with a default of 2 to stay compatible with existing code). We can then warn users about potentially missing choices in their data and raise errors when illegal choices show up.
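
A rough sketch of what such a check could look like is below. The function name `check_response_values`, the `n_choices` and `response_col` parameters, and the return value are all hypothetical and not part of the current HSSM API. The sketch assumes binary responses are coded as -1 and 1 (as in the error message above) and n-choice responses as 0 to `n_choices - 1` (as in the [0, 1, 2, 3] example).

```python
import warnings

import numpy as np


def check_response_values(data, n_choices=2, response_col="response"):
    """Hypothetical sanity check; not the actual HSSM implementation.

    `data` can be anything indexable by column name, such as a pandas
    DataFrame or a dict of arrays. Returns the set of allowed choices
    that never appear in the data.
    """
    observed = set(np.unique(data[response_col]).tolist())

    # Assumed coding: {-1, 1} for two choices, 0..n_choices-1 otherwise.
    allowed = {-1, 1} if n_choices == 2 else set(range(n_choices))

    # Values outside the allowed set are an error.
    illegal = observed - allowed
    if illegal:
        raise ValueError(
            f"Invalid responses {sorted(illegal)}; "
            f"expected only values in {sorted(allowed)}."
        )

    # Allowed values that never occur may be legitimate, so only warn.
    missing = allowed - observed
    if missing:
        warnings.warn(
            f"Choices {sorted(missing)} do not appear in the data.",
            UserWarning,
        )
    return missing


# Usage: a 4-choice dataset in which options 1 and 2 never occur.
example = {"response": np.array([0, 3, 3, 0])}
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    missing_choices = check_response_values(example, n_choices=4)
```

Because the number of choices is passed in explicitly, a dataset with only two observed values out of four is no longer mistaken for a binary task; it produces a warning instead of an error.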