Line 144 of cli.py's `evaluate` function throws an error when tested with scored predictions on squad data

nd-ball / py-irt

Bayesian IRT models in Python

MIT License

115 stars 41 forks source link

Line 144 of cli.py's `evaluate` function throws an error when tested with scored predictions on squad data #19

Closed pk1130 closed 2 years ago

pk1130 commented 2 years ago

Hey @jplalor @EntilZha! When testing all the functions written in cli.py with the scored predictions of the squad data, train and train_and_evaluate work well producing the desired training and evaluation results. But when testing the evaluate function separately by passing in the squad.jsonlines file for TEST_PAIRS_PATH, the function throws the following error on Line 144:

KeyError: item_id is not a key in the dict

Am I passing in the wrong file into TEST_PAIRS_PATH? Have also opened issue #18 to clarify what TEST_PAIRS_PATH refers to. Please respond at your earliest convenience. Thanks a lot!

EntilZha commented 2 years ago

I can't debug right now, but we should probably make sure we include unit/integration tests for new features to avoid issues like this as well as give some usage examples.

@pk1130 if you have time, it would make for a great PR to debug the issue, create a unit/integration test that reproduces the issue, and create a fix for it.

pk1130 commented 2 years ago

Would love to help out! I'll take a look at it when I wake up tomorrow morning :) Just to confirm @EntilZha, in place of the argument TEST_PAIRS_PATH I have to pass in the squad.jsonlines file path right?

jplalor commented 2 years ago

The data format for evaluate is slightly different than the squad example as implemented. The expectation is that you have:

A set of learned subject and item parameters
(subjectID, itemID) pairs that you want to estimate p(correct) for.

TEST_PAIRS_PATH refers to a new file of jsonlines that have the format:

{"subject_id": "s1", "item_id": "q1"}
{"subject_id": "s2", "item_id": "q1"}
{"subject_id": "s1", "item_id": "q3"}

This way you only get predictions for subject-items pairs that are specified (e.g., a previously held out test set).

jplalor commented 2 years ago

Closing this as I think we're all set.