rschmucker / Large-Scale-Knowledge-Tracing

Efficient implementations for handling large-scale student log data and knowledge tracing algorithms.

Question: File task_1_answer_id_ordering.npy referenced by the Eedi preparation script does not exist #1

Open mdgbayly opened 2 years ago

mdgbayly commented 2 years ago

https://github.com/rschmucker/Large-Scale-Knowledge-Tracing/blob/master/src/preparation/datasets/eedi.py#L195

The file task_1_answer_id_ordering.npy does not exist in the dataset. Maybe it was created out of band? From the code, it looks like it provides some kind of index over AnswerId and is used to sort the interactions before the splits are set up and the data is serialized to csv.

Not sure how important it is? The interaction data is also sorted by user_id, timestamp, so maybe it could just be dropped from the sorting? Or perhaps it can be recreated if the strategy behind it is known.
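A minimal sketch of the fallback suggested above, not the repository's actual code. It sorts by user_id and timestamp and only uses an ordering file as a tie-breaker when one is available. The function name sort_interactions, the answer_id column, the file name hypothetical_ordering.npy, and above all the assumed file layout (a 1-D array of unique AnswerId values in true response order) are assumptions, since the real format of task_1_answer_id_ordering.npy is not documented in this thread.

```python
import os
import tempfile

import numpy as np
import pandas as pd


def sort_interactions(df, ordering_path=None):
    """Sort by user_id and timestamp, using an ordering file for ties if present."""
    df = df.copy()
    keys = ["user_id", "timestamp"]
    if ordering_path is not None and os.path.exists(ordering_path):
        # ASSUMPTION: a 1-D array of unique AnswerId values in true response order.
        ordering = np.load(ordering_path)
        rank = pd.Series(np.arange(len(ordering)), index=ordering)
        df["order_rank"] = df["answer_id"].map(rank)
        keys.append("order_rank")
    # Final tie-break on original row position keeps the result deterministic.
    df["_row"] = np.arange(len(df))
    keys.append("_row")
    return df.sort_values(keys).drop(columns="_row").reset_index(drop=True)


# Toy data: user 1 has two responses with the same timestamp (3).
toy = pd.DataFrame({
    "user_id": [2, 1, 1, 1],
    "timestamp": [5, 3, 3, 1],
    "answer_id": [100, 101, 102, 103],
})

# Without the file, tied rows keep their original relative order.
fallback = sort_interactions(toy)

# With a (hypothetical) ordering file, ties follow the order stored in the file.
path = os.path.join(tempfile.mkdtemp(), "hypothetical_ordering.npy")
np.save(path, np.array([102, 101, 103, 100]))
with_file = sort_interactions(toy, path)
```

Without the file, the order of responses inside a tied minute is arbitrary, so any results that depend on exact sequence order may differ from those obtained with the original file.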

rschmucker commented 2 years ago

The Eedi dataset created for the NeurIPS 2020 competition (https://eedi.com/projects/neurips-education-challenge) contains a DateAnswered attribute, which gives the time of each student response rounded to the nearest minute. Because the timestamps are rounded, the dataset does not allow us to determine the exact response order when a student answered multiple questions within the same minute. Based on our checks, about 64% of the student responses are affected by this problem.
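A check of this kind can be sketched as follows. This is an illustration on toy data, not the check that produced the 64% figure. The helper name tied_response_share and the UserId column name are assumptions; DateAnswered and AnswerId are taken from the discussion above.

```python
import pandas as pd


def tied_response_share(df, user_col="UserId", time_col="DateAnswered"):
    """Fraction of rows that share (user, timestamp) with at least one other row."""
    if df.empty:
        return 0.0
    # keep=False marks every member of a tied group, not only the repeats.
    tied = df.duplicated(subset=[user_col, time_col], keep=False)
    return float(tied.mean())


# Toy data: user 1 answers twice within the same minute.
toy = pd.DataFrame({
    "UserId": [1, 1, 1, 2, 2],
    "DateAnswered": pd.to_datetime([
        "2020-01-01 10:00", "2020-01-01 10:00", "2020-01-01 10:05",
        "2020-01-01 09:00", "2020-01-01 09:01",
    ]),
    "AnswerId": [10, 11, 12, 13, 14],
})

share = tied_response_share(toy)  # 2 of 5 rows are tied
```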

Upon request, the dataset author Angus Lamb (t-anlam@microsoft.com) provided us with the task_1_answer_id_ordering.npy file, which our code uses to determine the exact student response sequence.

mdgbayly commented 2 years ago

Thanks for the response. Are you able to upload the file, or was it deemed proprietary in some way, so that I would need to ask Angus for it myself?

rschmucker commented 2 years ago

I sent an email to Angus to ask whether they can share the file with the wider community. I will let you know once I hear back.

mdgbayly commented 2 years ago

Thanks so much.

Incidentally, just wanted to give you kudos for your paper and this repo.

The quality of your paper and the associated code as a learning resource for others is far beyond anything else I have seen in this area of research. I realize that's not your primary objective, but regardless, you have done an amazing job. The paper clearly describes the motivation and approach in a way that is easy for a non-academic non-researcher like myself to finally grok. Your experimental methodology is easy to follow and reproduce with your code, and the code itself is detailed and well organized. My day job is as a web app developer, and I'm blown away by how well written the code is. On top of all that, you have some really creative ideas to push the field forward.

Best of luck with your PhD, research and future endeavours.

makoeppel commented 1 year ago

Was there any answer from Angus Lamb? I also wanted to check out the project, and the file task_1_answer_id_ordering.npy is still missing.

rschmucker commented 1 year ago

I sent an email to Angus a while ago but did not receive a response. I am not sure whether he is still affiliated with Microsoft. If anyone knows whether the file has been shared publicly somewhere, please let me know so that I can link it in the README.