pykt-team / pykt-toolkit

pyKT: A Python Library to Benchmark Deep Learning based Knowledge Tracing Models
https://pykt.org
MIT License

reproduction issue on assistment2009 #128

Closed · changdaeoh closed this issue 3 months ago

changdaeoh commented 10 months ago

Thank you for sharing this masterpiece of a framework.

I ran various models on the nips_task34 dataset, and the results were well reproduced.

However, on the assistment2009 dataset I get about 5% better performance on average than reported in the pyKT and QIKT papers (e.g., DKT and SparseKT-topk reach about 0.82 and 0.83 validation AUC, respectively).

Is there something I'm missing?

sonyawong commented 10 months ago

Hi, thank you for your attention to our work. DKT and sparseKT are trained on KC-level sequences, so the results you reported may be KC-level results. Did you run the evaluation via our code? If so, you should get the late-fusion average results (e.g. "windowauclate_mean"/"windowacclate_mean"), which are the numbers we report. For more details about our evaluation protocol, please see our paper (https://arxiv.org/abs/2206.11460).
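
For context on why the two numbers differ: pyKT expands each multi-KC question into one row per KC for training, so scoring those expanded rows directly gives a KC-level AUC, which can come out higher than the question-level late-fusion figures reported in the papers. Below is a minimal sketch of the late-fusion (average) idea, assuming a tidy prediction table; the column names (`student_id`, `position`, `question_id`, `pred`, `label`) are hypothetical and not pyKT's actual output schema.

```python
import pandas as pd
from sklearn.metrics import roc_auc_score

def late_fusion_auc(df: pd.DataFrame) -> float:
    """Aggregate KC-level predictions to question level, then compute AUC.

    Expects one row per (student, sequence position, question, KC) with a
    model prediction `pred` and the ground-truth response `label`.
    """
    fused = (
        df.groupby(["student_id", "position", "question_id"])
          .agg(pred=("pred", "mean"),    # late fusion: average over the question's KCs
               label=("label", "first")) # one true response per question attempt
          .reset_index()
    )
    return roc_auc_score(fused["label"], fused["pred"])
```

Under this aggregation, each question attempt contributes a single prediction, which is why scoring the expanded KC rows directly can look several points better, as observed above.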