Closed · changdaeoh closed this issue 3 months ago
Thank you for sharing a masterpiece framework.
I ran various models on the nips_task34 dataset, and the results were well reproduced.
However, on the assistment2009 dataset, I get about 5% higher AUC on average than reported in the pyKT and QIKT papers (e.g., DKT and SparseKT-topk achieve about 0.82 and 0.83 validation AUC, respectively).
Is there something I'm missing?
Hi~ thank you for your attention to our work. Since the DKT and SparseKT models are trained on KC-level sequences, the results you report may be KC-level results. Did you run the evaluation via our code? If so, you would get late-fusion average results (e.g., "windowauclate_mean"/"windowacclate_mean"), which are the results we report. For more details about our evaluation protocol, please see our paper (https://arxiv.org/abs/2206.11460).
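For readers unfamiliar with the protocol: in pyKT, a question tagged with multiple knowledge concepts (KCs) is expanded into one interaction per KC for training, and "late fusion (average)" aggregates the KC-level predictions back to one score per original question before computing AUC. A minimal sketch of that averaging step (the array names and toy values here are hypothetical, not pyKT's actual API):

```python
import numpy as np

# Toy KC-level predictions: questions 0 and 1 each expand into two KCs,
# question 2 has a single KC. question_ids maps each KC row back to its
# original question; labels has one entry per original question.
kc_preds = np.array([0.9, 0.7, 0.4, 0.6, 0.8])  # per-KC model outputs
question_ids = np.array([0, 0, 1, 1, 2])        # KC row -> question index
labels = np.array([1, 0, 1])                    # one label per question

# Late fusion (average): mean of the KC-level predictions per question.
q_preds = np.array([kc_preds[question_ids == q].mean()
                    for q in np.unique(question_ids)])

def auc_score(y, s):
    """Plain pairwise AUC: fraction of (positive, negative) pairs ranked correctly."""
    pos = [si for yi, si in zip(y, s) if yi == 1]
    neg = [si for yi, si in zip(y, s) if yi == 0]
    hits = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return hits / (len(pos) * len(neg))

auc = auc_score(labels, q_preds)  # question-level (late-fusion) AUC
```

Evaluating at the KC level instead of on `q_preds` scores each expanded row independently, which is typically easier and can inflate AUC by a few points, consistent with the gap you observed.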