Closed · changdaeoh closed this issue 3 months ago
Thank you for sharing a masterpiece framework.
I ran various models on the nips_task34 dataset, and the results were well reproduced.
However, on the assistment2009 dataset, I get about 5% higher AUC on average than reported in the pyKT and QIKT papers (e.g., DKT and SparseKT-topk achieve about 0.82 and 0.83 validation AUC, respectively).
Is there something I'm missing?
Hi~ thank you for your attention to our work. Since the DKT and SparseKT models are trained on KC-level sequences, the results you report may be KC-level results. Did you run the evaluation via our code? If so, you would get late-fusion average results (e.g., "windowauclate_mean"/"windowacclate_mean"), which are the results we report. For more details about our evaluation protocol, please see our paper (https://arxiv.org/abs/2206.11460).
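For readers unfamiliar with the protocol: in pyKT, a question tagged with multiple knowledge concepts (KCs) is expanded into one interaction per KC for training, and "late fusion (average)" aggregates the KC-level predictions back to one score per original question before computing AUC. A minimal sketch of that averaging step (the array names and toy values here are hypothetical, not pyKT's actual API):

```python
import numpy as np

# Toy KC-level predictions: questions 0 and 1 each expand into two KCs,
# question 2 has a single KC. question_ids maps each KC row back to its
# original question; labels has one entry per original question.
kc_preds = np.array([0.9, 0.7, 0.4, 0.6, 0.8])  # per-KC model outputs
question_ids = np.array([0, 0, 1, 1, 2])        # KC row -> question index
labels = np.array([1, 0, 1])                    # one label per question

# Late fusion (average): mean of the KC-level predictions per question.
q_preds = np.array([kc_preds[question_ids == q].mean()
                    for q in np.unique(question_ids)])

def auc_score(y, s):
    """Plain pairwise AUC: fraction of (positive, negative) pairs ranked correctly."""
    pos = [si for yi, si in zip(y, s) if yi == 1]
    neg = [si for yi, si in zip(y, s) if yi == 0]
    hits = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return hits / (len(pos) * len(neg))

auc = auc_score(labels, q_preds)  # question-level (late-fusion) AUC
```

Evaluating at the KC level instead of on `q_preds` scores each expanded row independently, which is typically easier and can inflate AUC by a few points, consistent with the gap you observed.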