Closed linchuanghong closed 3 months ago
The assistment2017 dataset (anonymized_full_release_competition_dataset.csv) does not include concepts. Why does the data set generated after "data_preprocess.py" have concepts?
Hi, the original data file of assistment2017 (anonymized_full_release_competition_dataset.csv) includes a "skill" column, which can be treated as the corresponding concepts.
Hi, but the skill in the raw data table is not an id (it is the name of the skill, not an int). Why is the concept an int id after preprocessing?
During data preprocessing, we map the questions, KC contents, etc. to new ids. See pykt-toolkit/pykt/preprocess/split_datasets.py, line 471.
Thanks for the reply!
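To illustrate the kind of remapping described above, here is a minimal sketch that turns skill names into integer ids. It is not the code from split_datasets.py; the helper name `build_id_map`, the sample skill names, and the choice to number ids by order of first appearance are all assumptions made for illustration.

```python
def build_id_map(values):
    """Assign consecutive integer ids to distinct values, in order of first appearance."""
    id_map = {}
    for value in values:
        if value not in id_map:
            id_map[value] = len(id_map)
    return id_map


# Hypothetical raw "skill" column: skill names stored as strings.
skills = ["addition", "fractions", "addition", "geometry"]

skill2id = build_id_map(skills)
# Replace every skill name with its integer id.
concept_ids = [skill2id[s] for s in skills]
```

The same kind of mapping could be applied to question identifiers, which is why both columns end up as ints after preprocessing.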
Are there duplicate sequences in “train_valid_sequences” and “train_valid_quelevel” data sets?
Can you explain the difference between the preprocessed data files?
Thank you for your interest in our work. After data preprocessing, we get train_valid.csv and train_valid_sequences.csv, which hold the samples before and after truncation, respectively. Files with "quelevel" in the name are the data files for question-level KT models such as iekt, qikt, and lpkt. For the test set, we also generate window files, which use the nearest N historical interactions to predict the student's performance on the next question.
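As a rough picture of the difference between truncated sequences and window files, here is a minimal sketch. It is not the toolkit's implementation: the function names `truncate` and `sliding_windows`, the padding value -1, and the handling of sequences shorter than `maxlen` are assumptions.

```python
def truncate(seq, maxlen, pad=-1):
    """Cut one interaction sequence into consecutive chunks of length maxlen.

    The last chunk is right-padded with `pad` (an assumed padding value).
    """
    chunks = []
    for start in range(0, len(seq), maxlen):
        chunk = seq[start:start + maxlen]
        chunks.append(chunk + [pad] * (maxlen - len(chunk)))
    return chunks


def sliding_windows(seq, maxlen):
    """Build one window per prediction target.

    Each window ends at the target interaction and keeps only the nearest
    maxlen - 1 interactions before it as history.
    """
    if len(seq) <= maxlen:
        return [list(seq)]
    return [seq[end - maxlen:end] for end in range(maxlen, len(seq) + 1)]


responses = [1, 2, 3, 4, 5]  # placeholder interaction ids for one student
chunks = truncate(responses, maxlen=2)
windows = sliding_windows(responses, maxlen=3)
```

In this sketch, truncation yields non-overlapping chunks, while windows overlap so that every late interaction is predicted from its most recent history.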
How do I change the embedding size of the model? Why does it still not work when I modify the corresponding model in the kt_config.json file?
Sorry for the late reply. Could you share the code you modified for seqlen?
I have already sorted out that issue, thank you for your reply. Now I would like to know what max_concepts=7 means for the algebra2005 dataset in the data_config.py file.
Some questions in educational datasets are associated with multiple knowledge concepts (KCs). For each dataset, we therefore compute the largest number of KCs attached to any single question and denote it "max_concepts".
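A minimal sketch of how such a value could be computed, assuming the question-to-KC relation is available as a dictionary. The function name `compute_max_concepts` and the sample data are hypothetical, not taken from the toolkit.

```python
def compute_max_concepts(question_to_kcs):
    """Return the largest number of KCs attached to any single question."""
    return max((len(kcs) for kcs in question_to_kcs.values()), default=0)


# Hypothetical mapping from question id to its knowledge concepts.
question_to_kcs = {
    "q1": ["kc_a"],
    "q2": ["kc_a", "kc_b", "kc_c"],
    "q3": ["kc_b", "kc_d"],
}

max_concepts = compute_max_concepts(question_to_kcs)
```

Read this way, max_concepts=7 for algebra2005 would mean that at least one question in that dataset is tagged with seven KCs and none with more.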