wzfhaha / dropout_prediction


Concrete definition of dropout? #2


pmixer commented 4 years ago

Hi @wzfhaha, to make things clear, could you please explain in detail how dropout is defined for the KDDCup2015 and AAAI2019 datasets, separately:

  1. For the KDDCup2015 dataset, it's stated that if a user has no records for 10 days after the last day of the class, he/she is considered a dropout. This seems a bit weird: someone who finished the course may not visit it in the following 10 days but may still get the certificate (which counts as completion rather than dropout). Also, one month seems shorter than a whole semester. Could you explain this? (Sorry, I only started taking courses on XuetangX in Sep 2014, so it's hard for me to understand what happened before that.)

  2. For the AAAI2019 dataset, students who do not have any activity records during the prediction period are taken as dropouts from the course. That is reasonable, but how do you decide which period is the prediction period, especially for SPM courses?

  3. Are all non-dropouts taken as completions, whether or not they get the certificate? Or is dropout sometimes defined by whether the student gets the certificate?

  4. According to the statistics listed in the paper, the dropout rate is below 95%. Are these users randomly selected from all XuetangX users, or weighted by activity frequency (so that more active users are more likely to be selected)?

Thanks for making the dataset, paper, and code publicly available!

Regards, Zan

wzfhaha commented 4 years ago

Hi, @pmixer. Thanks for your question.

  1. There are various definitions of "dropout" in MOOCs. It can also be defined based on certificate behavior. However, in practice, MOOC researchers usually pay more attention to dropout behavior in the earlier part of a semester, so that they can intervene to retain users at higher risk of dropping out. That is the motivation for the dropout definition in our paper.

  2. For IPM courses, the history period is defined as the first five weeks after the course starts, and the prediction period is the ten days immediately following the history period (please refer to the "Implementation Details" section of the paper). As for SPM courses, "dropout" has not been well defined, since they do not have a fixed start or end time. The analysis and methods in our paper are mainly for IPM courses.

  3. The "dropout" definition in our paper is different from completion. Dropout only means that a user has no activity in the prediction period.

  4. The users are randomly selected in our analyses. The dropout rate is below 95% because dropout and completion are defined differently.
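To make the IPM definition above concrete, here is a minimal Python sketch. It is not taken from the CFIN code: it assumes "five weeks" means 35 days, that the prediction window covers days 35 through 44 after course start (end exclusive), and that a user's activity is available as a list of dates. The names `is_dropout`, `HISTORY_DAYS`, and `PREDICTION_DAYS` are hypothetical.

```python
from datetime import date, timedelta
from typing import Iterable

# Assumed window lengths for IPM courses (hypothetical constants):
# history = first 5 weeks after course start, prediction = the next 10 days.
HISTORY_DAYS = 35
PREDICTION_DAYS = 10


def is_dropout(course_start: date, activity_dates: Iterable[date]) -> bool:
    """Return True if the user has no activity inside the prediction window."""
    pred_start = course_start + timedelta(days=HISTORY_DAYS)
    pred_end = pred_start + timedelta(days=PREDICTION_DAYS)  # exclusive bound
    return not any(pred_start <= d < pred_end for d in activity_dates)


if __name__ == "__main__":
    start = date(2015, 3, 2)
    # Active only during the history period -> labeled as dropout.
    print(is_dropout(start, [date(2015, 3, 20)]))
    # Active inside the prediction window (2015-04-06 to 2015-04-15) -> not dropout.
    print(is_dropout(start, [date(2015, 4, 10)]))
```

Note that, consistent with point 3, this label says nothing about certificates or completion: a user who is merely inactive for those ten days is counted as a dropout.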

pmixer commented 4 years ago

@wzfhaha Thanks for your great answers! They're very clear and detailed!

My last question: as you mentioned, the dropout definition for SPM courses is somewhat vague. Could you please elaborate on how it is defined in CFIN (e.g., is the first 5 weeks after registration taken as the history period and the next 10 days as the prediction period)? I ask because I didn't see IPM and SPM courses separated in the training/testing sets. Moreover, does this mean researchers can use their own definition of dropout as long as they have the raw tracking logs?

wzfhaha commented 4 years ago

@pmixer, all the courses used in CFIN are IPM courses. You are right: researchers can use different definitions of dropout.
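Since researchers with raw tracking logs can choose their own definition, one option is to make the window anchor and lengths parameters. The sketch below is hypothetical and not part of this repository: it assumes the log is a sequence of `(user_id, course_id, date)` records and that each `(user, course)` pair has an anchor date, such as the course start (as for IPM courses) or, as an assumption for SPM courses, the user's registration date.

```python
from collections import defaultdict
from datetime import date, timedelta
from typing import Dict, Iterable, Tuple

Key = Tuple[str, str]  # (user_id, course_id)


def label_dropouts(
    log: Iterable[Tuple[str, str, date]],
    anchors: Dict[Key, date],
    history_days: int = 35,
    prediction_days: int = 10,
) -> Dict[Key, bool]:
    """Label each (user, course) pair; True means no activity in the prediction window.

    `anchors` maps each pair to the first day of its history window, e.g. the
    course start date or (hypothetically) the user's registration date.
    """
    active: Dict[Key, bool] = defaultdict(bool)
    for user, course, day in log:
        key = (user, course)
        if key not in anchors:
            continue  # ignore records for pairs we are not labeling
        start = anchors[key] + timedelta(days=history_days)
        end = start + timedelta(days=prediction_days)  # exclusive bound
        if start <= day < end:
            active[key] = True
    return {key: not active[key] for key in anchors}


if __name__ == "__main__":
    anchors = {("u1", "c1"): date(2016, 1, 1), ("u2", "c1"): date(2016, 1, 20)}
    log = [("u1", "c1", date(2016, 2, 8)), ("u2", "c1", date(2016, 2, 8))]
    # u1's window is 2016-02-05..2016-02-14, u2's is 2016-02-24..2016-03-04.
    print(label_dropouts(log, anchors))
```

Keeping the anchor per pair lets the same function cover both a shared course start date and per-user registration dates; the default window lengths simply mirror the IPM setting described earlier in the thread.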