riiid / ednet

EdNet is a dataset of all student-system interactions collected over 2 years by Santa, a multi-platform AI tutoring service available on Android, iOS and the web, with more than 780K users in Korea.

Request for clarification: "elapsed time" #5

Status: Open · opened by yongyi-wu 3 years ago

yongyi-wu commented 3 years ago

Hello,

In KT1, does elapsed_time correspond to the prior question, as is the case in the Kaggle competition (the prior_question_elapsed_time field), or to the current question? Also, is it the average time spent on the questions in the same bundle?

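Specifically, could you explain why lag_time, computed below as the start of each bundle minus the estimated end of the previous bundle (its first timestamp plus elapsed_time times its number of questions), can be negative: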

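import pandas as pd

# One KT1 file per user, with timestamp, solving_id and elapsed_time columns.
df = pd.read_csv('u42.csv')

# Number of questions in each bundle, keyed by solving_id.
bundle_size = df.groupby('solving_id').size().to_dict()

if not df['timestamp'].is_monotonic_increasing:
    # Reset the index after sorting so that r.name - 1 really is the previous row.
    df = df.sort_values('timestamp').reset_index(drop=True)

def lag_time(r):
    # Only the first row of each bundle gets a lag; all other rows get 0.
    if r.name == 0 or r['solving_id'] == df.loc[r.name - 1, 'solving_id']:
        return 0
    # Assumes the previous bundle has solving_id - 1 and fills the rows just above.
    prev_size = bundle_size[r['solving_id'] - 1]
    prev_start = df.loc[r.name - prev_size, 'timestamp']
    # Estimated end of the previous bundle, treating elapsed_time as a per-question average.
    prev_end = prev_start + df.loc[r.name - 1, 'elapsed_time'] * prev_size
    return r['timestamp'] - prev_end

df['lag_time'] = df.apply(lag_time, axis=1)

print(df[df['lag_time'] < 0])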
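For comparison, here is a vectorized sketch of the same lag computation that does not assume consecutive solving_id values: it sorts by timestamp, collapses each bundle to one row with groupby, and derives each bundle's estimated end from elapsed_time. The function name bundle_lag_times, its two flags and the toy data are hypothetical illustrations, not part of EdNet; the flags encode the two readings of elapsed_time asked about above (per-question average vs. whole bundle, current vs. previous bundle).

```python
import pandas as pd


def bundle_lag_times(df: pd.DataFrame,
                     per_question: bool = True,
                     refers_to_previous: bool = False) -> pd.Series:
    """Estimated idle time before each bundle, indexed by solving_id.

    Hypothetical helper. Assumes timestamp and elapsed_time share one unit
    (e.g. ms) and that solving_id labels each bundle. per_question=True
    treats elapsed_time as the average time per question in the bundle;
    refers_to_previous=True reads bundle k's time from bundle k + 1's row,
    as with prior_question_elapsed_time on Kaggle.
    """
    # Stable sort keeps the original order of rows sharing a timestamp.
    df = df.sort_values('timestamp', kind='mergesort')
    # One row per bundle, in order of first appearance after sorting.
    bundles = df.groupby('solving_id', sort=False).agg(
        start=('timestamp', 'first'),
        elapsed=('elapsed_time', 'first'),
        n_questions=('timestamp', 'count'),
    )
    elapsed = bundles['elapsed']
    if refers_to_previous:
        # Bundle k's elapsed time is recorded on bundle k + 1's row.
        elapsed = elapsed.shift(-1)
    duration = elapsed * bundles['n_questions'] if per_question else elapsed
    end = bundles['start'] + duration
    # Lag of bundle k = its start minus the estimated end of bundle k - 1.
    return bundles['start'] - end.shift(1)


# Made-up toy log: bundle 1 has two questions, bundles 2 and 3 have one each.
toy = pd.DataFrame({
    'timestamp':    [0, 0, 5000, 6000],
    'solving_id':   [1, 1, 2, 3],
    'elapsed_time': [1000, 1000, 2000, 500],
})
print(bundle_lag_times(toy))                           # bundle 3 gets -1000
print(bundle_lag_times(toy, refers_to_previous=True))  # all lags >= 0
```

On this made-up log, reading elapsed_time as belonging to the current bundle yields a negative lag for bundle 3, while the Kaggle-style reading does not; this only shows how the interpretation affects the sign, not which one KT1 actually uses.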

Thank you.