Open amueller opened 1 year ago
The teachingAssistant dataset has an ID attribute that should be ignored by default. This is particularly bad because the rows are ordered, so the ID is informative. In fact, dropping every feature except the ID gives near-perfect results:
cross_validate(RandomForestClassifier(), df[['ID']], df['class'], scoring="roc_auc_ovr", cv=StratifiedKFold(shuffle=True))
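For anyone who wants to see the effect without downloading the dataset, here is a rough standard-library sketch of the same kind of leak. Everything in it is an assumption made for illustration: the data is synthetic (three classes stored in sorted order, with the ID equal to the row number), the helper names `make_ordered_dataset` and `nearest_id_accuracy` are hypothetical, the classifier is a plain 1-nearest-neighbour lookup on the ID rather than a random forest, the folds are shuffled but not stratified, and the score is accuracy rather than ROC AUC.

```python
import random


def make_ordered_dataset(n_per_class=50, n_classes=3):
    """Synthetic stand-in: rows are sorted by class, the ID is the row number."""
    ids, labels = [], []
    for c in range(n_classes):
        for _ in range(n_per_class):
            ids.append(len(ids) + 1)
            labels.append(c)
    return ids, labels


def nearest_id_accuracy(ids, labels, n_folds=5, seed=0):
    """Shuffled k-fold accuracy of a 1-nearest-neighbour classifier that sees only the ID."""
    rng = random.Random(seed)
    order = list(range(len(ids)))
    rng.shuffle(order)
    # Deal the shuffled row indices into n_folds disjoint folds.
    folds = [order[i::n_folds] for i in range(n_folds)]
    correct = 0
    for fold in folds:
        held_out = set(fold)
        train = [i for i in order if i not in held_out]
        for i in fold:
            # Predict the label of the training row whose ID is closest.
            nearest = min(train, key=lambda j: abs(ids[j] - ids[i]))
            correct += labels[nearest] == labels[i]
    return correct / len(ids)


ids, labels = make_ordered_dataset()
leaky = nearest_id_accuracy(ids, labels)

# Baseline: permute the IDs so they no longer follow the row order.
shuffled_ids = ids[:]
random.Random(1).shuffle(shuffled_ids)
baseline = nearest_id_accuracy(shuffled_ids, labels)

print(f"ID follows row order: {leaky:.2f}")
print(f"ID permuted:          {baseline:.2f}")
```

With the ordered IDs the only mistakes can occur right at the class boundaries, while the permuted IDs should score near chance, which is why an ID column like this should be excluded from the default feature set.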
New dataset version: https://openml.org/search?type=data&status=active&id=45688&sort=runs