cylinder-bands is leaking the target via the `job_number` column.
As in #57, I think this column should be ignored, unless this is intentional (which seems strange).
This dataset is part of the CC-18, so I wonder if there's a way to fix this.
A better way to address this might be to use grouped cross-validation, but that would require downstream benchmarks to be aware of the grouping and use the provided splits.
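A minimal sketch of both options, assuming pandas and scikit-learn. The frame is a hypothetical toy stand-in for cylinder-bands: `feature`, `band_type` (used as the target) and all values are made up, and the check assumes the leak takes the form of each `job_number` mapping to a single label. It drops the identifier from the features and instead uses it as the group key for scikit-learn's `GroupKFold`, so no job ends up in both a train and a test fold.

```python
import pandas as pd
from sklearn.model_selection import GroupKFold

# Hypothetical toy data (not the real dataset): each job_number appears
# several times, and all rows of a job share the same label.
df = pd.DataFrame({
    "job_number": [101, 101, 101, 202, 202, 303, 303, 303, 404, 404],
    "feature":    [0.1, 0.3, 0.2, 0.9, 0.8, 0.4, 0.5, 0.6, 0.7, 0.2],
    "band_type":  ["band", "band", "band", "noband", "noband",
                   "band", "band", "band", "noband", "noband"],
})

# Leak check (assumed form of the leak): if every job_number maps to exactly
# one label, a model can memorize the target from the job identifier.
labels_per_job = df.groupby("job_number")["band_type"].nunique()
job_determines_label = bool((labels_per_job == 1).all())
print("job_number determines label:", job_determines_label)

# Option 1: ignore the identifier as a feature.
X = df.drop(columns=["job_number", "band_type"])
y = df["band_type"]

# Option 2: keep it only as the grouping key for cross-validation.
groups = df["job_number"]
cv = GroupKFold(n_splits=4)

folds = []
for train_idx, test_idx in cv.split(X, y, groups=groups):
    train_jobs = set(groups.iloc[train_idx])
    test_jobs = set(groups.iloc[test_idx])
    # Rows from one job never appear on both sides of a split.
    assert train_jobs.isdisjoint(test_jobs)
    folds.append((train_jobs, test_jobs))
    print("test jobs:", sorted(test_jobs))
```

Grouping only helps if the benchmark actually passes `groups` to the splitter (or uses precomputed splits built this way); a plain `KFold` over the same rows would still let rows from one job land on both sides.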
Hm, the description of CC-18 says "classification tasks on dense data set independent observations", so "independent observations" seems a bit of a stretch here, since rows sharing a `job_number` are clearly not independent.