Bug in dataset splits? - Githubissues

ml-jku / hopular

Hopular: Modern Hopfield Networks for Tabular Data

https://ml-jku.github.io/hopular/

MIT License

305 stars 26 forks

Bug in dataset splits? #6

Open puhsu opened 1 year ago

puhsu commented 1 year ago

I'm reading the dataset code and there is probably a bug here:

https://github.com/ml-jku/hopular/blob/3e0c39fdc59568349373af573ee52c03305ca105/hopular/auxiliary/data.py#L597-L602

After that, you are using the old split indices to index into the shuffled/concatenated arrays, so the resulting splits no longer match the original ones (they are not stratified anymore, for example):

https://github.com/ml-jku/hopular/blob/3e0c39fdc59568349373af573ee52c03305ca105/hopular/auxiliary/data.py#L637-L638
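
To illustrate the kind of mismatch I mean, here is a minimal sketch. It is not the actual Hopular code: the toy labels, the names `labels`, `test_idx` and `perm`, and the fixed permutation standing in for the shuffle/concatenate step are all assumptions made up for the example.

```python
import numpy as np

# Hypothetical toy labels: 6 samples of class 0 followed by 6 of class 1.
labels = np.array([0] * 6 + [1] * 6)

# A stratified test split computed on the ORIGINAL ordering: 2 samples per class.
test_idx = np.array([0, 1, 6, 7])

# The data is reordered afterwards. A fixed permutation stands in for the
# shuffle/concatenate step so that the outcome is reproducible.
perm = np.array([0, 1, 6, 7, 8, 9, 2, 3, 10, 11, 4, 5])
shuffled = labels[perm]

# Suspected bug pattern: the old indices are applied to the reordered array.
stale = shuffled[test_idx]

# One possible fix: map each original index to its new position.
# inverse_perm[i] is the position of original sample i inside `shuffled`.
inverse_perm = np.argsort(perm)
remapped = shuffled[inverse_perm[test_idx]]

print("intended:", labels[test_idx])  # classes of the intended test samples
print("stale:   ", stale)             # classes actually selected by old indices
print("remapped:", remapped)          # classes after remapping the indices
```

In this toy case the stale indices pick only class-0 samples, while the remapped indices recover the intended two-per-class split. Computing the split indices after the reordering, rather than before, would avoid the remapping altogether.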