When splitting into 85% and 15% partitions, sometimes an extra entry is added to the 15% partition even though it should go in the 85% partition

lunarlab-gatech / MorphoSymm-Replication

Fork of MorphoSymm that is used as a baseline to compare with our MI-HGNN.

https://lunarlab-gatech.github.io/Morphology-Informed-HGNN/


When splitting into 85% and 15% partitions, sometimes an extra entry is added to the 15% partition even though it should go in the 85% partition #4

Closed. DanielChaseButterfield closed this issue 2 months ago

DanielChaseButterfield commented 2 months ago

For example, if a dataset had 103392 entries, the partitions should be 87756.55 (rounded to 87757) and 15486.45 (rounded to 15486). However, the code always rounds the first partition down by truncating the fractional part, which gives partitions of 87756 and 15487 instead.

Fixing this should make the partition sizes match the 85/15 split more closely.
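To make the difference concrete, here is a minimal sketch of the two ways of computing the partition sizes. It is not the repository's code: the function names are hypothetical, and the total of 103243 is an assumption, taken as the sum of the two partition sizes quoted above (87756 + 15487).

```python
from typing import Tuple


def split_sizes_truncate(total: int, train_fraction: float = 0.85) -> Tuple[int, int]:
    """Hypothetical helper: truncate the first partition size (the behaviour reported here)."""
    first = int(total * train_fraction)  # int() drops the fractional part
    return first, total - first


def split_sizes_round(total: int, train_fraction: float = 0.85) -> Tuple[int, int]:
    """Hypothetical helper: round the first partition size to the nearest integer."""
    first = round(total * train_fraction)  # nearest integer instead of truncation
    return first, total - first


# Assumed total: the sum of the two partition sizes quoted in this issue.
total_entries = 103243

print(split_sizes_truncate(total_entries))  # (87756, 15487)
print(split_sizes_round(total_entries))     # (87757, 15486)
```

In both versions the second partition is computed as the remainder, so the two sizes always add up to the total. Note that Python's built-in `round()` rounds exact halves to the nearest even integer, which could matter when the product lands exactly on .5.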

DanielChaseButterfield commented 2 months ago

Solved with commit dc75c540a1e24fe2e6d449092aed51c3147b79b4, and tested with the test_create_train_val_dataset() method from the tests/test_umich_contact_dataset.py file.