rte-france / challenge-roadef-2020

Apache License 2.0
whitespace in challenge input files causes unnecessarily large file size #30

Open mlangiu opened 3 years ago

mlangiu commented 3 years ago

I guess this comes a bit late, but I just realized that the format of the input files is somewhat problematic. The additional whitespace is nice for readability in small example files, but it is of no great use in larger files, which aren't read by humans anyway. The issue is that the whitespace causes unnecessarily large file sizes and consequently longer read-in times than necessary. Both file size and read-in time can be reduced significantly (70-80%) by using input files from which all insignificant whitespace (indentation, line breaks, and spaces after separators) has been removed. The following script provides one way to accomplish this:

# compactify.py
# Rewrite a JSON file without insignificant whitespace.
import json
import os
import sys

file_path = sys.argv[1]
path, filename = os.path.split(file_path)
name, extension = os.path.splitext(filename)

# Parse the original file.
with open(file_path, 'r') as f:
    data = json.load(f)

# Write it back next to the original as <name>_compact<extension>.
with open(os.path.join(path, name + '_compact' + extension), 'w') as f:
    json.dump(data, f, separators=(',', ':'))

e.g. with

for f in A_set/*; do
  python compactify.py "$f"
done
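
To check the savings on a particular instance, something along the following lines might help. This is only a rough sketch: `compare_compact` is a hypothetical helper name, not part of the challenge tooling, and the timings cover only parsing of text that is already in memory, so they will not match full read-in times from disk.

```python
import json
import os
import time


def compare_compact(file_path):
    """Compare a JSON file with its compact serialization (hypothetical helper)."""
    with open(file_path, 'r') as f:
        text = f.read()

    # Time parsing of the original text.
    start = time.perf_counter()
    data = json.loads(text)
    original_seconds = time.perf_counter() - start

    # Same separators as in compactify.py.
    compact_text = json.dumps(data, separators=(',', ':'))

    # Time parsing of the compact text.
    start = time.perf_counter()
    json.loads(compact_text)
    compact_seconds = time.perf_counter() - start

    original_bytes = os.path.getsize(file_path)
    compact_bytes = len(compact_text.encode('utf-8'))
    return {
        'original_bytes': original_bytes,
        'compact_bytes': compact_bytes,
        # Fraction of the original size that is saved.
        'reduction': 1 - compact_bytes / original_bytes,
        'original_seconds': original_seconds,
        'compact_seconds': compact_seconds,
    }
```

Calling `compare_compact` on one of the files in `A_set` and printing the result shows how much of the file is whitespace for that instance.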

I feel that it would be in everyone's interest if you ran the evaluation with the reduced-size input files.

Kind regards

Marco

klorel commented 3 years ago

Hi, this is a very nice suggestion, thanks!

For people using a standard JSON parser there will be no problem. But we don't know whether that is the case for everybody; we'll see during the semi-final evaluation.
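
For anyone who wants to confirm that a compacted file carries the same content as the original, a minimal check with Python's standard `json` module could look like the sketch below. The function name `same_json_content` is made up for illustration, and the check says nothing about how a hand-written parser would behave.

```python
import json


def same_json_content(original_path, compact_path):
    """Return True if both files parse to equal Python objects."""
    with open(original_path, 'r') as f:
        original = json.load(f)
    with open(compact_path, 'r') as f:
        compact = json.load(f)
    # Dicts, lists, strings and numbers compare by value, so whitespace
    # differences between the two files do not matter here.
    return original == compact
```

One caveat: if a file contained `NaN` values (which Python's parser accepts although they are not valid JSON), the comparison would return False even for identical files, because NaN does not compare equal to itself.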