nspcc-dev / locode-db

Source of UN/LOCODE database generated by NeoFS CLI.
MIT License

locode-db: store locodedb as embed csv.gz files #22

Closed AliceInHunterland closed 10 months ago

AliceInHunterland commented 10 months ago

Getting records from gzipped files using embed. Removed the redundant csv with continents. Removed duplicates from tables. Generated csv.gz.
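For readers following along, here is a minimal sketch of reading records out of gzipped CSV data with the Go standard library. It is not the code from this PR: the helper names `gzipCSV` and `readRecords` and the two-column sample rows are hypothetical, and the compressed bytes are built in memory so the example runs on its own. In the repository they would presumably come from a `//go:embed` variable instead.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"encoding/csv"
	"fmt"
)

// gzipCSV compresses CSV text in memory. It only stands in for an
// embedded .csv.gz file so that this sketch is self-contained.
func gzipCSV(text string) ([]byte, error) {
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	if _, err := zw.Write([]byte(text)); err != nil {
		return nil, err
	}
	if err := zw.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// readRecords decompresses gzipped CSV data and returns every record.
func readRecords(gz []byte) ([][]string, error) {
	zr, err := gzip.NewReader(bytes.NewReader(gz))
	if err != nil {
		return nil, err
	}
	defer zr.Close()
	return csv.NewReader(zr).ReadAll()
}

func main() {
	// Hypothetical sample rows; the real column layout may differ.
	gz, err := gzipCSV("AD,Andorra\nAE,United Arab Emirates\n")
	if err != nil {
		panic(err)
	}
	recs, err := readRecords(gz)
	if err != nil {
		panic(err)
	}
	for _, r := range recs {
		fmt.Println(r[0], r[1])
	}
}
```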

Refs #17.

AliceInHunterland commented 10 months ago

@carpawell as for

why do countries.csv.gz and locodes.csv.gz change when i call make?

we use make for regeneration of locodes.csv and countries.csv, so countries.csv.gz and locodes.csv.gz are overwritten every time.

carpawell commented 10 months ago

we use make for regeneration of locodes.csv and countries.csv, so countries.csv.gz and locodes.csv.gz are overwritten every time.

yes, but why do they differ? do they differ every time we generate a "db"? even if "in" is the same?

AliceInHunterland commented 10 months ago

yes, but why do they differ? do they differ every time we generate a "db"? even if "in" is the same?

The csv.gz files differ on every run because their metadata differs. The content of the files is the same if the inputs are the same. The gzip command that we use in the Makefile runs with the -f flag, which overwrites existing files and writes new metadata. We could check the content before writing the csv in the generate command, but it seems unpredictable for users that the output directory (which is an argument of the command) must not already contain csv.gz files whose content is identical to the csv files.

roman-khimov commented 10 months ago

Have you tried --no-name to avoid timestamps in resulting files? I think it should be possible to have the same result for the same input. And we don't care about names/timestamps for our purpose as well.

carpawell commented 10 months ago

We add the linter, we can fix the linter!