nspcc-dev / locode-db

Source of UN/LOCODE database generated by NeoFS CLI.
MIT License

Wrong number format in origin files #1

Open ZhangTao1596 opened 3 years ago

ZhangTao1596 commented 3 years ago

As we know, latitude and longitude values are expected to have specific integer and decimal parts. While porting NeoFS to C# and writing unit tests, I found that some values in the original downloaded files are in the wrong format.

Maybe we should correct these files and store them in this repo.
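
To make "wrong format" concrete, here is a minimal sketch of a coordinate check. It assumes the layout described in the UN/LOCODE manual: "DDMM" plus N or S, a space, then "DDDMM" plus E or W. The function name parse_coordinates is hypothetical and is not part of NeoFS; the actual parser may behave differently.

```python
import re

# Assumed layout: "DDMM[N|S] DDDMM[E|W]", for example "2444N 05045E".
COORD_RE = re.compile(r"([0-9]{2})([0-9]{2})([NS]) ([0-9]{3})([0-9]{2})([EW])")


def parse_coordinates(raw):
    """Return ((lat_deg, lat_min, hemi), (lon_deg, lon_min, hemi)), or None if malformed."""
    m = COORD_RE.fullmatch(raw.strip())
    if m is None:
        return None  # wrong number of digits, missing hemisphere letter, etc.
    lat_deg, lat_min, ns, lon_deg, lon_min, ew = m.groups()
    lat = (int(lat_deg), int(lat_min), ns)
    lon = (int(lon_deg), int(lon_min), ew)
    # Range checks: minutes below 60, latitude up to 90 degrees, longitude up to 180.
    if lat[1] >= 60 or lon[1] >= 60:
        return None
    if lat[0] > 90 or (lat[0] == 90 and lat[1] > 0):
        return None
    if lon[0] > 180 or (lon[0] == 180 and lon[1] > 0):
        return None
    return lat, lon
```

A check like this could be run over the downloaded files to list the records that need fixing.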

alexvanin commented 3 years ago

Unfortunately, these files will always contain some inconsistencies. Maintaining several large database files with fixes is hard, and there may be license issues. Instead, we can maintain a short list of "overridden" UN/LOCODE records, which can be applied to the database after parsing.

$ ./neofs-cli util locode generate \
  ...
  --override override.csv \
  --out locode_db

If this option is okay, we will add support for overridden values to the LOCODE generator in the CLI, as in the example above.

For now, I see that the records with invalid coordinates are simply ignored in the v0.1.0 database:

$ ./neofs-cli util locode info --db neofs-dev-env/vendor/locode_db --locode "CA BHH"
Error: record not found
$ ./neofs-cli util locode info --db neofs-dev-env/vendor/locode_db --locode "CA JSS"
Error: record not found
$ ./neofs-cli util locode info --db neofs-dev-env/vendor/locode_db --locode "SA SAL"
Error: record not found

I think this is okay for now. But before the N3 release (maybe for RC3) we will recompile the database with newer source files and the list of overridden values, and publish it as v0.2.0.

Thoughts? @cthulhu-rider @realloc

alexvanin commented 3 years ago

The NeoFS CLI LOCODE generator has an --in flag for providing database files, and it accepts any number of them. In case of record collisions, data from the later file is used. Therefore, we can pass override.csv as the last --in argument to achieve our goal.

$ cat override.csv 
,"SA","SAL","Salwá","Salwa","04","--3-----","RL","1707",,"2444N 05045E",

$ neofs-cli util locode generate \
  --airports airports.dat \
  --continents continents.geojson \
  --countries countries.dat \
  --subdiv 2020-2\ SubdivisionCodes.csv \
  --in 2020-2\ UNLOCODE\ CodeListPart1.csv \
  --in 2020-2\ UNLOCODE\ CodeListPart2.csv \
  --in 2020-2\ UNLOCODE\ CodeListPart3.csv \
  --in override.csv \
  --out locode_db

$ neofs-cli util locode info --db locode_db --locode "SA SAL"
Country: Saudi Arabia
Location: Salwa
Continent: Asia
Subdivision: [04] Ash Sharqiyah
Coordinates: 24.44, 50.45
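
The "later file wins" behavior can be illustrated with a short sketch. This is not the generator's actual code: merge_records is a hypothetical helper, it assumes that the second and third CSV columns (country code and location code) identify a record, and the malformed base row below is invented for illustration.

```python
import csv
import io


def merge_records(sources):
    """Merge CSV sources in order; on a key collision the later source wins."""
    merged = {}
    for source in sources:
        for row in csv.reader(source):
            # Assumption: rows without a location code (column 3) are not records.
            if len(row) < 3 or not row[2]:
                continue
            merged[(row[1], row[2])] = row
    return merged


# Invented base row with a malformed longitude ("5045E" has only four digits).
base = io.StringIO(
    ',"SA","SAL","Salwá","Salwa","04","--3-----","RL","1707",,"2444N 5045E",\n'
)
# The override row from override.csv above.
override = io.StringIO(
    ',"SA","SAL","Salwá","Salwa","04","--3-----","RL","1707",,"2444N 05045E",\n'
)

db = merge_records([base, override])
coordinates = db[("SA", "SAL")][10]  # column 11 holds the coordinates
```

Because override is passed last, coordinates holds the corrected value, which mirrors putting override.csv in the last --in argument.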

I propose creating a separate PR that adds an override.csv file to this repository. There we can discuss the content of override.csv.