iiasa / pysquirrel

MIT License

Add weekly hash check to NUTS file and XLSX to YAML utility function #40

Open dc-almeida opened 4 days ago

dc-almeida commented 4 days ago

Closes #38

- Adds a weekly GH action to compare the hash of the local NUTS file with the version on the Eurostat website
- Adds an XLSX-to-YAML utility function for use when updating the local file to the latest version
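For readers who have not opened the diff, the hash check could look roughly like the sketch below. It is a minimal illustration, not the code in this PR: the function names, the choice of SHA-256, and the `remote_url` parameter are all assumptions, and the actual Eurostat URL is deliberately left to the caller.

```python
import hashlib
from pathlib import Path
from urllib.request import urlopen


def sha256_of_bytes(data: bytes) -> str:
    """Return the hex SHA-256 digest of a bytes object."""
    return hashlib.sha256(data).hexdigest()


def sha256_of_file(path: Path, chunk_size: int = 65536) -> str:
    """Hash a file in chunks so a large spreadsheet is not read in one go."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def local_matches_remote(local_path: Path, remote_url: str) -> bool:
    """Download the remote file and compare its hash with the local copy.

    `remote_url` is a placeholder parameter (assumption); pass the URL of
    the published NUTS spreadsheet.
    """
    with urlopen(remote_url) as response:
        remote_hash = sha256_of_bytes(response.read())
    return sha256_of_file(local_path) == remote_hash
```

A scheduled job would call `local_matches_remote(...)` and fail when it returns `False`. A mismatch only says that the file changed, not what changed.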

danielhuppmann commented 4 days ago

First question: is the performance difference between an xlsx spreadsheet versus the yaml files noticeable?

Second question: if Eurostat again changes the file (hosted on the same url), we will only be notified that the hashes don't match, without any guidance on the actual change... Having the xlsx-to-yaml utility can be a useful way to find the change and/or check whether it is relevant for us.

phackstock commented 4 days ago

> First question: is the performance difference between an xlsx spreadsheet versus the yaml files noticeable?

Technically, parsing yaml should be faster than parsing xlsx. In practice, however, reading the data is orders of magnitude slower than either, so the difference does not matter.

> Second question: if Eurostat again changes the file (hosted on the same url), we will only be notified that the hashes don't match, without any guidance on the actual change... Having the xlsx-to-yaml utility can be a useful way to find the change and/or check whether it is relevant for us.

Ok, if Eurostat does not provide a changelog then I can see the value of comparing yaml files. We should still be careful, though, that the Excel file and the yaml files match. Otherwise we might get a false warning.

dc-almeida commented 3 days ago

> Ok, if Eurostat does not provide a changelog then I can see the value of comparing yaml files. We should still be careful though that the Excel file and the yaml files match. Otherwise we might get a wrong warning.

YAML files can also be diffed on GitHub, whereas XLSX files cannot, so keeping them allows tracking changes to the regions.
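To illustrate the point about text diffs, here is a small sketch using only the standard library. The `yaml_diff` helper and the region codes (`XX1`, `XX2`, ...) are hypothetical and not part of pysquirrel or the real NUTS data. It treats the YAML as plain text, which is what a GitHub diff does as well.

```python
import difflib


def yaml_diff(old_text: str, new_text: str) -> list[str]:
    """Return only the removed ("-") and added ("+") lines between two texts."""
    diff = list(
        difflib.unified_diff(
            old_text.splitlines(),
            new_text.splitlines(),
            fromfile="old",
            tofile="new",
            lineterm="",
            n=0,  # no context lines, only the changes themselves
        )
    )
    # Skip the two file headers, then drop the "@@ ... @@" hunk markers.
    return [line for line in diff[2:] if line.startswith(("+", "-"))]


# Hypothetical before/after versions of a regions file.
old = "XX:\n  XX1: Region A\n  XX2: Region B\n"
new = "XX:\n  XX1: Region A\n  XX2: Region B (renamed)\n  XX3: Region C\n"

for line in yaml_diff(old, new):
    print(line)
```

With a diff like this, a failed hash check can be followed up by converting the new spreadsheet to YAML and reviewing exactly which regions were renamed, added, or removed.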