wmo-im / iwxxm-codelists

Code list management for WMO published content
0 stars 6 forks source link

Updated the TTL generation scripts and resulting files (second attempt) #18

Closed blchoy closed 2 years ago

blchoy commented 2 years ago

This PR involves committing TTL generation scripts and the resulting files for IWXXM 2021-2. Highlights of changes include:

  1. Revised the file and directory structures under /CSV to streamline the generation of TTL files
  2. Updated makeIWXXMEntities.py to align the output TTL files with existing IWXXM tables exported from codes.wmo.int and the WMDS GitHub repository
  3. Updated check_urls.py to make it run with TTL files under the /TTL subdirectory

The output file of check_urls.py running against prod is in here, and it is in agreement with the changes we have made to the tables.

amilan17 commented 2 years ago

@blchoy did you run the 'check_urls.py' against test or prod?

blchoy commented 2 years ago

@amilan17 the program was run against prod.

marqh commented 2 years ago

There are a very large number of changes of different forms on this pull request, which makes it very difficult to review carefully

there are no tests running with respect to this pull request

it seems to not include the contents of the 'master' branch, nor be targeting master

i would need to understand the difference between this branch and 'master' before recommending how to proceed. Is this branch derived from 'master'? it does not appear to be able to be merged automatically, according to github checks

indeed the directory structure of this branch is different from 'master'

it appears that the 'check_urls' process in master would not run without minor adaption, due to this updated directory structure

it is hard to gain confidence in differentiating the changes resulting from new entites from the changes due to internal processing, which are not intended to change publihsed entities. i think both of these types of change exist, but i can't tell them apart

amilan17 commented 2 years ago

@blchoy The ONLY thing that needs to change, right now, is that the TTL folders and files cannot be nested under a TTL directory. This is due to the fact that the upload script expects alignment between the location of the files on the repository and the hierarchy on the codes registry. Therefore, please output the folders under TTL as sisters of the CSV folder instead. 

like this:

I think Mark's concerns should be addressed, but they are not critical for the publication of the codes today.

amilan17 commented 2 years ago

@blchoy Actually, I don't think it's as simple as my proposal just above, but I think I have a way to move forward with publication as is. Will keep you posted.

amilan17 commented 2 years ago

I'm accepting this PR request as is, because I have enough of what I need to publish the codes, but it doesn't quite do it the right way and it will need more improvement.

These are the manual steps I took to publish to testwmocodes.metarelate.net

  1. check_url.py produces accurate output (I assume)
  2. move relevant directory out of TTL, so it's a sister of CSV directory
  3. run uploadChanges script
blchoy commented 2 years ago

I did create the files meeting the requirements in Anna's post, but the check_urls.py script gave even more errors than the previous ones. This is interesting as the only difference between the two sets of TTL files is the top level directory (i.e. /TTL).

Have a discussion with Mark just now and it seems to me that further refinements will have to be made to attain consistencies among the CSV files and the existing tables on the registry, especially the latter tables were created at different times in the past decade.