sentier-dev / sentier_vocab

The data processing scripts and configuration to build the DdS vocabulary
MIT License

Add script to generate output files #11

Closed bbguimaraes closed 8 hours ago

bbguimaraes commented 16 hours ago

This adds a shell script that can be run both locally and in CI. It downloads the external files, then processes them together with the input files in this repository to generate the Turtle files to be loaded into the database.
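For reviewers, the overall shape of such a script might look roughly like the sketch below. This is not the actual `scripts/generate.sh`: the `banner`, `run_step`, and `run_steps` names are hypothetical, and running each step as `python -m sentier_vocab.<step>` is an assumption inferred from the `RuntimeWarning` messages in the example execution below. The download and extraction of `CN_2024.rdf.zip` is left out because its source URL is not given here.

```shell
#!/bin/sh
# Hypothetical sketch only; not the actual scripts/generate.sh.
set -eu

# Print a "== name ==" banner like the ones in the example output.
banner() {
    printf '== %s ==\n' "$1"
}

# Run a single processing step.
# Assumption: each step is a runnable module inside the sentier_vocab package.
run_step() {
    python -m "sentier_vocab.$1"
}

# Run the given steps in order; `set -e` aborts on the first failure.
run_steps() {
    for step in "$@"; do
        banner "$step"
        run_step "$step"
    done
}

# With no arguments the loop body never runs, so this is a no-op.
run_steps "$@"
```

Passing the step names as arguments (for example `combined_nomenclature nace`) would make the execution order explicit on the command line, which matters if some steps depend on the output of others.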

Example execution

```
$ scripts/generate.sh
== CN_2024.rdf.zip ==
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 15.4M  100 15.4M    0     0  1192k      0  0:00:13  0:00:13 --:--:-- 3003k
Archive: CN_2024.rdf.zip
== combined_nomenclature ==
2024-10-09 12:28:09.462 | INFO | __main__:CN2024:17 - Reading input RDF file /home/bbguimaraes/dds/src/cauldron/sentier_vocab/sentier_vocab/CN_2024.rdf
2024-10-09 12:29:08.649 | INFO | __main__:CN2024:20 - Changing labels to remove notation
2024-10-09 12:29:39.224 | INFO | sentier_vocab.open_energy_ontology:__init__:58 - Parsing and creating Open Energy Ontology elements
2024-10-09 12:29:39.631 | INFO | sentier_vocab.utils:streaming_download:70 - Downloading oeo-2.5.0.zip to /home/bbguimaraes/.local/share/sentier.dev/oeo-2.5.0.zip
2024-10-09 12:29:41.234 | INFO | __main__:CN2024:28 - Creating reciprocal relations
2024-10-09 12:29:42.822 | INFO | __main__:CN2024:34 - Writing output TTL file sentier_vocab/CN_2024.ttl
== custom_products ==
2024-10-09 12:30:27.756 | INFO | __main__::24 - Created custom graph at /home/bbguimaraes/dds/src/cauldron/sentier_vocab/sentier_vocab/data/custom-products.ttl
== envo ==
:128: RuntimeWarning: 'sentier_vocab.envo' found in sys.modules after import of package 'sentier_vocab', but prior to execution of 'sentier_vocab.envo'; this may result in unpredictable behaviour
== model_terms ==
2024-10-09 12:30:33.778 | INFO | __main__:ModelTerms:11 - Reading input TTL file /home/bbguimaraes/dds/src/cauldron/sentier_vocab/sentier_vocab/data/model-terms.ttl
2024-10-09 12:30:33.788 | INFO | __main__:ModelTerms:13 - Creating reciprocal relations
2024-10-09 12:30:33.789 | INFO | __main__:ModelTerms:19 - Writing output TTL file /home/bbguimaraes/dds/src/cauldron/sentier_vocab/sentier_vocab/data/model-terms.reciprocal.ttl
== nace ==
2024-10-09 12:30:33.988 | INFO | __main__:CN2024:17 - Reading input RDF file /home/bbguimaraes/dds/src/cauldron/sentier_vocab/sentier_vocab/CN_2024.rdf
2024-10-09 12:31:37.455 | INFO | __main__:CN2024:20 - Changing labels to remove notation
2024-10-09 12:32:06.639 | INFO | sentier_vocab.open_energy_ontology:__init__:58 - Parsing and creating Open Energy Ontology elements
2024-10-09 12:32:07.808 | INFO | sentier_vocab.utils:streaming_download:70 - Downloading oeo-2.5.0.zip to /home/bbguimaraes/.local/share/sentier.dev/oeo-2.5.0.zip
2024-10-09 12:32:09.440 | INFO | __main__:CN2024:28 - Creating reciprocal relations
2024-10-09 12:32:11.052 | INFO | __main__:CN2024:34 - Writing output TTL file sentier_vocab/CN_2024.ttl
== open_energy_ontology ==
2024-10-09 12:32:55.795 | INFO | __main__:__init__:58 - Parsing and creating Open Energy Ontology elements
2024-10-09 12:32:56.211 | INFO | sentier_vocab.utils:streaming_download:70 - Downloading oeo-2.5.0.zip to /home/bbguimaraes/.local/share/sentier.dev/oeo-2.5.0.zip
== qudt ==
:128: RuntimeWarning: 'sentier_vocab.qudt' found in sys.modules after import of package 'sentier_vocab', but prior to execution of 'sentier_vocab.qudt'; this may result in unpredictable behaviour
2024-10-09 12:32:58.947 | INFO | sentier_vocab.utils:streaming_download:70 - Downloading qudt-qudt-public-repo-v2.1.43-0-g4d44787.zip to /home/bbguimaraes/.local/share/sentier.dev/qudt-qudt-public-repo-v2.1.43-0-g4d44787.zip
== supplements ==
$ git status --short
?? CN_2024.rdf
?? CN_2024.rdf.zip
?? cookies.txt
?? envo-sentier-dev.ttl
?? qudt.json
?? sentier_vocab/CN_2024.ttl
?? sentier_vocab/data/oeo-product-vocab.ttl
```

One thing I'm not sure about yet: some steps seem to write the same output file (e.g. `combined_nomenclature` and `nace` both log `Writing output TTL file sentier_vocab/CN_2024.ttl`). Is a specific order of execution required?
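To check which outputs collide, the log can be scanned for step banners and `Writing output TTL file` lines. A rough sketch, assuming the log has one record per line and each step starts with a `== name ==` banner as in the example execution; `find_shared_outputs` is a hypothetical helper, not part of this PR:

```shell
# Hypothetical helper: report output files written by more than one step.
# Reads a generate.sh log on stdin, one record per line.
find_shared_outputs() {
    awk '
        # A banner line such as "== nace ==" starts a new step.
        /^== .* ==$/ { step = $2; next }

        # The output path is the last field of the "Writing output" record.
        /Writing output TTL file / {
            file = $NF
            if (!((file, step) in seen)) {
                seen[file, step] = 1
                steps[file] = (file in steps) ? steps[file] ", " step : step
                count[file]++
            }
        }

        # Print only the files that more than one step wrote.
        END {
            for (file in count)
                if (count[file] > 1)
                    print file ": " steps[file]
        }
    '
}
```

It could be used as `scripts/generate.sh 2>&1 | find_shared_outputs`; the `2>&1` is there in case the log records go to standard error. Paths are compared as plain strings, so a relative and an absolute path to the same file would not be matched.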

bbguimaraes commented 16 hours ago

Part of https://github.com/sentier-dev/sentier_vocab/issues/9.

bbguimaraes commented 15 hours ago

Fixed a typo in the comments.

tngTUDOR commented 8 hours ago

We can merge this one and move on to the next step of #9 in a separate PR.