Closed LucieContamin closed 8 months ago
Hey @LucieContamin! This was a great idea, and I've just pushed the feature to this PR: https://github.com/Infectious-Disease-Modeling-Hubs/hubUtils/pull/131
Now, when printing a hub_connection object, it reports the number of files opened per file format out of those available in the directory:
hub_path <- system.file("testhubs/simple", package = "hubUtils")
hubUtils::connect_hub(hub_path)
#>
#> ── <hub_connection/UnionDataset> ──
#>
#> • hub_name: "Simple Forecast Hub"
#> • hub_path:
#> '/Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/library/hubUtils/testhubs/simple'
#> • file_format: "csv(3/3)" and "parquet(1/1)"
#> • file_system: "LocalFileSystem"
#> • model_output_dir:
#> "/Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/library/hubUtils/testhubs/simple/model-output"
#> • config_admin: 'hub-config/admin.json'
#> • config_tasks: 'hub-config/tasks.json'
#>
#> ── Connection schema
#> hub_connection
#> origin_date: date32[day]
#> target: string
#> horizon: int32
#> location: string
#> output_type: string
#> output_type_id: double
#> value: int32
#> model_id: string
#> age_group: string
Created on 2024-01-10 with reprex v2.0.2
It will also throw a warning if there are unopened files and identify the files with problems. One thing I could use your input on for testing: is there a simple situation that has created this problem for you that I could re-create as a test case? In other words, what sort of problems did you find caused individual files to be left out?
Currently, if we load a hub without any issue, we obtain:
Created on 2023-11-10 with reprex v2.0.2
However, if we update one of the CSV files incorrectly (for example, without updating tasks.json accordingly, or without having a column in the expected format, etc.), we obtain:
Created on 2023-11-10 with reprex v2.0.2
So, as expected, the number of CSV files in the second example is 8 instead of 9, as the "incorrect" one is not included. However, if you load the data directly by doing:
or if we don't look at the hub connection object in detail, it's easy to miss that one file was not included.
So I wonder if it would be helpful to print a message/warning when something like this happens?
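To make the idea concrete, here is a rough sketch of what such a check could look like. It is only an illustration in base R, not how hubUtils implements it: the helper name report_opened_files() and its two arguments (character vectors of the file paths found in the directory and of those actually opened) are hypothetical, and the file paths in the example are made up.

```r
# Hypothetical helper, NOT part of hubUtils: compares the files found in a
# model output directory with the files that were actually opened.
# `available` and `opened` are assumed to be character vectors of file paths.
report_opened_files <- function(available, opened) {
  # Group files by extension so counts can be reported per format
  ext <- tools::file_ext(available)
  formats <- sort(unique(ext))
  counts <- vapply(formats, function(f) {
    n_all <- sum(ext == f)
    n_open <- sum(available[ext == f] %in% opened)
    sprintf("%s(%d/%d)", f, n_open, n_all)
  }, character(1))

  # Warn and name the files that were found but not opened
  missing <- setdiff(available, opened)
  if (length(missing) > 0) {
    warning(
      length(missing), " file(s) could not be opened: ",
      paste(missing, collapse = ", "),
      call. = FALSE
    )
  }
  unname(counts)
}

# Made-up example: two CSV files and one parquet file, one CSV not opened
available <- c(
  "team-a/2022-10-08-team-a.csv",
  "team-b/2022-10-08-team-b.csv",
  "team-c/2022-10-08-team-c.parquet"
)
opened <- available[-2]
res <- suppressWarnings(report_opened_files(available, opened))
```

Returning the per-format counts while also raising a warning would mean the problem is visible both when printing the connection object and when going straight to loading the data.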