Closed annakrystalli closed 1 month ago
Thanks for the reviews @nickreich and @zkamvar .
I'm going to merge but am wondering if we can put to bed the following issue as well https://github.com/hubverse-org/hubValidations/issues/81
Offically this PR allows for round IDs that are not strictly ISO dates, so long as they match the pattern [A-Za-z0-9_]
. Overall the regex pattern for round IDs is:
^(\\d{4}-\\d{2}-\\d{2})|[A-Za-z0-9_]+$
Should I go ahead and open issues to:
tasks.json
round_id
propertyvaldiate_config()
that checks that round_id
s match that pattern (if tound_ids_from_variable
is true
hubDocs
in this section: https://hubverse.io/en/latest/user-guide/model-output.html#directory-structureI'm going to merge but am wondering if we can put to bed the following issue as well #81
I think closing that issue should require specific tests for this. At the moment, all tests assume ISO dates as round IDs?
There is a test for round_1
round ID here: https://github.com/hubverse-org/hubValidations/blob/5c8395297181e2a2cd1f73661155511278d48438/tests/testthat/test-parse_file_name.R#L41-L50
There is a test for
round_1
round ID here:
Oh good point! Sorry about that. I'll close #81 as completed here.
Regarding the suggestion to
Add a regex pattern to the
tasks.json
round_id
property
I have forgotten where we are currently on this, but don't round IDs have to be YYYY-MM-DD
s? I have a small worry about adding a complex regex expression as a field to a config file. Regex is a whole thing that is hard to understand for someone who hasn't interacted with them before. E.g. could we expect a hub admin to know what to put in here?
Totally agreed that regex are complex and confusing. Hence I'm suggesting a clarification in hubDocs that makes the expectation clear in plain language that doesn't require interpretating regex.
Having said that we have always accepted round IDs that are not ISO dates which is exactly why I'm wanting to formalise this so there aren't further misconceptions and having to answer this question over and over again.
Maybe we should migrate further discussion to a specific issue? I think basically I'm on board, and I like the idea of being more specific about round_id formatting. I guess as long as we provide several examples of commonly used formats for regex that hub admins could include in their config file it would be ok.
Maybe we should migrate further discussion to a specific issue?
Can do but we've had multiple issues and discussions about this already:
and we've established that the pattern above is what is accepted (and in truth has always been accepted. It just hasn't been documented well enough hence the lingering confusion about this).
What I'm proposing is to open specific issues that codify what has effectively always been the case in the appropriate locations rather than open another discussion issue.
I guess as long as we provide several examples of commonly used formats for regex that hub admins could include in their config file it would be ok.
The hub admins would not need to provide a regex,. In the config it's the actual values that are encoded and the schema will perform the validation of the config, ensuring round id values match the expected pattern. Hence hub admins only need to ensure their proposed round IDs match the proposed pattern. Otherwise validate_config()
will complain (once we actually formally codify it).
We do already provide valid examples of what round IDs should look like in the schema here: https://github.com/hubverse-org/schemas/blob/1b1b9b2e0810483c37faa386649e3083d5fcb05e/v3.0.1/tasks-schema.json#L28-L31
hubDocs
documentation also,Just to be crystal clear, hub admins just need to know that round IDs must be either:
YYYY-MM-DD
)_
We can verbally explain this in hubDocs
while the pattern suggested for the schema and dynamic validation of round ID values in the round ID column happens behing the scenes
This PR resolves #132 by bolstering the parsing of file names in
parse_file_name()
to ensure they match the specific pattern required rather than just checking patterns of interest are extracted successfully.I also added more fine-grained check with more informative error messages and fixed a bug in which the compression extensions were not being successfully recorded and returned.