hubverse-org / hubValidations

Testing framework for hubverse hub validations
https://hubverse-org.github.io/hubValidations/
Other
1 stars 4 forks source link

Fix file name parse error (#132) #133

Closed annakrystalli closed 1 month ago

annakrystalli commented 1 month ago

This PR resolves #132 by bolstering the parsing of file names in parse_file_name() to ensure they match the specific pattern required rather than just checking patterns of interest are extracted successfully.

I also added more fine-grained check with more informative error messages and fixed a bug in which the compression extensions were not being successfully recorded and returned.

github-actions[bot] commented 1 month ago

🚀 Deployed on https://671a3934c3f2fbf0fb392631--hubvalidations-pr-previews.netlify.app

annakrystalli commented 1 month ago

Thanks for the reviews @nickreich and @zkamvar .

I'm going to merge but am wondering if we can put to bed the following issue as well https://github.com/hubverse-org/hubValidations/issues/81

Offically this PR allows for round IDs that are not strictly ISO dates, so long as they match the pattern [A-Za-z0-9_]. Overall the regex pattern for round IDs is: ^(\\d{4}-\\d{2}-\\d{2})|[A-Za-z0-9_]+$

Should I go ahead and open issues to:

zkamvar commented 1 month ago

I'm going to merge but am wondering if we can put to bed the following issue as well #81

I think closing that issue should require specific tests for this. At the moment, all tests assume ISO dates as round IDs?

annakrystalli commented 1 month ago

There is a test for round_1 round ID here: https://github.com/hubverse-org/hubValidations/blob/5c8395297181e2a2cd1f73661155511278d48438/tests/testthat/test-parse_file_name.R#L41-L50

zkamvar commented 1 month ago

There is a test for round_1 round ID here:

https://github.com/hubverse-org/hubValidations/blob/5c8395297181e2a2cd1f73661155511278d48438/tests/testthat/test-parse_file_name.R#L41-L50

Oh good point! Sorry about that. I'll close #81 as completed here.

nickreich commented 1 month ago

Regarding the suggestion to

Add a regex pattern to the tasks.json round_id property

I have forgotten where we are currently on this, but don't round IDs have to be YYYY-MM-DDs? I have a small worry about adding a complex regex expression as a field to a config file. Regex is a whole thing that is hard to understand for someone who hasn't interacted with them before. E.g. could we expect a hub admin to know what to put in here?

annakrystalli commented 1 month ago

Totally agreed that regex are complex and confusing. Hence I'm suggesting a clarification in hubDocs that makes the expectation clear in plain language that doesn't require interpretating regex.

Having said that we have always accepted round IDs that are not ISO dates which is exactly why I'm wanting to formalise this so there aren't further misconceptions and having to answer this question over and over again.

nickreich commented 1 month ago

Maybe we should migrate further discussion to a specific issue? I think basically I'm on board, and I like the idea of being more specific about round_id formatting. I guess as long as we provide several examples of commonly used formats for regex that hub admins could include in their config file it would be ok.

annakrystalli commented 1 month ago

Maybe we should migrate further discussion to a specific issue?

Can do but we've had multiple issues and discussions about this already:

and we've established that the pattern above is what is accepted (and in truth has always been accepted. It just hasn't been documented well enough hence the lingering confusion about this).

What I'm proposing is to open specific issues that codify what has effectively always been the case in the appropriate locations rather than open another discussion issue.

I guess as long as we provide several examples of commonly used formats for regex that hub admins could include in their config file it would be ok.

The hub admins would not need to provide a regex,. In the config it's the actual values that are encoded and the schema will perform the validation of the config, ensuring round id values match the expected pattern. Hence hub admins only need to ensure their proposed round IDs match the proposed pattern. Otherwise validate_config() will complain (once we actually formally codify it).

We do already provide valid examples of what round IDs should look like in the schema here: https://github.com/hubverse-org/schemas/blob/1b1b9b2e0810483c37faa386649e3083d5fcb05e/v3.0.1/tasks-schema.json#L28-L31

annakrystalli commented 1 month ago

Just to be crystal clear, hub admins just need to know that round IDs must be either:

We can verbally explain this in hubDocs while the pattern suggested for the schema and dynamic validation of round ID values in the round ID column happens behing the scenes