jqnatividad / qsv

CSVs sliced, diced & analyzed.
The Unlicense

`validate`: add support for custom JSONSchema keyword `dynenum` - allowing dynamic validation lookups against a CSV (remote, CKAN or local) #1890

Open jqnatividad opened 1 month ago

jqnatividad commented 1 month ago

Now that the jsonschema crate supports custom keywords, I'm considering adding a dynlookup keyword to validate a field against a CSV, with the CSV being either remote (http/https schemes supported) or on the local filesystem.

Though I'm sure a full-blown Luau script could do custom validation, it's not as convenient as JSON Schema validation.

WDYT?

Originally posted by @jqnatividad in https://github.com/jqnatividad/qsv/discussions/1872#discussioncomment-9793354

jqnatividad commented 1 month ago

Call the custom keyword dynenum: a dynamic version of the JSON Schema spec's enum (https://json-schema.org/draft/2020-12/json-schema-validation#section-6.1.2).

dynenum points to a CSV file. If no path is specified, the CSV will be fetched from the same directory as the JSON Schema validation file.

The CSV needs only one column, the value column. Display and description columns are optional, e.g.:

value, display, description
manhattan, Manhattan, New York County
queens, Queens, Queens County
brooklyn, Brooklyn, Kings County
staten_island, Staten Island, Richmond County
bronx, Bronx, Bronx County

The column names do not matter. The first column is always treated as the value column, the second as the display column, and the third as the description column.

Only the value column is used for validation, and matching is case-sensitive.
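To make the lookup rule concrete, here is a minimal Python sketch of the idea (qsv itself is written in Rust, so this is an illustration and not the proposed implementation). It reads the first column of a dynenum-style CSV into a set and checks membership case-sensitively. The function names `load_dynenum_values` and `is_valid` are hypothetical, and trimming whitespace around values is an assumption made here so that the spaced-out sample parses cleanly.

```python
from __future__ import annotations

import csv
import io


def load_dynenum_values(csv_text: str) -> set[str]:
    """Collect allowed values from the FIRST column; other columns are ignored."""
    reader = csv.reader(io.StringIO(csv_text), skipinitialspace=True)
    next(reader, None)  # skip the header row: column names do not matter
    # Assumption: surrounding whitespace is not significant, so strip it.
    return {row[0].strip() for row in reader if row}


def is_valid(value: str, allowed: set[str]) -> bool:
    """Case-sensitive membership test, like JSON Schema's static enum."""
    return value in allowed


SAMPLE = """value, display, description
queens, Queens, Queens County
brooklyn, Brooklyn, Kings County
"""

allowed = load_dynenum_values(SAMPLE)
```

With this sketch, `is_valid("queens", allowed)` holds while `is_valid("Queens", allowed)` does not, because the display column never takes part in validation.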

If the CSV file is remote, it will be downloaded and cached in the ~/.qsv-cache directory. Cached CSV enums are refreshed automatically using the HTTP ETag response header.
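ETag-based refreshing usually means a conditional GET: send the stored ETag in an If-None-Match request header and keep the cached copy when the server answers 304 Not Modified. The Python sketch below shows that flow under stated assumptions. `CacheEntry`, `refresh`, and the injected `fetch` callable are hypothetical names, and `fake_fetch` stands in for a real HTTP client so the example runs without network access. Nothing here reflects how qsv actually stores its cache.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple


@dataclass
class CacheEntry:
    etag: Optional[str]  # ETag stored alongside the cached CSV, if any
    body: str            # cached CSV text


# Hypothetical transport: fetch(url, request_headers) -> (status, headers, body)
Fetch = Callable[[str, Dict[str, str]], Tuple[int, Dict[str, str], str]]


def refresh(url: str, cached: Optional[CacheEntry], fetch: Fetch) -> CacheEntry:
    """Return an up-to-date cache entry, reusing the old one on HTTP 304."""
    headers: Dict[str, str] = {}
    if cached is not None and cached.etag:
        headers["If-None-Match"] = cached.etag  # conditional GET
    status, resp_headers, body = fetch(url, headers)
    if status == 304 and cached is not None:
        return cached  # server says the cached copy is still current
    if status == 200:
        # Real header names are case-insensitive; this sketch assumes "ETag".
        return CacheEntry(etag=resp_headers.get("ETag"), body=body)
    raise RuntimeError(f"unexpected HTTP status {status} for {url}")


def fake_fetch(url: str, headers: Dict[str, str]) -> Tuple[int, Dict[str, str], str]:
    """Stand-in server whose resource currently has the ETag "v1"."""
    current = '"v1"'
    if headers.get("If-None-Match") == current:
        return 304, {"ETag": current}, ""
    return 200, {"ETag": current}, "value\nqueens\n"


URL = "https://example.com/boroughs.csv"  # placeholder URL
first = refresh(URL, None, fake_fetch)    # cold cache: full download
second = refresh(URL, first, fake_fetch)  # warm cache: 304, entry reused
```

Passing the transport in as a parameter keeps the caching decision separate from the HTTP library, which also makes it easy to exercise without a live server.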

If the JSON Schema validation file is also remote, the dynenum CSV files are assumed to be at the same base URL as the JSON Schema file, unless the CSV files are given as absolute URLs.
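The location rules described in this comment (same directory for local schemas, same base URL for remote ones, absolute URLs left alone) can be sketched in a few lines of Python. `resolve_dynenum` is a hypothetical helper, not a qsv function. It assumes POSIX-style local paths, and it resolves relative sub-paths against the schema's directory, which goes slightly beyond what the comment specifies.

```python
from pathlib import PurePosixPath
from urllib.parse import urljoin, urlparse

REMOTE_SCHEMES = ("http", "https")


def resolve_dynenum(schema_location: str, dynenum_ref: str) -> str:
    """Work out where a dynenum CSV lives, relative to its schema file."""
    # An absolute URL always wins, wherever the schema itself lives.
    if urlparse(dynenum_ref).scheme in REMOTE_SCHEMES:
        return dynenum_ref
    # Remote schema: resolve against the schema's base URL.
    if urlparse(schema_location).scheme in REMOTE_SCHEMES:
        return urljoin(schema_location, dynenum_ref)
    # Local schema: absolute paths stay as-is, others sit next to the schema.
    ref_path = PurePosixPath(dynenum_ref)
    if ref_path.is_absolute():
        return str(ref_path)
    return str(PurePosixPath(schema_location).parent / ref_path)


remote = resolve_dynenum("https://example.com/schemas/nyc.json", "boroughs.csv")
local = resolve_dynenum("/data/schemas/nyc.json", "boroughs.csv")
```

Here `remote` resolves to `https://example.com/schemas/boroughs.csv` and `local` to `/data/schemas/boroughs.csv`. The example.com URL and the file names are placeholders.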

jqnatividad commented 1 month ago

In addition, if the CSV is on a CKAN site, add the ability to search the site for the latest version of a package, similar to Luau's lookuptables function.