mjacqu / glenglat

Global Englacial Temperature database
Creative Commons Attribution 4.0 International

Add self-validation instructions for submissions #33

Closed by ezwelty 1 year ago

ezwelty commented 1 year ago

Contributors could, in principle, validate their submission against the submission metadata (contribute/datapackage.yaml) by installing frictionless locally and then validating against the remote descriptor:

frictionless validate --schema-sync --basepath path/to/csvs \
https://raw.githubusercontent.com/mjacqu/glenglat/main/contribute/datapackage.yaml

where path/to/csvs is the local directory containing the CSV files.

However, a couple of issues with frictionless prevent this from working:

I've reported the latter in https://github.com/frictionlessdata/framework/issues/1416.

ezwelty commented 1 year ago

https://github.com/frictionlessdata/framework/issues/1416 has been closed (as won't fix / works as intended) in favor of a new feature request: https://github.com/frictionlessdata/framework/issues/1435

So instead, contributors would need to do something like the following, which we could wrap into a command-line Python script.

import frictionless

package = frictionless.Package('https://raw.githubusercontent.com/mjacqu/glenglat/main/contribute/datapackage.yaml')
for resource in package.resources:
    # Point each resource at the contributor's local CSV directory
    resource.basepath = 'local/path/to/csvs'
    # ...

ezwelty commented 1 year ago

Here is a quick command-line script based on Frictionless 5 (tested on v5.8.3) that does what we want:

validate.py

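from pathlib import Path
import sys

import frictionless

# Usage: python validate.py <package descriptor path or URL> <directory of CSV files>
PACKAGE_PATH = sys.argv[1]
DATA_BASEPATH = sys.argv[2]

package = frictionless.Package(PACKAGE_PATH)
# Sync each schema with the header row of the file being validated
detector = frictionless.Detector(schema_sync=True)
for resource in package.resources:
    # Look for each file by name in the local data directory
    resource.basepath = DATA_BASEPATH
    resource.path = Path(resource.path).name
    resource.detector = detector

report = frictionless.validate(package)
print(report.to_summary())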
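
The `resource.path = Path(resource.path).name` line in the script is what lets the same descriptor work against an arbitrary local directory: it drops any directory prefix from the path in the descriptor, so each file is looked up directly inside the data basepath. A minimal sketch of that step in isolation, using only the standard library (the `local_name` helper and the example paths are hypothetical, for illustration only):

```python
from pathlib import Path


def local_name(resource_path):
    """Drop any directory prefix so the file is looked up directly in the basepath."""
    return Path(resource_path).name


# Hypothetical descriptor paths, for illustration only.
print(local_name("data/borehole.csv"))  # borehole.csv
print(local_name("borehole.csv"))       # borehole.csv
```

A path that has no directory prefix is left unchanged, so the rewrite should be harmless for descriptors that already list bare file names.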

To validate glenglat from the root directory of the repo:

python validate.py datapackage.yaml data

To validate a submission from the root directory of the repo:

python validate.py contribute/datapackage.yaml path/to/csvs

To validate a submission remotely:

python validate.py https://raw.githubusercontent.com/mjacqu/glenglat/main/contribute/datapackage.yaml path/to/csvs
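
If the script ends up being run in automation, it may help for it to report failure through its exit status and to give a usage message when arguments are missing. The following is only a sketch of how that could look, not part of the script above: the `parse_args` and `exit_code` helpers are hypothetical names, and `exit_code` assumes the report object exposes a boolean `valid` attribute, as Frictionless 5 reports do.

```python
import argparse


def parse_args(argv):
    """Parse the two positional arguments that validate.py expects."""
    parser = argparse.ArgumentParser(
        description="Validate CSV files against a datapackage descriptor."
    )
    parser.add_argument("package_path", help="Path or URL of datapackage.yaml")
    parser.add_argument("data_basepath", help="Local directory containing the CSV files")
    return parser.parse_args(argv)


def exit_code(report):
    """Return 0 for a valid report, 1 otherwise (assumes a boolean `valid` attribute)."""
    return 0 if report.valid else 1
```

In validate.py, `args = parse_args(sys.argv[1:])` could replace the two `sys.argv` lookups, and `sys.exit(exit_code(report))` after printing the summary would make the command fail when validation finds errors.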