Closed by ezwelty 1 year ago
https://github.com/frictionlessdata/framework/issues/1416 has been closed (as won't fix / works as intended) in favor of a new feature request: https://github.com/frictionlessdata/framework/issues/1435
So instead, contributors would need to do something like the following, which we could wrap in a command-line Python script.
import frictionless
package = frictionless.Package('https://raw.githubusercontent.com/mjacqu/glenglat/main/contribute/datapackage.yaml')
for resource in package.resources:
    resource.basepath = 'local/path/to/csvs'
    # ...
Here is a quick command-line script based on Frictionless 5 (tested on v5.8.3) that does what we want:
validate.py
from pathlib import Path
import sys
import frictionless
PACKAGE_PATH = sys.argv[1]
DATA_BASEPATH = sys.argv[2]
package = frictionless.Package(PACKAGE_PATH)
detector = frictionless.Detector(schema_sync=True)
for resource in package.resources:
    resource.basepath = DATA_BASEPATH
    resource.path = Path(resource.path).name
    resource.detector = detector
report = frictionless.validate(package)
print(report.to_summary())
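For anyone wondering what `schema_sync=True` buys us here, this is a rough stand-alone sketch of the column-matching rule as described in this thread: column order is ignored, missing optional columns are tolerated, and missing required columns are reported. It is not frictionless's own code; `check_columns` and the field names are made up for illustration.

```python
def check_columns(header, schema_fields):
    """Return the names of required fields that are missing from the header.

    header: list of column names found in a CSV file.
    schema_fields: dict mapping field name -> True if the field is required.
    Column order is ignored and missing optional fields are not reported.
    """
    present = set(header)
    return [
        name
        for name, required in schema_fields.items()
        if required and name not in present
    ]


# Hypothetical schema: these field names and required flags are assumptions.
schema_fields = {"id": True, "depth": True, "notes": False}

# Columns in a different order, optional column absent: nothing to report.
reordered = check_columns(["depth", "id"], schema_fields)

# Required column "depth" absent: it should be flagged.
missing_required = check_columns(["id", "notes"], schema_fields)

print(reordered)
print(missing_required)
```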
To validate glenglat from the root directory of the repo:
python validate.py datapackage.yaml data
To validate a submission from the root directory of the repo:
python validate.py contribute/datapackage.yaml path/to/csvs
To validate a submission remotely:
python validate.py https://raw.githubusercontent.com/mjacqu/glenglat/main/contribute/datapackage.yaml path/to/csvs
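If we end up shipping this to contributors, it might be worth swapping the bare `sys.argv` lookups for `argparse`, so that a missing argument produces a usage message rather than an `IndexError`. A minimal sketch, assuming the same two positional arguments as above; the argument names and help strings are my own and not part of the current script.

```python
import argparse


def parse_args(argv=None):
    """Parse the two positional arguments that validate.py expects."""
    parser = argparse.ArgumentParser(
        description="Validate local CSV files against a datapackage descriptor."
    )
    # Hypothetical argument names, mirroring PACKAGE_PATH and DATA_BASEPATH.
    parser.add_argument(
        "package_path",
        help="Path or URL of the datapackage descriptor",
    )
    parser.add_argument(
        "data_basepath",
        help="Local directory containing the CSV files",
    )
    return parser.parse_args(argv)


# Passing an explicit list here for demonstration; the script would call
# parse_args() with no arguments to read from the command line.
args = parse_args(["contribute/datapackage.yaml", "path/to/csvs"])
print(args.package_path, args.data_basepath)
```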
Contributors could theoretically validate their submission against the submission metadata (contribute/datapackage.yaml) by installing frictionless locally, into a new conda environment or into an existing Python environment, then validating the CSV files in a local directory (path/to/csvs) against the remote descriptor.

However, there are a couple issues with frictionless that prevent this from working:

--schema-sync: Both v4 and v5 ignore column order and missing optional columns (as desired), but v4 also ignores missing but required columns.

--basepath: v4 prepends this path to the resource paths (as desired; e.g. path/to/csvs/borehole.csv), but v5 prepends this path to the descriptor path (e.g. https://raw.githubusercontent.com/mjacqu/glenglat/main/contribute/path/to/csvs/borehole.csv).

I've reported the latter in https://github.com/frictionlessdata/framework/issues/1416.
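To make the basepath difference concrete, here is a small sketch that reproduces the two resolved paths quoted above using plain string joins. It only imitates the observed behaviour and is not how frictionless resolves paths internally; `resolve_v4_like` and `resolve_v5_like` are hypothetical helper names.

```python
import posixpath

DESCRIPTOR = (
    "https://raw.githubusercontent.com/mjacqu/glenglat/main/"
    "contribute/datapackage.yaml"
)


def resolve_v4_like(basepath, resource_path):
    """Imitate v4 as observed: basepath is prepended to the resource path."""
    return posixpath.join(basepath, resource_path)


def resolve_v5_like(descriptor, basepath, resource_path):
    """Imitate v5 as observed: basepath is appended to the descriptor's directory."""
    descriptor_dir = posixpath.dirname(descriptor)
    return posixpath.join(descriptor_dir, basepath, resource_path)


v4_path = resolve_v4_like("path/to/csvs", "borehole.csv")
v5_path = resolve_v5_like(DESCRIPTOR, "path/to/csvs", "borehole.csv")

print(v4_path)
print(v5_path)
```

The first result stays local, which is what we want; the second points back at the remote repository, where the contributor's files do not exist.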