Open gregparkes opened 4 months ago
Was `pydantic_metadata.py` entirely auto-generated off the JSON schema, or were there any manual tweaks that needed to be made?
Unfortunately a decent number of manual tweaks needed to be made. In particular, the autogeneration tool turned every variable such as `core:generator` in the schema into `core_generator` as a variable name.
This:

- **Pro:** maintains uniqueness of each variable, and allows extensions to have the same variable name as a core attribute.
- **Con:** makes the variable names longer, which is annoying to write and read.
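The renaming trade-off can be avoided with Pydantic field aliases, keeping the Python-safe name for code while validating and serialising against the schema's colon-prefixed key. A minimal sketch (illustrative field names, not the PR's exact code; assumes Pydantic v2):

```python
from typing import Optional
from pydantic import BaseModel, ConfigDict, Field

class GlobalInfo(BaseModel):
    """Illustrative only: one schema key, 'core:generator', mapped to a
    Python-safe attribute name via an alias (Pydantic v2)."""
    model_config = ConfigDict(populate_by_name=True)

    core_generator: Optional[str] = Field(default=None, alias="core:generator")

# Validation accepts the schema's colon-prefixed key...
info = GlobalInfo.model_validate({"core:generator": "GNU Radio"})
# ...while Python code uses the underscored name.
print(info.core_generator)              # GNU Radio
print(info.model_dump(by_alias=True))   # {'core:generator': 'GNU Radio'}
```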
The tool also generated mostly base Python types (e.g. `int`, `str`, `float`) for each attribute and did not supply any special typing, e.g. regex-compliant strings, positive integers (e.g. `core:sample_count`) and so on.
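This kind of special typing is straightforward to add by hand with annotated constraints. A hedged sketch (the real `pydantic_types.py` may differ, and the pattern below is a simplification of the SigMF datatype grammar, not the exact rule):

```python
from typing import Annotated
from pydantic import BaseModel, Field, NonNegativeInt, ValidationError

# Illustrative constrained types -- simplified, not the PR's actual definitions.
DatatypeStr = Annotated[str, Field(pattern=r"^[cr][fiu](8|16|32|64)(_le|_be)?$")]

class Example(BaseModel):
    datatype: DatatypeStr = Field(alias="core:datatype")
    sample_start: NonNegativeInt = Field(alias="core:sample_start")

ok = Example.model_validate({"core:datatype": "cf32_le", "core:sample_start": 0})
print(ok.sample_start)  # 0

try:
    Example.model_validate({"core:datatype": "cf32_le", "core:sample_start": -1})
except ValidationError:
    print("negative core:sample_start rejected")
```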
The custom validation and serialization code associated with each object is also not generated, as a number of the rules are specified in the SigMF standard documentation found here but not actually implemented in the underlying JSON schema: for example, sorting the captures and annotations arrays by `core:sample_start`, or ensuring `core:freq_upper_edge` > `core:freq_lower_edge`. We solve this in Pydantic by ensuring these arrays are sorted during validation.
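Sorting during validation can be done with a `field_validator` in Pydantic v2. A toy sketch (captures are plain dicts here, not the PR's `SigMFCapture` objects):

```python
from typing import Any, Dict, List
from pydantic import BaseModel, Field, field_validator

class MetaFile(BaseModel):
    """Toy model: captures are plain dicts here, not SigMFCapture instances."""
    captures: List[Dict[str, Any]] = Field(default_factory=list)

    @field_validator("captures")
    @classmethod
    def _sort_captures(cls, v):
        # Sort during validation, keyed on core:sample_start.
        return sorted(v, key=lambda c: c.get("core:sample_start", 0))

m = MetaFile(captures=[{"core:sample_start": 100}, {"core:sample_start": 0}])
print([c["core:sample_start"] for c in m.captures])  # [0, 100]
```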
---

TL;DR: This PR is derived from issue #58 to automatically support data validation using Pydantic, a JSON- and JSON Schema-friendly validation library.
At this point, the PR only defines the schema and basic validations - I have not supplied any means to integrate it into the current library, so all existing behaviour with `SigMFFile` remains.

## Changes
A number of files are added within the `component` directory (renamed?), the main one being the `pydantic_metadata.py` script, which contains a Pydantic definition of the JSON Schema as specified on the main SigMF repository.

The `pydantic_metadata.py` script defines the SigMF Metadata Standard, which includes:

- `SigMFGlobalInfo` - global_info
- `SigMFCapture` - a single SigMF capture
- `SigMFAnnotation` - a single SigMF annotation
- `SigMFMetaFileSchema` - a single metadata file (in `.sigmf-meta` format) containing the global info, a list of captures and a list of annotations

## Features
To the best of my ability, these classes mirror the defined JSON Schema standard and go above and beyond in many ways, including the following features:
- `core:datatype`, version and DOI strings utilise regex patterns to ensure compliance (see `pydantic_types.py`).
- `core:version` (GlobalInfo), `core:uuid` (Annotation) and `core:datetime` (Capture) use default factories to fill in automatically upon creation if not defined beforehand (auto-filling timestamps, version numbers etc.).
- `core:collection`, `core:dataset` and `core:license` use `pathlib.Path` and `HttpUrl` objects, which supply extra functionality from Python core libraries when instantiated.
- Integer attributes (e.g. `core:sample_start`) are checked for non-negative or positive values.
- Cross-field validation involving `core:dataset` and `core:metadata_only`.
- Captures and annotations are sorted by `core:sample_start`.

## How to use
### Creating an object
I've added a helper method `SigMFMetaFileSchema.from_file()` which takes a `.sigmf-meta` file path and returns the Pydantic object for it.

### Using the object
All of the attributes are reachable by using their name, e.g. `core:version` becomes `obj.global_info.version`.

### Exporting an object
Once a `SigMFMetaFileSchema` object is created, it can be exported to a dictionary via `.model_dump()` or to a JSON string (prior to storage in a file, or sending over the network) using the `.model_dump_json(by_alias=True, exclude_none=True)` method. Setting `by_alias` and `exclude_none` to `True` is important to ensure the core attributes all begin with `core:` etc.

### Accessing the schema
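A small illustrative sketch tying the export and schema-access calls together (again with stand-in models, not the PR's actual classes):

```python
import json
from typing import Optional
from pydantic import BaseModel, Field

# Stand-in models for illustration only.
class GlobalInfo(BaseModel):
    version: str = Field(default="1.0.0", alias="core:version")
    description: Optional[str] = Field(default=None, alias="core:description")

class MetaFileSchema(BaseModel):
    global_info: GlobalInfo = Field(default_factory=GlobalInfo, alias="global")

meta = MetaFileSchema()
# by_alias restores the 'core:' prefixes; exclude_none drops unset optionals.
payload = meta.model_dump_json(by_alias=True, exclude_none=True)
print(payload)  # {"global":{"core:version":"1.0.0"}}

# The JSON schema is available for legacy consumers:
schema = MetaFileSchema.model_json_schema()
print("properties" in schema)  # True
```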
The JSON schema of the `SigMFMetaFileSchema` can be accessed using `.model_json_schema()`, allowing you to integrate with any legacy code using the schema.

## Testing
I've supplied some unit tests which seem to cover the basic cases, although a few extra real examples would be pretty handy, and I haven't properly checked (yet) how its outputs compare to the current outputs from `SigMFFile`.

Current code coverage results (`pytest --cov=sigmf && coverage report`):

The pipeline I've been using is a Python 3.7 environment in Anaconda:
- `black sigmf/component`
- `ruff check sigmf/component --fix`
- `pylint sigmf/component` (gets a 9.95 out of 10 score)
- `mypy -m sigmf` raises no errors in my code

## Next steps
At the moment there is no code for manipulating the Pydantic objects (aside from creation) to keep controller functionality separate from the 'data' component.
However, supplying code to convert these objects into nested dictionaries / to file should be trivial.
### Integration
Basically I'm seeking some guidance and ideas as to how to integrate this into the existing `sigmf-python` classes. I would suggest introducing this as an optional backend in the next version, with it becoming the default option at the next release version. Something like adding a `backend=pydantic` parameter to the `sigmf.sigmffile.fromfile` method or similar.

I'm also happy for any changes to names / suggestions to file or internal objects.
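As an entirely hypothetical sketch of that dispatch idea - neither the `backend` parameter nor these return values exist in `sigmf-python` today, and the stand-in returns just mark which path was taken:

```python
def fromfile(filename, backend="legacy"):
    """Hypothetical dispatch: names and defaults are illustrative only."""
    if backend == "pydantic":
        # Would delegate to SigMFMetaFileSchema.from_file(filename) here.
        return ("pydantic", filename)
    # The existing SigMFFile-based path would remain the default.
    return ("legacy", filename)

print(fromfile("a.sigmf-meta"))                      # ('legacy', 'a.sigmf-meta')
print(fromfile("a.sigmf-meta", backend="pydantic"))  # ('pydantic', 'a.sigmf-meta')
```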
### SigMF Collections
I've begun an implementation of the SigMF collection standard, but I'm less familiar with this object so need to play around with it some more.