Open gregparkes opened 5 months ago
I think the 3.6 requirement was due to Ubuntu 18 coming with 3.6 and a lot of folks were still on 18 at the time, but a few years have passed and 18 EOLed last year anyway so we could consider bumping the min python to 3.7 or 3.8.
Pydantic is one of those things where it optimizes performance and if you are familiar with pydantic then the code is way simpler and more readable, but for those who are new to pydantic it will cause a higher level of effort to make tweaks and additions to the code, imo.
From an additional dependency perspective I personally don't see anything wrong with adding pydantic to the list (which now only includes numpy and jsonschema) although there will always be folks out there who like to keep it minimal for simplicity and security reasons.
Ultimately it's going to come down to how @Teque5 and @gmabey feel about Pydantic because they have been the main long-term maintainers of this code and even if you single-handedly refactored the entire code to use Pydantic and no regressions are introduced, if it makes it harder for them to maintain the project then it's a deal-breaker, especially since we don't need Pydantic to do what we want sigmf-python to do.
Thanks for getting back, Marc. Really love the PySDR book by the way; it's one of the few great online resources out there for understanding SDRs, I/Q, etc.
> I think the 3.6 requirement was due to Ubuntu 18 coming with 3.6 and a lot of folks were still on 18 at the time, but a few years have passed and 18 EOLed last year anyway so we could consider bumping the min python to 3.7 or 3.8.
Glad to hear. Realistically it would be better to bump to 3.8, as I think Python 3.7 would force Pydantic V1, which had serious migration issues when moving to Pydantic V2.
> Pydantic is one of those things where it optimizes performance and if you are familiar with pydantic then the code is way simpler and more readable, but for those who are new to pydantic it will cause a higher level of effort to make tweaks and additions to the code, imo.
Agreed, however I would argue it is good practice to follow model-view-controller design patterns, and Python dataclasses (and Pydantic) are the state of the art for managing hierarchical metadata. Maintaining the usefulness and power of the SigMF standard will require using the state-of-the-art tools that Python provides.
The burden of implementation would fall only on those maintaining the library; for users it would be as simple as:

```python
handle = sigmf.sigmffile.fromfile(my_file)
# metadata is a pydantic.BaseModel following the SigMFMeta schema
index = handle.metadata.captures[0].sample_start
```
Within an IDE this is great: as a user you're able to find the properties you want via hints/suggestions, rather than needing to know them a priori using the dictionary constants currently implemented, e.g. `handle.get_global_field(SAMPLE_RATE_KEY)`, or worse, knowing the schema standard off by heart: `handle.get_global_field("core:sample_rate")`.
> From an additional dependency perspective I personally don't see anything wrong with adding pydantic to the list (which now only includes numpy and jsonschema) although there will always be folks out there who like to keep it minimal for simplicity and security reasons.
We could implement it for now in an optional way which requires only minimal changes to the current API. For instance, adding a `backend="pydantic"` parameter to the `sigmf.sigmffile.fromfile()` method would be trivial. This could then expose a new property, i.e. `metadata`, which holds the `BaseModel` object if `pydantic` is used, and otherwise returns `None` or the full metadata dictionary as a `dict`.
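As a sketch of how such an opt-in backend could look (all names here are illustrative, not the current sigmf-python API; assumes Pydantic v2):

```python
import json
from typing import Any, Optional

from pydantic import BaseModel, Field


class _Global(BaseModel):
    # illustrative subset of the SigMF global object
    datatype: str = Field(alias="core:datatype")


class _Meta(BaseModel):
    # `global` is a Python keyword, so alias it onto `global_`
    global_: _Global = Field(alias="global")


def fromfile_sketch(text: str, backend: Optional[str] = None) -> Any:
    """Parse metadata; with backend="pydantic", return a validated model."""
    raw = json.loads(text)
    if backend == "pydantic":
        return _Meta.model_validate(raw)  # validated BaseModel
    return raw  # current behaviour: plain dict


doc = '{"global": {"core:datatype": "cf32_le"}}'
assert isinstance(fromfile_sketch(doc), dict)
print(fromfile_sketch(doc, backend="pydantic").global_.datatype)  # cf32_le
```

Keeping the default backend as the plain dictionary would leave existing callers untouched while letting new code opt in.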
In terms of security, there are a large number of companies which use Pydantic in their pipelines, so a security breach in Pydantic would affect considerably more than just SigMF users: https://docs.pydantic.dev/latest/why/#using-pydantic
> Ultimately it's going to come down to how @Teque5 and @gmabey feel about Pydantic because they have been the main long-term maintainers of this code and even if you single-handedly refactored the entire code to use Pydantic and no regressions are introduced, if it makes it harder for them to maintain the project then it's a deal-breaker, especially since we don't need Pydantic to do what we want sigmf-python to do.
Agreed. I would anticipate that Pydantic would significantly reduce the footprint of the validation code, and it also works well with `mypy` out of the box, whereas static type checking a nested dictionary is painful at best and useless at worst.
But I'll wait until we can agree either to skip this or to come up with a basic API.
The choice to implement validation through `jsonschema` was due to a lack of options at the time of adoption. You've brought up some reasons why `pydantic` is better, which may all be true. If it's possible to provide a `pydantic` implementation I have no issue with it, but that will be a big PR. A perhaps insurmountable issue not mentioned here previously is that the specification in the main SigMF repository is now derived from the schema, so we would need to (1) ensure the website and PDF can still be dynamically generated from a different schema, and (2) ensure the current use cases in this repository still work. It's possible I'm misunderstanding your proposal.
P.S. We officially dropped support for 3.6 a while ago. Current testing covers 3.7 through 3.12, mostly to support those on isolated and/or slow-to-update distros like RHEL, Debian, and Scientific Linux. I can see making a newer version the new minimum, but it needs strong justification.
OK great.
Issue (1): Shouldn't be a problem, as we won't delete the schema.json file for now. The PDF/website could in future be produced by a GitHub runner (loading in Pydantic etc.) on commit to the main branch, deploying the pages to your chosen URL. Pydantic of course generates a JSON schema on command using `BaseModel.model_json_schema()`.
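For instance (a minimal sketch, assuming Pydantic v2; `CaptureSegment` is an illustrative name, not part of sigmf-python):

```python
from pydantic import BaseModel, Field


class CaptureSegment(BaseModel):
    """Illustrative model of a single SigMF capture segment."""
    sample_start: int = Field(alias="core:sample_start", ge=0)


# Emit a JSON schema keyed by the namespaced aliases, ready to serialize to disk.
schema = CaptureSegment.model_json_schema(by_alias=True)
print("core:sample_start" in schema["properties"])  # True
```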
Issue (2): I'd have to check the specific use cases in question, but Pydantic will handle them automatically for you.
To address the primary concern of de-sync between the JSON schema and the Pydantic version, there is a hack: the `datamodel-code-generator` PyPI library (https://docs.pydantic.dev/2.5/integrations/datamodel_code_generator/) provides a CLI that takes a JSON schema file as input and outputs a Python file using Pydantic. I've just run it against `schema-meta.json` and it produces something that looks reasonable.
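The invocation would look roughly like this (assuming `datamodel-code-generator` is installed from PyPI; the output filename is arbitrary and not a sigmf-python convention):

```shell
pip install datamodel-code-generator

datamodel-codegen \
  --input schema-meta.json \
  --input-file-type jsonschema \
  --output sigmf_models.py
```

The generated file could then be reviewed and committed, or regenerated in CI whenever the schema changes.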
Also good news: it looks like Pydantic V2.5 works on Python 3.7+, so there's no need to bump the minimum version.
The base is there; I just need to write some tests for it over the next couple of days (when I get space between work).
@gmabey was wondering what your opinions are on `pydantic`.
Aim
I would propose overhauling the library as it currently stands to introduce a dependency on `pydantic`, an excellent data validation library that works hand-in-glove with JSON and JSON schemas. Let me know what you think of the below, and I'd consider developing a basic PR.
Note: I'm aware that Pydantic requires Python >= 3.7 and there are comments mentioning the need for backwards compatibility (specifically Python 3.6). If that is the case, the Pydantic parts could be implemented as optional if desired.
Benefits
This would serve to provide the following benefits:

- Validation of `.sigmf-meta` JSON files: Pydantic would perform type, length and semantic checking on your behalf.
- Performance (`pydantic`'s validation core is written in Rust).
- Synergy with the `IQEngine` package you are also developing, as Pydantic works great with JSON for HTTP requests, FastAPI and other common web libraries.

Examples
For example, in defining the schema for a SigMF Global object, one might:
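A minimal sketch of such a model, assuming Pydantic v2 (the class and its field selection are illustrative, not the full SigMF `global` object):

```python
from typing import Optional

from pydantic import BaseModel, Field


class SigMFGlobal(BaseModel):
    """Illustrative sketch of a few fields from the SigMF global object."""
    datatype: str = Field(alias="core:datatype")
    version: str = Field(alias="core:version")
    sample_rate: Optional[float] = Field(default=None, alias="core:sample_rate")


# Validation populates Python attributes from the namespaced JSON keys.
g = SigMFGlobal.model_validate({
    "core:datatype": "cf32_le",
    "core:version": "1.0.0",
    "core:sample_rate": 2_000_000,
})
print(g.sample_rate)  # 2000000.0
```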
and for a single meta file one would define:
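Again as an illustrative sketch, a top-level model could nest the per-section models (`global` is a Python keyword, hence the aliased attribute; all names are assumptions, not sigmf-python code):

```python
from typing import List

from pydantic import BaseModel, Field


class SigMFGlobal(BaseModel):
    datatype: str = Field(alias="core:datatype")
    version: str = Field(alias="core:version")


class SigMFCapture(BaseModel):
    sample_start: int = Field(alias="core:sample_start", ge=0)


class SigMFMeta(BaseModel):
    """Illustrative top-level model for a whole .sigmf-meta document."""
    global_: SigMFGlobal = Field(alias="global")
    captures: List[SigMFCapture] = []
    annotations: List[dict] = []


meta = SigMFMeta.model_validate({
    "global": {"core:datatype": "cf32_le", "core:version": "1.0.0"},
    "captures": [{"core:sample_start": 0}],
    "annotations": [],
})
print(meta.captures[0].sample_start)  # 0
```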
Calling `BaseModel.model_dump_json()` automatically converts nested BaseModel objects into a JSON-compliant string which is trivial to store to disk. Similarly, reading in a `.sigmf-meta` file could be parsed by a `SigMFFile` object (which extends BaseModel) to validate the JSON against the schema without any custom checking code.

Pydantic can handle custom serialization formats by using aliasing.
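For example (a sketch assuming Pydantic v2, with an illustrative `Annotation` model):

```python
from pydantic import BaseModel, Field


class Annotation(BaseModel):
    # the Python attribute maps to the namespaced JSON key via its alias
    sample_count: int = Field(alias="core:sample_count")


a = Annotation.model_validate({"core:sample_count": 4096})
# dumping by alias restores the namespaced keys for on-disk JSON
print(a.model_dump_json(by_alias=True))  # {"core:sample_count":4096}
```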
Aliasing helps to handle any discrepancies between the JSON naming format, e.g. `"core:<item>"` or `"my_custom_extension:<item>"`, and the Python variable name.
Extensions
Pydantic would also be a great way to handle custom extensions for SigMF: any custom user extension could simply extend a pre-defined extension class.
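As a sketch of the idea (assuming Pydantic v2; the extension models and namespaces below are purely illustrative):

```python
from typing import Optional

from pydantic import BaseModel, Field


class AntennaExtension(BaseModel):
    """Illustrative pre-defined model for a SigMF extension namespace."""
    model: str = Field(alias="antenna:model")


class MyAntennaExtension(AntennaExtension):
    """A user extension simply subclasses the pre-defined extension model,
    inheriting its fields and validation while adding its own."""
    tilt_deg: Optional[float] = Field(default=None, alias="my_ext:tilt_deg")


ext = MyAntennaExtension.model_validate({
    "antenna:model": "dipole",
    "my_ext:tilt_deg": 3.5,
})
print(ext.tilt_deg)  # 3.5
```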