seedcase-project / seedcase-sprout

Upload your research data to formally structure it for better, more reliable, and easier research.
https://sprout.seedcase-project.org/
MIT License

Create dataclasses representing the top levels of the data package spec #625

Closed: signekb closed this issue 2 months ago

signekb commented 3 months ago

Create dataclasses to represent the top levels of the data package spec, including most* package and resource properties but excluding resource > schema and everything nested under it.

* Which ones we want to include is still a work in progress.

Dataclasses:

We want to be able to instantiate the classes with no arguments, so provide sensible default values.
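Here is a minimal sketch of what "instantiable with no arguments" could look like. The class names and the subset of properties are hypothetical and purely illustrative, since the final list is still being decided. Every field gets an empty default, and the list field uses default_factory so instances don't share one list.

```python
from dataclasses import asdict, dataclass, field


# Hypothetical classes: the names and the chosen properties are illustrative,
# not the final set decided in this issue.
@dataclass
class ResourceProperties:
    name: str = ""
    path: str = ""
    title: str = ""
    description: str = ""


@dataclass
class PackageProperties:
    name: str = ""
    title: str = ""
    description: str = ""
    version: str = ""
    # Mutable defaults need default_factory so each instance gets its own list.
    resources: list[ResourceProperties] = field(default_factory=list)


# Both classes can be instantiated with no arguments.
package = PackageProperties()
package.resources.append(ResourceProperties(name="my-resource"))
print(asdict(package))  # nested dataclasses are converted to dicts too
```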

Ideally, the classes should be generated automatically based on the data package spec.
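One possible way to generate the classes from the spec uses Python's dataclasses.make_dataclass, assuming the schema's top-level "properties" are the source of truth. The helper name, the type-to-default mapping and the abridged inline schema are all illustrative assumptions. In practice the schema would be loaded from the spec, and nested parts such as resources would need their own classes.

```python
import copy
from dataclasses import field, fields, make_dataclass
from functools import partial
from typing import Any

# Abridged excerpt of a Data Package schema's top level, inlined so the sketch
# runs offline (illustrative only, not the full spec).
SCHEMA_EXCERPT: dict[str, Any] = {
    "type": "object",
    "properties": {
        "$schema": {
            "type": "string",
            "default": "https://datapackage.org/profiles/1.0/datapackage.json",
        },
        "name": {"type": "string"},
        "title": {"type": "string"},
        "resources": {"type": "array", "items": {"type": "object"}},
    },
}

# JSON Schema "type" -> (Python type, zero-argument factory for the default).
TYPE_DEFAULTS: dict[str, tuple[Any, Any]] = {
    "string": (str, str),
    "array": (list, list),
    "object": (dict, dict),
}


def dataclass_from_schema(class_name: str, schema: dict[str, Any]) -> type:
    """Hypothetical helper: build a dataclass whose fields mirror the schema's
    top-level "properties", each with a default so no arguments are needed."""
    spec = []
    for key, prop in schema.get("properties", {}).items():
        # "$schema" is not a valid Python identifier, so strip the "$";
        # writing the dataclass back to JSON would need to restore it.
        py_name = key.lstrip("$")
        schema_type = prop.get("type")
        if not isinstance(schema_type, str):
            schema_type = None  # e.g. list-valued "type"; treated as unknown
        py_type, factory = TYPE_DEFAULTS.get(schema_type, (Any, lambda: None))
        if "default" in prop:
            # Use the spec's own default, deep-copied on every instantiation
            # so mutable defaults aren't shared between instances.
            factory = partial(copy.deepcopy, prop["default"])
        spec.append((py_name, py_type, field(default_factory=factory)))
    return make_dataclass(class_name, spec)


PackageProperties = dataclass_from_schema("PackageProperties", SCHEMA_EXCERPT)
package = PackageProperties()  # no arguments needed
print([f.name for f in fields(PackageProperties)])
# → ['schema', 'name', 'title', 'resources']
```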

martonvago commented 2 months ago

Looking at the JSON at https://datapackage.org/profiles/2.0/datapackage.json, this is a JSON Schema describing the structure of the datapackage.json file, right? But if I understand correctly, our properties template file is more like a sample datapackage.json file (with some fields empty and some set to a default value). So to transform the schema into this template file, it's not enough to drop the fields we don't need, because the two structures differ more than that. E.g., in the template file we want a resources field at the top level (i.e. {"resources": […]}), but in the schema we have {"properties": {"resources": {"items": {...}}}}.

Am I misunderstanding this ^^? If not, would it be easier to construct a template file by hand, or does anyone have a better idea?

@seedcase-project/developers
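The point about structure can be shown with a simplified, hand-written excerpt (not copied from the spec). Recursively dropping schema keywords still leaves the "properties" and "items" wrappers in place, so the result doesn't have the template's shape. The drop_keys helper is hypothetical.

```python
from typing import Any

# Simplified, illustrative excerpt of how "resources" is nested in the schema.
schema = {
    "type": "object",
    "required": ["resources"],
    "properties": {
        "resources": {
            "title": "Data Resources",
            "type": "array",
            "items": {
                "type": "object",
                "properties": {"name": {"type": "string", "title": "Name"}},
            },
        },
    },
}

# The shape the template file needs at the top level.
wanted = {"resources": [{"name": ""}]}


def drop_keys(node: Any, unwanted: set[str]) -> Any:
    """Recursively remove the given keys from every dict in a JSON-like value."""
    if isinstance(node, dict):
        return {k: drop_keys(v, unwanted) for k, v in node.items() if k not in unwanted}
    if isinstance(node, list):
        return [drop_keys(v, unwanted) for v in node]
    return node


dropped = drop_keys(schema, {"title", "type", "required"})
print(dropped)
# → {'properties': {'resources': {'items': {'properties': {'name': {}}}}}}
print(dropped == wanted)  # → False: the schema's wrappers are still there
```

Key-based dropping is also fragile on the real schema, where "title" is both a JSON Schema keyword and the name of an actual package property.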

signekb commented 2 months ago

Ah, I think you're right! It does need some rearranging to be used as a template. To me, it would also be easier to construct a template file by hand than to write specific logic for rearranging the JSON at the URL, even though it would be nice if it could rely on the spec directly. What do you think, @lwjohnst86?

martonvago commented 2 months ago

Another thought: we could look for a library that can generate a sample JSON object from a JSON Schema, and then just drop the fields we don't need. I looked very quickly but found no obvious candidates.
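The code below sketches what such a library (or a small hand-written alternative) would do. It assumes only the default, type, properties and items keywords matter. It walks the schema and uses each property's default when there is one, otherwise an empty value for its type, then drops the unwanted fields. Both function names are hypothetical, and $ref, oneOf/anyOf and multi-type properties are not handled.

```python
from typing import Any


def sample_from_schema(node: dict[str, Any]) -> Any:
    """Hypothetical helper: build a sample JSON value from a JSON Schema node.

    Only "default", "type", "properties" and "items" are handled; $ref,
    oneOf/anyOf and list-valued "type" fall through to None.
    """
    if "default" in node:
        return node["default"]
    node_type = node.get("type")
    if node_type == "object":
        return {
            key: sample_from_schema(sub)
            for key, sub in node.get("properties", {}).items()
        }
    if node_type == "array":
        # One sample entry, so the template shows what list items look like.
        items = node.get("items")
        return [sample_from_schema(items)] if isinstance(items, dict) else []
    if node_type == "string":
        return ""
    return None


def drop_fields(sample: dict[str, Any], unwanted: set[str]) -> dict[str, Any]:
    """Remove unwanted top-level fields from the generated sample."""
    return {key: value for key, value in sample.items() if key not in unwanted}


# Illustrative, abridged schema (not the full spec).
schema = {
    "type": "object",
    "properties": {
        "$schema": {
            "type": "string",
            "default": "https://datapackage.org/profiles/1.0/datapackage.json",
        },
        "name": {"type": "string"},
        "keywords": {"type": "array", "items": {"type": "string"}},
        "resources": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {"name": {"type": "string"}, "path": {"type": "string"}},
            },
        },
    },
}

template = drop_fields(sample_from_schema(schema), {"keywords"})
print(template)
# → {'$schema': 'https://datapackage.org/profiles/1.0/datapackage.json', 'name': '', 'resources': [{'name': '', 'path': ''}]}
```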

martonvago commented 2 months ago

Well, it doesn't seem that hard to do, so I'm having a look at it today^^

lwjohnst86 commented 2 months ago

So the role of the create_properties_template() function is to take the schema you mentioned, @martonvago, and get it into the correct format for a usable data package spec. For instance: keep only the items inside properties, move them up a level in the nesting (so they become first-level keys), and drop all the sub-nested propertyOrder etc. items, so that what we're left with is exactly the spec needed for a folder to be a data package. Editing the remaining fields would then be something like giving them an empty default value ("").
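Below is a minimal sketch of those steps, using the two properties from the schema excerpt in this comment. The function name is hypothetical and this is not the project's create_properties_template(). It handles only the top level, so nested properties such as resources would need a recursive version.

```python
from typing import Any

# The two top-level properties from the schema excerpt in this comment
# (the "context" and "examples" entries of "name" are omitted for brevity).
schema = {
    "$schema": "http://json-schema.org/draft-07/schema#",
    "title": "Data Package",
    "description": "Data Package",
    "type": "object",
    "required": ["resources"],
    "properties": {
        "$schema": {
            "default": "https://datapackage.org/profiles/1.0/datapackage.json",
            "propertyOrder": 10,
            "title": "Profile",
            "description": "The profile of this descriptor.",
            "type": "string",
        },
        "name": {
            "propertyOrder": 20,
            "title": "Name",
            "description": "An identifier string.",
            "type": "string",
        },
    },
}


def properties_template_sketch(schema: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical sketch: keep only "properties", lift each property to the
    top level, and replace its schema metadata (propertyOrder, title,
    description, type, ...) with its default value, or "" if it has none."""
    return {
        key: prop.get("default", "")
        for key, prop in schema.get("properties", {}).items()
    }


print(properties_template_sketch(schema))
# → {'$schema': 'https://datapackage.org/profiles/1.0/datapackage.json', 'name': ''}
```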

Right now, the schema looks like:

{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "title": "Data Package",
  "description": "Data Package",
  "type": "object",
  "required": [
    "resources"
  ],
  "properties": {
    "$schema": {
      "default": "https://datapackage.org/profiles/1.0/datapackage.json",
      "propertyOrder": 10,
      "title": "Profile",
      "description": "The profile of this descriptor.",
      "type": "string"
    },
    "name": {
      "propertyOrder": 20,
      "title": "Name",
      "description": "An identifier string.",
      "type": "string",
      "context": "This is ideally a url-usable and human-readable name. Name `SHOULD` be invariant, meaning it `SHOULD NOT` change when its parent descriptor is updated.",
      "examples": [
        "{\n  \"name\": \"my-nice-name\"\n}\n"
      ]
    },
    ...

But it should look like:

{
  "$schema": "https://datapackage.org/profiles/1.0/datapackage.json",
  "name": "",
  ...