oxidecomputer / typify

compiler from JSON Schema into idiomatic Rust types
Apache License 2.0
430 stars 58 forks source link

patternProperties? #522

Open staehle opened 7 months ago

staehle commented 7 months ago

Hello:

Cool project! I started implementing it into a sort of repo-management-type software I'm working on. However, typify didn't quite do what I expected, and looking into it, doesn't seem to support JSON Schema's patternProperties. I'm using YAML, where for example, I have a file like this:

variables:
  BRANCH_GROUP_A: main
  BRANCH_GROUP_B: feature/beta
  SOMETHINGELSE: "123"
repos:
  thing:
    branch: hotfix/BUG-1234
    commit: HEAD
  some-code:
    branch: ${BRANCH_GROUP_A}
    commit: HEAD
  doesnt-matter:
    branch: ${BRANCH_GROUP_A}
    commit: 7f9329dc0741a4eeea209497f4e03513454e7606
  meta-star:
    branch: ${BRANCH_GROUP_B}
    commit: da9063bdfbe130f424ba487f167da68e0ce90e7d

I could argue for the repos section to convert that to a list instead of an object and add an extra "name: xyz" field, but that just adds more boilerplate, but variables definitely doesn't make much sense to do anything other than the current format.

So the JSON schema looks like this:

{
    "$schema": "https://json-schema.org/draft/2020-12/schema",
    "$id": "https://example.com/snapshot.json",
    "title": "snapshot",
    "description": "Schema for SNAPSHOT",
    "type": "object",
    "properties": {
        "variables": {
            "type": "object",
            "patternProperties": {
                "^[A-Z_]+$": {
                    "$comment": "Variable names uppercase and underscores only, value any string",
                    "type": "string"
                }
            },
            "additionalProperties": false,
            "minProperties": 1,
            "uniqueItems": true
        },
        "repos": {
            "type": "object",
            "patternProperties": {
                "^[a-z_-]+$": {
                    "$comment": "Repo names lowercase only, object types",
                    "type": "object",
                    "properties": {
                        "branch": { "type": "string" },
                        "commit": { "type": "string" }
                    },
                    "required": ["branch", "commit"]
                }
            },
            "additionalProperties": false,
            "minProperties": 1,
            "uniqueItems": true
        }
    },
    "required": ["repos"],
    "additionalProperties": false
}

I integrated typify with build.rs, but get the same results with cargo typify ./snapshot.json, small example (which I believe is correct?):

let str_snapshot_content = std::fs::read_to_string("./test_snapshot.yaml").unwrap();
let test = serde_yaml::from_str::<snapshot::Snapshot>(&str_snapshot_content).unwrap();

And panicking at the second line, as I now expect, since it doesn't know what to look for:

thread 'main' panicked at src/main.rs:68:72:
called `Result::unwrap()` on an `Err` value: Error("variables: unknown field `BRANCH_GROUP_A`, there are no fields", line: 2, column: 3)

Also, various unused variable warnings in the generated code file:

warning: unused variable: `value`
   --> <snip>/out/gen_schema_types.rs:254:17
    |
254 |                 value: SnapshotRepos,
    |                 ^^^^^ help: if this is intentional, prefix it with an underscore: `_value`
    |
    = note: `#[warn(unused_variables)]` on by default

Anyway, I didn't see an existing issue for this, so figured I'd open one! Thanks!

ahl commented 7 months ago

You're right: patternProperties aren't consumed at all. Here are the generated types:

#[derive(Clone, Debug, Deserialize, Serialize)]
#[serde(deny_unknown_fields)]
pub struct Snapshot {
    pub repos: SnapshotRepos,
    #[serde(default, skip_serializing_if = "Option::is_none")]
    pub variables: Option<SnapshotVariables>,
}

#[derive(Clone, Debug, Deserialize, Serialize)]
#[serde(deny_unknown_fields)]
pub struct SnapshotRepos {}

#[derive(Clone, Debug, Deserialize, Serialize)]
#[serde(deny_unknown_fields)]
pub struct SnapshotVariables {}

The Rust type you do want is a bit tricky: a HashMap or BTreeMap with keys of a particular string type (i.e. with the regex constraints you specified) and at least one value. (Note that your use of uniqueItems is, I think, spurious as that only applies to array type values). This is related to #456.

I think we might want something like this:

pub struct SnapshotRepos(BTreeMap<SnapshotReposKey, SnapshotReposValue>); // note not pub inner

impl<'de> Deserialize<'de> for SnapshotRepos {
    // custom deserializer that enforces minProperties = 1
}

impl TryFrom<BTreeMap<SnapshotReposKey, SnapshotReposValue>> for SnapshotRepos {
    // enforce minProperties = 1
}

// easy...
impl From<SnapshotRepos> for BTreeMap<SnapshotReposKey, SnapshotReposValue> {
    fn from(value: SnapshotRepos) -> Self {
        value.0
    }
}

pub struct SnapshotReposKey(String); // like other constrained string types

// pretty simple type
pub struct SnapshotReposValue {
    branch: String,
    commit: String
}

This isn't very hard to implement, but I'd like to make sure we have some reasonable approach for dealing with constrained types in a generic fashion. If someone is interested, please follow up and I'll do some thinking about what might make this consistent with the direction we want to take here.

staehle commented 7 months ago

Hey thanks for the quick response! That sounds entirely reasonable.

Quick side notes:

ahl commented 7 months ago

On the topic of HashMap vs BTreeMap: Not sure if you're aware, but as I was researching existing examples I did notice getsentry doing something like this to hot-replace the map type. Personally for my usage, I'm impartial as my data won't be large enough to be really affected by the choice, but perhaps some method of customization could be useful? Not sure how feasible that is.

Interesting! I had recently been checking out Sentry; was not aware that they were typify users. Maybe there should be some setting to use one or the other. I'm not fully sure why I might have used HashMap...

uniqueItems was definitely from some overzealous copy/pasting! JSON Schema does just ignore it if it's not with an array type though.

Right; not a big deal, just a heads up.