parsiya / yaml-wrangling-with-rust

YAML Wrangling with Rust
MIT License

typify fails due to schema error... and unhandled conditions #2

Open ahl opened 1 year ago

ahl commented 1 year ago

Howdy; the initial failure you encountered with typify happens due to this construction:

  join-content:
    type: object
    title: "Join one or more rules together based on metavariable contents"
    properties:
      refs:
        type: array
        ...
      rules:
        type: array
        ...
      on:
        type: array
        ...
      additionalProperties: false
    additionalProperties: false

Note that there's a property of the join-content object called additionalProperties whose schema is false. This effectively says "there's an optional field called additionalProperties whose type has no valid value". Typify could render this as something like the never (!) type, but it doesn't.
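To make the "no valid value" idea concrete, here is a minimal sketch of what a never-typed field could look like in Rust. It is an illustration only, not typify's output: the struct name, the `Vec<String>` element types (the schema elides them), and the helper `join_content` are all assumptions, and `std::convert::Infallible` stands in for the unstable `!` type.

```rust
use std::convert::Infallible;

/// Hypothetical stand-in for a generated `join-content` type.
/// The element types are placeholders; the schema excerpt elides them.
#[derive(Debug, Default, PartialEq)]
pub struct JoinContent {
    pub refs: Vec<String>,
    pub rules: Vec<String>,
    pub on: Vec<String>,
    /// The schema `false` admits no value, so `Infallible` (an empty enum)
    /// models it: `Some(..)` can never be constructed, only `None`.
    pub additional_properties: Option<Infallible>,
}

/// Hypothetical helper: builds a value with the stray field left at `None`,
/// which is the only state it can ever be in.
pub fn join_content(refs: Vec<String>) -> JoinContent {
    JoinContent {
        refs,
        ..Default::default()
    }
}
```

The field is legal but useless, which is a strong hint that the nested `additionalProperties: false` was meant to sit one level up, next to `properties`.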

After that, the schema contains many constructions like this:

{
  "type": "object",
  "oneOf": [
    {
      "required": [..]
    }, {
      "required": [..]
    }
  ]
}

Typify doesn't currently handle those (as of 0.0.12).
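For readers unfamiliar with this construction: a `oneOf` made up of `required` lists means that exactly one of the lists must be fully present on the object. A rough sketch of that check follows. The function name is made up for illustration, and the key names in the usage note are assumptions, since the schema excerpt elides the actual lists.

```rust
use std::collections::HashSet;

/// Returns true when exactly one alternative has all of its required keys
/// present. This mirrors `oneOf: [{required: [..]}, {required: [..]}]`:
/// matching zero alternatives or more than one is a validation failure.
pub fn satisfies_one_of(present: &HashSet<&str>, alternatives: &[&[&str]]) -> bool {
    alternatives
        .iter()
        .filter(|required| required.iter().all(|key| present.contains(key)))
        .count()
        == 1
}
```

With hypothetical alternatives `["pattern"]` and `["pattern-regex"]`, an object carrying only `pattern` passes, while an object carrying both keys or neither is rejected. That exclusivity is what a struct of independent optional fields cannot express.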

It could, for example, ignore those constraints as the hand-written structs do here:

https://github.com/parsiya/yaml-wrangling-with-rust/blob/main/07-artisanal-structs/src/semgrep_rules.rs#L95-L109

// what's inside "taint-content:" in the schema.
#[skip_serializing_none]
#[derive(Debug, Serialize, Deserialize)]
pub struct TaintContent {
    // Return finding where Semgrep pattern matches exactly
    pattern: Option<String>,

    #[serde(rename = "pattern-regex")]
    pattern_regex: Option<String>,

    patterns: Option<Vec<PatternsContent>>,

    #[serde(rename = "pattern-either")]
    pattern_either: Option<Vec<PatternEitherContent>>,
}

Typify could simply ignore the constraints as well. However, the guiding principle for typify is to represent types as precisely as possible, so in this case we'd like to emit something like:

pub enum TaintContent {
    Pattern(String),
    PatternRegex(String),
    Patterns(PatternsContent),
    PatternEither(PatternEitherContent)
}

Indeed, it's a bit strange that this wasn't modeled as a oneOf, but JSON schema invites all kinds of strange constructions... which makes code generation hard!
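One way to see the difference in precision is to convert the loose, all-optional form into the enum form and note which inputs have to be rejected. The sketch below is an assumption-laden illustration rather than anything typify generates: it is cut down to two variants to stay self-contained, and `LooseTaintContent` and the error strings are invented for the example.

```rust
use std::convert::TryFrom;

/// Loose form: every field optional, as in the hand-written struct.
#[derive(Debug, Default)]
pub struct LooseTaintContent {
    pub pattern: Option<String>,
    pub pattern_regex: Option<String>,
}

/// Precise form: exactly one alternative is present by construction.
#[derive(Debug, PartialEq)]
pub enum TaintContent {
    Pattern(String),
    PatternRegex(String),
}

impl TryFrom<LooseTaintContent> for TaintContent {
    type Error = String;

    fn try_from(loose: LooseTaintContent) -> Result<Self, Self::Error> {
        match (loose.pattern, loose.pattern_regex) {
            (Some(p), None) => Ok(TaintContent::Pattern(p)),
            (None, Some(r)) => Ok(TaintContent::PatternRegex(r)),
            // The struct can represent these two states; the enum cannot.
            (None, None) => Err("no pattern field set".to_string()),
            (Some(_), Some(_)) => Err("more than one pattern field set".to_string()),
        }
    }
}
```

The two `Err` arms are exactly the states the `oneOf`/`required` constraints rule out. With the struct they survive deserialization and have to be checked by hand; with the enum they cannot be represented at all.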

parsiya commented 1 year ago

Oh wow, I completely forgot about this issue because I was wrapping up some other work. Now that I have more time, I am getting back into this project.

  1. Sorry for the late reply.
  2. Thank you for the very detailed explanation.

The original files are in ATD format and the JSON schema is generated from them, so some of the issues may have come from the conversion.

Again, thanks. This was very enlightening.