GSoC: Define upgrade/downgrade language agnostic declarative transformation rules for all JSON Schema dialects

jviotti commented 9 months ago

Project title

Define upgrade/downgrade language agnostic declarative transformation rules for all JSON Schema dialects.

Brief Description

The Alterschema project defines a set of JSON-based formal transformation rules for upgrading schemas between Draft 4 and 2020-12, and all dialects in between. These rules are defined using JSON Schema and JSON-e and live within the Alterschema project.

We would like to revise these rules, extend them to support every dialect of JSON Schema (potentially including OpenAPI's old dialects too), and attempt to support some level of downgrading.

Instead of having these rules on the Alterschema repository, we want to have them on the JSON Schema organization for everybody to consume, including Alterschema itself.

Revising the rule format should consider currently unresolved edge cases in Alterschema like tweaking references after a subschema is moved.

Expected Outcomes

A new repository in the JSON Schema organization with upgrade/downgrade rules defined using JSON.

Skills Required

Understanding of various dialects of JSON Schema and their differences.

Mentors

@jviotti

Expected Difficulty

Medium

Expected Time Commitment

350 hours

benjagm commented 9 months ago

Thanks Juan. This looks amazing!

suprith-hub commented 9 months ago

Hey @jviotti, I read through the problem statement and loved the way the description was written; it gives a good understanding of the problem. I would love to work on this problem statement under GSoC with the mentors. Can you guide me towards a better understanding of it 😁, and tell me where to start? Would it be good to read all of the repositories?

jviotti commented 9 months ago

Hey there! I'd first suggest getting acquainted with https://github.com/sourcemeta/alterschema. This is the original project where I prototyped something like what we want to do here, using JSON-e (https://json-e.js.org), but ended up hitting some blockers. You can take a look at all the upgrade transformation rules I support here: https://github.com/sourcemeta/alterschema/tree/master/rules. Try to read them, and understand them mainly in conjunction with JSON Schema's official migration guide: https://json-schema.org/specification#migrating-from-older-drafts.

The way Alterschema works is pretty simple. It recursively traverses every subschema of the given schema in a top-down manner, applying all the rules it knows about to every subschema, over and over again, until no more transformation rules can be executed. Its core business logic is literally a small JavaScript file: https://github.com/sourcemeta/alterschema/blob/master/bindings/node/index.js
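The traversal described above can be sketched in a few lines of JavaScript. This is a simplified illustration, not Alterschema's actual code: the rule shape ({ name, test, apply }) and the list of keywords used for traversal are assumptions made for this example, while real Alterschema rules are JSON-e templates. Applying the two sample rules to the tuple schema from the 2019-09 example later in this thread upgrades the keywords but leaves "$ref": "#/items/0" untouched, which is exactly the limitation discussed next.

```javascript
// Hypothetical rule shape: { name, test(subschema), apply(subschema) }.
// NOT Alterschema's real rule format (Alterschema uses JSON-e templates).

// Collect subschemas top-down (parents before children) through a small,
// deliberately incomplete set of applicator keywords.
function collectSubschemas(schema, out = []) {
  if (schema === null || typeof schema !== "object" || Array.isArray(schema)) return out;
  out.push(schema);
  for (const key of ["properties", "$defs", "definitions"]) {
    for (const child of Object.values(schema[key] ?? {})) collectSubschemas(child, out);
  }
  for (const key of ["items", "prefixItems", "additionalItems", "not", "allOf", "anyOf", "oneOf"]) {
    const value = schema[key];
    for (const child of Array.isArray(value) ? value : [value]) collectSubschemas(child, out);
  }
  return out;
}

// Apply every rule to every subschema, pass after pass, until a whole pass
// changes nothing (a fixpoint). maxPasses guards against rules that keep
// undoing each other.
function transform(schema, rules, maxPasses = 100) {
  for (let pass = 0; pass < maxPasses; pass++) {
    let changed = false;
    for (const subschema of collectSubschemas(schema)) {
      for (const rule of rules) {
        if (rule.test(subschema)) {
          rule.apply(subschema);
          changed = true;
        }
      }
    }
    if (!changed) return schema;
  }
  throw new Error("transformation rules did not reach a fixpoint");
}

// Two illustrative 2019-09 -> 2020-12 rules.
const rules = [
  {
    name: "update-schema-uri",
    test: (s) => s.$schema === "https://json-schema.org/draft/2019-09/schema",
    apply: (s) => { s.$schema = "https://json-schema.org/draft/2020-12/schema"; },
  },
  {
    name: "tuple-items",
    // items in array form becomes prefixItems; additionalItems takes its place
    test: (s) => Array.isArray(s.items),
    apply: (s) => {
      s.prefixItems = s.items;
      delete s.items;
      if ("additionalItems" in s) {
        s.items = s.additionalItems;
        delete s.additionalItems;
      }
    },
  },
];
```

Note that nothing in either rule knows about $ref, so any pointer into the moved subschemas is left stale.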

For example, Alterschema rules for upgrading JSON Schema 2019-09 to 2020-12 are defined here: https://github.com/sourcemeta/alterschema/blob/master/rules/jsonschema-2019-09-to-2020-12.json, based on what JSON Schema published here: https://json-schema.org/draft/2020-12/release-notes.

Now, what we would like to do in this GSoC initiative is learn from what we did in Alterschema and take another approach to the problem, one that solves the limitations of Alterschema. The main limitation is this one: https://github.com/sourcemeta/alterschema/issues/43.

In summary, a JSON Schema may reference other parts of itself using URI-encoded JSON Pointers along with the $ref and $dynamicRef keywords. The current JSON-e rules that I have on Alterschema only look at the current subschema and blindly transform it according to what the template says.

However, what happens if a reference in another part of the schema becomes invalid because of a transformation you did somewhere else? We don't have a deterministic way of detecting this, let alone knowing how to "fix up" the reference pointers.

The conclusion I got from this is that JSON-e, while powerful, is too low level and doesn't carry semantics about what the transformation actually did. For example, if you upgrade definitions to $defs, that's a simple rename. Knowing that it is indeed just a simple rename, it's easy to know how to fix any pointers that included /definitions in it.

So what I'm thinking about is that we can study the transformation rules that we want to perform, and break them down into higher-level sub-transformations. For example, are we completely deleting something? Are we performing just a rename? Are we moving the contents around? If we design a JSON language that works at a higher level of abstraction, we can deterministically know how we should fix any affected pointer.
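To make the "deterministic fix-up" idea concrete, here is a minimal sketch of how each kind of high-level operation could map onto a JSON Pointer rewrite. The operation shapes (rename, move and delete, with at/from/to fields) are hypothetical, invented for this example, and pointers are handled as arrays of already-unescaped reference tokens.

```javascript
// Hypothetical operation shapes, for illustration only:
//   { type: "rename", at: ["properties", "foo"], from: "items", to: "prefixItems" }
//   { type: "move", from: ["definitions", "foo"], to: ["$defs", "foo"] }
//   { type: "delete", at: ["a", "b"] }

// True if `tokens` starts with every token in `prefix`.
function startsWith(tokens, prefix) {
  return prefix.length <= tokens.length && prefix.every((t, i) => tokens[i] === t);
}

// Returns the rewritten token array, the same array if the pointer is
// unaffected, or null if the pointer now dangles (its target was deleted).
function rewritePointer(tokens, op) {
  switch (op.type) {
    case "rename": {
      const old = [...op.at, op.from];
      if (!startsWith(tokens, old)) return tokens;
      return [...op.at, op.to, ...tokens.slice(old.length)];
    }
    case "move":
      if (!startsWith(tokens, op.from)) return tokens;
      return [...op.to, ...tokens.slice(op.from.length)];
    case "delete":
      return startsWith(tokens, op.at) ? null : tokens;
    default:
      throw new Error(`unknown operation type: ${op.type}`);
  }
}
```

Because a rename carries its exact location, renaming items under /properties/items rewrites only the second items token of /properties/items/items/0, something a string pattern over the whole pointer cannot guarantee.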

jviotti commented 9 months ago

So I'd say the phases in this project are like this:

Research JSON Schema transformation rules, categorize them, etc.
Come up with a higher-level transformation language than JSON-e that carries semantics about how we are actually transforming the schema (I was thinking something similar to JSON Patch (https://jsonpatch.com))
Then prototype the upgrade rules in this language, ensuring it solves the limitations of Alterschema
If we have more time, use this language to attempt some level of downgrading support, etc.

jviotti commented 9 months ago

As an initial qualifying task for this project (cc @benjagm), I propose:

Go through every upgrade transformation rule from JSON Schema 2019-09 to 2020-12 in the official upgrade guide (https://json-schema.org/draft/2020-12/release-notes) and on Alterschema (https://github.com/sourcemeta/alterschema/blob/master/rules/jsonschema-2019-09-to-2020-12.json) and categorize them in a spreadsheet/table based on what they are doing. For example, are they simple renames? Are they completely moving stuff around? Are they doing something even more complicated? It's up to you to figure out how to categorize them.
Propose a toy JSON-based DSL transformation language (perhaps inspired by JSON-e and JSON Patch) that encapsulates how to perform these 2019-09 to 2020-12 upgrade rules in a way that lets you algorithmically tell how to fix any $ref JSON Pointer that passes through a transformed part of the schema
Describe a pseudo-algorithm to fix up $refs
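A minimal sketch of such a pseudo-algorithm in JavaScript, assuming the rules have already been resolved (for example by static analysis of the schema) into concrete rename operations with absolute locations. The operation shape and all function names are hypothetical. Only same-document "#/..." references are handled; relative and remote references would additionally need base-URI tracking.

```javascript
// Assumed operation shape: { type: "rename", at: [...parentTokens], from, to }

// RFC 6901: URI fragment <-> array of reference tokens.
function parseFragment(ref) {
  return ref.slice(2).split("/")
    .map((t) => decodeURIComponent(t).replace(/~1/g, "/").replace(/~0/g, "~"));
}
function formatFragment(tokens) {
  return "#" + tokens
    .map((t) => "/" + encodeURI(t.replace(/~/g, "~0").replace(/\//g, "~1")).replace(/#/g, "%23"))
    .join("");
}

// Map a pointer that went through the renamed key onto its new location.
function rewriteTokens(tokens, op) {
  const old = [...op.at, op.from];
  const matches = old.every((t, i) => tokens[i] === t);
  return matches ? [...op.at, op.to, ...tokens.slice(old.length)] : tokens;
}

function applyRename(schema, op) {
  const parent = op.at.reduce((node, t) => node?.[t], schema);
  const has = (key) => Object.prototype.hasOwnProperty.call(parent, key);
  if (parent === null || typeof parent !== "object" || !has(op.from)) return;
  if (has(op.to)) throw new Error(`"${op.to}" already exists at /${op.at.join("/")}`);
  parent[op.to] = parent[op.from];
  delete parent[op.from];
}

// Keywords whose values are plain data, never subschemas. A real tool needs
// keyword-aware traversal instead (a *property* named "const" is a subschema).
const DATA_KEYWORDS = new Set(["const", "enum", "default", "examples"]);

function fixRefs(node, op) {
  if (node === null || typeof node !== "object") return;
  if (Array.isArray(node)) { node.forEach((child) => fixRefs(child, op)); return; }
  for (const [key, value] of Object.entries(node)) {
    if (DATA_KEYWORDS.has(key)) continue;
    if (key === "$ref" && typeof value === "string" && value.startsWith("#/")) {
      node[key] = formatFragment(rewriteTokens(parseFragment(value), op));
    } else {
      fixRefs(value, op);
    }
  }
}

// Operations run in order; every local $ref is rewritten with an operation
// right after that operation is applied to the schema.
function transformWithRefFixups(schema, ops) {
  for (const op of ops) {
    applyRename(schema, op);
    fixRefs(schema, op);
  }
  return schema;
}
```

The key design choice is that pointers are updated in lockstep with the schema, so a later operation (additionalItems to items) never confuses a pointer that an earlier one (items to prefixItems) already moved.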

jviotti commented 9 months ago

As a more specific (though probably a bit artificial and silly 😅) example of the $ref issue, consider the following JSON Schema 2019-09:

{
  "$schema": "https://json-schema.org/draft/2019-09/schema",
  "type": "array",
  "items": [
    { "type": "string" },
    { "type": "number" }
  ],
  "additionalItems": { 
    "$ref": "#/items/0" 
  }
}

To turn it into a JSON Schema 2020-12, we need to:

Replace $schema with https://json-schema.org/draft/2020-12/schema
Rename /items to /prefixItems
Rename /additionalItems to /items

However, if you blindly perform these transformations, you would end up with the following schema:

{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "type": "array",
  "prefixItems": [
    { "type": "string" },
    { "type": "number" }
  ],
  "items": { 
    "$ref": "#/items/0" 
  }
}

However note that the /items/$ref, which still says #/items/0 is now invalid. We first renamed prefixItems to items, so the $ref should have been updated to #/prefixItems/0 too.

This one is a bit simple, but think about more complex variations of the same problem. You might have long references where many of their components need to be updated, and in some cases, it will be more than just a component rename.
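One way to check the claim that the reference is now invalid is to evaluate it as an RFC 6901 JSON Pointer. The sketch below is a minimal same-document resolver (no $id or remote-reference handling; the function name is arbitrary). On the blindly transformed schema above, "#/items/0" walks into the new items object, which has no "0" member, so it returns undefined; on the corrected schema, "#/prefixItems/0" resolves to { "type": "string" }.

```javascript
// Minimal RFC 6901 resolver for same-document "$ref" fragments such as
// "#/prefixItems/0". Returns undefined when the pointer does not resolve.
// Sketch only: it ignores $id/base-URI handling and remote references.
function resolveFragment(doc, ref) {
  if (ref === "#") return doc;
  if (!ref.startsWith("#/")) throw new Error(`not a JSON Pointer fragment: ${ref}`);
  const tokens = ref.slice(2).split("/")
    .map((t) => decodeURIComponent(t).replace(/~1/g, "/").replace(/~0/g, "~"));
  let node = doc;
  for (const token of tokens) {
    if (Array.isArray(node)) {
      // RFC 6901 array indices: "0" or digits without a leading zero
      if (!/^(0|[1-9][0-9]*)$/.test(token) || Number(token) >= node.length) return undefined;
      node = node[Number(token)];
    } else if (node !== null && typeof node === "object" &&
               Object.prototype.hasOwnProperty.call(node, token)) {
      node = node[token];
    } else {
      return undefined;
    }
  }
  return node;
}
```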

jviotti commented 9 months ago

Or if you can think of a better way to deterministically solve this problem, please propose it and we can work on it together!

MeastroZI commented 9 months ago

However note that the /items/$ref, which still says #/items/0 is now invalid. We first renamed prefixItems to items, so the $ref should have been updated to #/prefixItems/0 too.

I'm confused by this line. Are we supposed to convert prefixItems to items for the reference to be #/prefixItems/0 as part of the conversion from 2019-09 to 2020-12?

Perhaps you meant items to prefixItems, or maybe I am misunderstanding? :confused:

jviotti commented 9 months ago

@MeastroZI The reference was originally #/items/0, but because we rename items to prefixItems, for the schema to be valid, we should have also adjusted the reference from #/items/0 to #/prefixItems/0. The expected end result should have been this:

{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "type": "array",
  "prefixItems": [
    { "type": "string" },
    { "type": "number" }
  ],
  "items": { 
    "$ref": "#/prefixItems/0" 
  }
}

MeastroZI commented 9 months ago

Hasn't this problem already been addressed with the pattern

"pattern": "/items/\\d+"

"$eval": "replace(schema['$ref'], '/items/(\\d+)', '/prefixItems/$1')"

or is there a possibility that this approach might not cover all cases? If so, could you please specify which cases it might not handle, so I can gain a better understanding of the issue?

jviotti commented 9 months ago

@MeastroZI For this very trivial rename case yes, but it's very easy to construct valid JSON Schemas where that simple pattern won't do. Take this one as a silly example:

{
  "$schema": "https://json-schema.org/draft/2019-09/schema",
  "type": "object",
  "properties": {
    "items": {
      "items": [
        { "type": "string" }
      ]
    },
    "extra": {
      "$ref": "#/properties/items/items/0" 
    }
  }
}

It has an object property called items, which is not the actual JSON Schema keyword. In this case, you need to rename only /properties/items/items to /properties/items/prefixItems, and thus only rename the second occurrence of items in the JSON Pointer. In JSON Schema 2019-09, items can also be either a single schema or an array of schemas, so you can have items be a schema that itself declares items as an array and end up in a similar situation. You can probably come up with more edge cases around it.

In any case, items to prefixItems is just a simple rename upgrade example. Other JSON Schema keywords may require more than just a simple renaming, making this even harder to resolve for all cases.

Keep in mind that a tool that upgrades schemas must be able to handle ANY valid JSON Schema document that the user passes to it, and handle these tricky edge cases accordingly.

jviotti commented 9 months ago

For example, definitions to $defs (from the Alterschema issue I shared) is even trickier: the component after definitions is an arbitrary definition name, so you cannot improve the pattern by relying on it being an integer, like we do for items to prefixItems.

jviotti commented 9 months ago

Here is a fun one that is valid and breaks the \\d part of the regex:

{
  "$schema": "https://json-schema.org/draft/2019-09/schema",
  "type": "object",
  "properties": {
    "foo": {
      "$ref": "#/$defs/items/0" 
    }
  },
  "$defs": {
    "items": {
      "0": {
        "type": "string"
      }
    }
  }
}
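To see concretely why the \\d trick is not enough, the sketch below emulates the regex rewrite with JavaScript's String.prototype.replace (the JSON-e replace() used by Alterschema may differ in details). This schema contains no items keyword at all, so a correct upgrade must leave "#/$defs/items/0" alone; the regex rewrites it anyway, and the result no longer resolves.

```javascript
// The regex-based fix-up, emulated with String.prototype.replace.
const naiveFix = (ref) => ref.replace(/\/items\/(\d+)/, "/prefixItems/$1");

// Follow a simple pointer (no ~ escapes needed here) through the document.
const lookup = (doc, ref) => ref.slice(2).split("/").reduce((node, t) => node?.[t], doc);

const schema = {
  $schema: "https://json-schema.org/draft/2019-09/schema",
  type: "object",
  properties: { foo: { $ref: "#/$defs/items/0" } },
  $defs: { items: { "0": { type: "string" } } },
};

const before = schema.properties.foo.$ref; // "#/$defs/items/0"
const after = naiveFix(before);            // "#/$defs/prefixItems/0"
console.log(lookup(schema, before));       // { type: 'string' }
console.log(lookup(schema, after));        // undefined: there is no $defs/prefixItems
```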

jviotti commented 9 months ago

What I'm thinking about is that we can statically analyze the schema first, and know what each component of the pointers means (e.g. does the /items part of #/$defs/items correspond to the actual 2019-09 items applicator in array form?). That, plus additional semantics around what the transformation does, could help us resolve every case.
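A minimal sketch of that static analysis, assuming a hand-picked subset of 2019-09 keywords (a real implementation would derive these tables from the dialect's vocabularies): walk the pointer tokens alongside the schema and label each token as a keyword, a property/definition name, an array index, or plain data. Only a token labelled as the array-form items keyword is then renamed. All function and label names are hypothetical.

```javascript
// Small, illustrative subset of 2019-09 applicators (an assumption).
const MAP_KEYWORDS = new Set(["properties", "patternProperties", "$defs", "definitions", "dependentSchemas"]);
const ARRAY_KEYWORDS = new Set(["allOf", "anyOf", "oneOf"]);
const SCHEMA_KEYWORDS = new Set(["additionalItems", "additionalProperties", "not", "if", "then", "else", "contains", "propertyNames"]);

// Label each pointer token by its structural role in the schema.
function analyzePointer(schema, tokens) {
  const labels = [];
  let node = schema;
  let role = "schema"; // what kind of value `node` is
  for (const token of tokens) {
    let label = "data";
    let next = "data";
    if (role === "schema") {
      label = `keyword:${token}`;
      if (token === "items") {
        // 2019-09 items is either a single schema or an array of schemas
        const isArray = Array.isArray(node?.items);
        label = isArray ? "keyword:items(array)" : "keyword:items(schema)";
        next = isArray ? "array" : "schema";
      } else if (MAP_KEYWORDS.has(token)) next = "map";
      else if (ARRAY_KEYWORDS.has(token)) next = "array";
      else if (SCHEMA_KEYWORDS.has(token)) next = "schema";
    } else if (role === "map") {
      label = "name";
      next = "schema";
    } else if (role === "array") {
      label = "index";
      next = "schema";
    }
    labels.push(label);
    role = next;
    node = node?.[token];
  }
  return labels;
}

// Only a token labelled "keyword:items(array)" becomes "prefixItems".
function upgradeItemsTokens(schema, tokens) {
  const labels = analyzePointer(schema, tokens);
  return tokens.map((t, i) => (labels[i] === "keyword:items(array)" ? "prefixItems" : t));
}
```

For "#/properties/items/items/0" only the third token is labelled keyword:items(array); for "#/$defs/items/0" the items token is a definition name, so nothing changes.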

suprith-hub commented 9 months ago

What I'm thinking about is that we can statically analyze the schema first, and know what each component of the pointers means (e.g. does the /items part of #/$defs/items correspond to the actual 2019-09 items applicator in array form?). That, plus additional semantics around what the transformation does, could help us resolve every case.

Hi, so instead of handling every single case for the keywords to be transformed, it is better to make checks based on the semantic hierarchical flow. Am I right? Like checking whether it's an array or an object, whether it's a real items, and then casting the 0 to a string? Is that what semantics means?

suprith-hub commented 9 months ago

Okay, I'll complete this right now.


jviotti commented 9 months ago

Hi, so instead of handling every single case for the keywords to be transformed, it is better to make checks based on the semantic hierarchical flow. Am I right? Like checking whether it's an array or an object, whether it's a real items, and then casting the 0 to a string? Is that what semantics means?

Not 100% sure what you mean, but what I mean by semantics is being able to statically analyze the transformation DSL and understand what it does. For example, you cannot easily tell from a JSON-e template that the template is actually a property rename. And if we can tell that, e.g., a rule is a rename from A to B, then we might know how to handle the reference fix-ups.

Coming back to the items to prefixItems example we've been discussing so far, this is the corresponding JSON-e rule we have in Alterschema:

{
  "$merge": [
    { "$eval": "omit(schema, 'items')" },
    {
      "prefixItems": {
        "$eval": "schema.items"
      }
    }
  ]
}

What if instead of that weird-looking low-level complex JSON template, we instead had:

[
  { "type": "rename", "from": "items", "to": "prefixItems" }
]

The latter is a LOT more machine readable.
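As an illustration of why the declarative form is easier to work with, here is a hypothetical interpreter for the rename operation shown above, applied to one subschema. Because the intent is explicit, the engine can enforce preconditions (here it refuses to overwrite an existing key) and keep the original key order, neither of which can be read off the JSON-e template. The function name and error behavior are assumptions.

```javascript
const hasOwn = (object, key) => Object.prototype.hasOwnProperty.call(object, key);

// Apply a list of declarative operations to a single subschema.
function applyOperations(subschema, operations) {
  let result = subschema;
  for (const op of operations) {
    if (op.type !== "rename") throw new Error(`unsupported operation: ${op.type}`);
    if (!hasOwn(result, op.from)) continue; // nothing to rename here
    // A precondition an opaque template cannot express: never overwrite a key.
    if (hasOwn(result, op.to)) throw new Error(`rename would overwrite "${op.to}"`);
    // Rebuild the object so the renamed key keeps its original position.
    result = Object.fromEntries(
      Object.entries(result).map(([key, value]) => [key === op.from ? op.to : key, value]),
    );
  }
  return result;
}
```

The same { from, to } pair that drives this edit is all a fix-up pass needs to rewrite pointers that went through the renamed key.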

I guess the main challenge is that, leaving the simple prefixItems rule aside, some upgrade rules are more complex and involve even more cryptic JSON-e templates that do more than renames. So the problem statement is: can we come up with a set of higher-level operations that captures everything we need, AND is machine-readable enough for us to deterministically do $ref fix-ups in every possible case?

suprith-hub commented 9 months ago

So I'd say the phases in this project are like this:

Research JSON Schema transformation rules, categorize them, etc.

Come up with a higher-level transformation language than JSON-e that carries semantics about how we are actually transforming the schema (I was thinking something similar to JSON Patch (https://jsonpatch.com))

Then prototype the upgrade rules in this language, ensuring it solves the limitations of Alterschema

If we have more time, use this language to attempt some level of downgrading support, etc.

@jviotti, one question about this: should the high-level transformation language call JSON-e in the backend (in other words, should the high-level one be written on top of JSON-e itself)?

jviotti commented 9 months ago

@Era-cell Maybe. I'm open to both building it on top of JSON-e or as a standalone thing. Whatever is easier I guess

benjagm commented 8 months ago

Thanks a lot for joining JSON Schema org for this edition of GSoC!!

Qualification tasks will be published as comments in the project ideas by Thursday/Friday of this week. In addition, I'd like to invite you to an office hours session this Thursday at 18:30 UTC, where we'll present the ideas and the relevant dates to consider at this stage of the program.

Please use this link to join the session: 🌐 Zoom 📅 2024-02-29 18:30 UTC

See you there!

jviotti commented 8 months ago

For the qualifying task, just to echo back what I said before: the main thing we want to see on proposals is that you have a good grasp on what the problem of upgrading JSON Schemas is and are capable of understanding the upgrade rules that would need to be implemented.

So for that, you can focus only on 2019-09 to 2020-12 for the proposal (we'll cover other drafts later), list the transformation rules that need to happen between those two drafts, and try to categorize them based on different criteria to understand them better. For example, what vocabulary they involve, what type of operation they are (rename, wrap, etc.), whether they affect other sibling or non-sibling keywords, etc. Be creative! Good grouping criteria can surface patterns that we might not be thinking about and that could influence the DSL. You can present this as a spreadsheet, list, or any form you want.

Then, once accepted, we will continue building on this analysis to design the DSL, and finally implement it. If we do the previous phases well (mainly the one on understanding and categorizing the transformation rules), the rest will be easy.
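One possible way to record such a categorization as data, so that it can be grouped by any criterion and later feed the DSL design. The field names and category labels are illustrative choices, not an agreed format, and only a handful of 2019-09 to 2020-12 changes from the release notes are listed.

```javascript
// Illustrative, partial categorization of 2019-09 -> 2020-12 changes.
const changes = [
  { change: "$schema: 2019-09 URI -> 2020-12 URI", vocabulary: "core", operation: "set-value", dependsOn: "nothing" },
  { change: "items (array form) -> prefixItems", vocabulary: "applicator", operation: "rename", dependsOn: "own value being an array" },
  { change: "additionalItems -> items", vocabulary: "applicator", operation: "rename", dependsOn: "sibling items being an array" },
  { change: "$recursiveRef -> $dynamicRef", vocabulary: "core", operation: "rename + set-value", dependsOn: "non-sibling $recursiveAnchor" },
  { change: "$recursiveAnchor: true -> $dynamicAnchor: <name>", vocabulary: "core", operation: "rename + set-value", dependsOn: "name shared with $dynamicRef" },
];

// Group the change descriptions by any field to look for patterns.
function groupBy(list, field) {
  const groups = {};
  for (const item of list) {
    if (!groups[item[field]]) groups[item[field]] = [];
    groups[item[field]].push(item.change);
  }
  return groups;
}

console.log(groupBy(changes, "operation"));
```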

MeastroZI commented 8 months ago

{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "$id": "https://example.com/anotherthing/agains/customer",

  "type": "object",
  "properties": {
    "name": { "type": "string" },
    "phone": { "$ref": "/schema/common#/$defs/phone" },
    "address": { "$ref": "/schema/address" }
  },

  "$defs": {
    "https://example.com/schema/address": {
      "$id": "https://example.com/schema/address",

      "type": "object",
      "properties": {
        "address": { "type": "string" },
        "city": { "type": "string" },
        "postalCode": { "$ref": "/schema/common#/$defs/usaPostalCode" },
        "state": { "$ref": "#/$defs/states" }
      },

      "$defs": {
        "states": {
          "enum": [4, 4]
        }
      }
    },
    "https://example.com/schema/common": {
      "$schema": "https://json-schema.org/draft/2019-09/schema",
      "$id": "https://example.com/schema/common",

      "$defs": {
        "phone": {
          "type": "number"
        },
        "usaPostalCode": {
          "type": "string",
          "pattern": "^[0-9]{5}(?:-[0-9]{4})?$"
        },
        "unsignedInt": {
          "type": "integer",
          "minimum": 0
        }
      }
    }
  }
}

@jviotti I am not able to understand how, in this case, this $ref under:

"phone": { "$ref": "/schema/common#/$defs/phone" }

which has a relative path, gets resolved by the schema validator. How is the base URI for it calculated when the relative path in $ref has nothing in common with the $id of the root?

suprith-hub commented 8 months ago


Did you try running it? I think this is related to how the schemas are stored.

MeastroZI commented 8 months ago

@Era-cell, I read somewhere that $ref is resolved by pointing directly to the part of the schema it refers to. So my question is: how does the schema validator resolve this $ref with a relative path? Even if the validator stores these schemas in the definitions part, or in some other way under the hood, it still needs to resolve the $ref to find them.

suprith-hub commented 8 months ago

As per the documentation, refs are encapsulated from the parent schema but defs aren't, so annotation results of an external schema should affect only the validation results. If a subschema with $ref fails, the schema is invalidated.


MeastroZI commented 8 months ago

The schema I provided is not failing; it works and successfully validates the JSON data.

You can try it here: https://www.jsonschemavalidator.net/

Edit: Sorry, I am typing from my phone, so there may be typos in my messages.

jviotti commented 8 months ago

@MeastroZI Your reference, /schema/common#/$defs/phone is a URI reference, where /schema/common is the URI path and #/$defs/phone is the URI fragment. Furthermore, that URI reference is relative.

According to JSON Schema's use of URIs and the URI RFC, that relative URI reference is resolved using https://example.com/anotherthing/agains/customer (the $id of the schema resource that contains the reference) as the base URI.

Following standard URI behavior, resolving /schema/common#/$defs/phone against https://example.com/anotherthing/agains/customer results in https://example.com/schema/common#/$defs/phone. Then, when resolving that reference, JSON Schema will look for https://example.com/schema/common, which is an embedded schema resource in the schema you shared, and from there, resolve #/$defs/phone as a JSON Pointer.

If URI behavior is the confusing part, I recommend reading the URI RFC: https://www.rfc-editor.org/rfc/rfc3986
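The three resolution steps above (resolve the relative reference against the base URI, find the embedded resource with that $id, evaluate the fragment as a JSON Pointer) can be sketched with the WHATWG URL class that ships with Node.js and browsers. This is a simplification under stated assumptions: every $id is absolute and embedded resources are found by a naive walk, whereas real validators track the base URI as they descend. That is why resolveRef takes an explicit base: "#/$defs/states" inside the address resource must be resolved against https://example.com/schema/address, not the root $id.

```javascript
// Index every subschema that declares an (assumed absolute) $id.
function indexResources(node, index = new Map()) {
  if (node === null || typeof node !== "object") return index;
  if (!Array.isArray(node) && typeof node.$id === "string") index.set(node.$id, node);
  for (const child of Object.values(node)) indexResources(child, index);
  return index;
}

function resolveRef(root, ref, base = root.$id) {
  const target = new URL(ref, base);                        // 1. relative -> absolute URI
  const pointer = decodeURIComponent(target.hash.slice(1)); // e.g. "/$defs/phone"
  target.hash = "";
  const resource = indexResources(root).get(target.href);   // 2. find the resource
  if (!resource) return undefined;
  // 3. evaluate the fragment as an RFC 6901 JSON Pointer inside that resource
  const tokens = pointer === "" ? [] : pointer.slice(1).split("/")
    .map((t) => t.replace(/~1/g, "/").replace(/~0/g, "~"));
  return tokens.reduce((node, t) => node?.[t], resource);
}

const customer = {
  $id: "https://example.com/anotherthing/agains/customer",
  properties: { phone: { $ref: "/schema/common#/$defs/phone" } },
  $defs: {
    "https://example.com/schema/common": {
      $id: "https://example.com/schema/common",
      $defs: { phone: { type: "number" } },
    },
  },
};

console.log(new URL("/schema/common#/$defs/phone", customer.$id).href);
// https://example.com/schema/common#/$defs/phone
console.log(resolveRef(customer, customer.properties.phone.$ref)); // { type: 'number' }
```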

MeastroZI commented 8 months ago


// convert() is MeastroZI's own implementation and is not included in this comment.
const transformRule = [
  {
    // Rewrite the "items" part of $ref pointers under properties/* when it
    // refers to an "items" keyword with a sibling "type": "array"
    referencTraverser: true,
    path: "properties/*",
    conditions: [{ "isKey": "$ref" }],
    refConditions: [{ "isKey": "items", "hasSibling": ["type", "array"] }],
    updateRefPart: "prefixItems"
  },
  {
    // Rename every "items" key that has a sibling "type": "array"
    path: "*",
    conditions: [{ "isKey": "items", "hasSibling": ["type", "array"] }],
    operations: {
      "editKey": "prefixItems"
    }
  },
  {
    // Point $schema at the 2020-12 meta-schema
    path: "$schema",
    operations: {
      "updateValue": "https://json-schema.org/draft/2020-12/schema"
    }
  }
]

const jsonObj = {
  "$schema": "https://json-schema.org/draft/2019-09/schema",
  "type": "object",
  "properties": {
    "items": {
      "type": "array",
      "items": [
        { "type": "string" }
      ]
    },
    "extra": {
      "$ref": "#/properties/items/items/0"
    }
  },
  "ooos": {
    "items2": {
      "type": "array",
      "items": []
    },
    "item3": {
      "items4": {
        "items5": {
          "type": "array",
          "items": []
        }
      }
    }
  }
}

const result = convert(transformRule, jsonObj)
console.log('\n')
console.log('*******************************Logs*****************************************')
console.log('\n\n\n\n\n\n')
console.log('*******************************Result****************************************')
console.log(JSON.stringify(result, null, 2))
console.log('*******************************Result****************************************')
console.log('\n')

And here is the output:

{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "type": "object",
  "properties": {
    "items": {
      "type": "array",
      "prefixItems": [
        {
          "type": "string"
        }
      ]
    },
    "extra": {
      "$ref": "#/properties/items/prefixItems/0"
    }
  },
  "ooos": {
    "items2": {
      "type": "array",
      "prefixItems": []
    },
    "item3": {
      "items4": {
        "items5": {
          "type": "array",
          "prefixItems": []
        }
      }
    }
  }
}

Hi @jviotti, I have a question about what a JSON DSL means. Could you please take a look at this code? It's a snippet of my work towards a DSL, and I want to know whether my code does something like what you described. Is it considered a DSL? If not, how would you technically define a DSL?

And sorry about my previous comment. One more thing I am hesitant about is asking this many questions. Is it okay to ask this many questions, or are they silly? I want to express this concern openly.

jviotti commented 8 months ago

@MeastroZI

I have a question about what a JSON DSL means. Could you please take a look at this code? It's a snippet of my work towards a DSL, and I want to know whether my code does something like what you described. Is it considered a DSL? If not, how would you technically define a DSL?

Yeah, exactly, you are thinking about it in the right direction. Your transformRule JSON example is definitely a valid DSL.

And sorry about my previous comment. One more thing I am hesitant about is asking this many questions. Is it okay to ask this many questions, or are they silly? I want to express this concern openly.

Please ask as many questions as you need. That's the whole point of this phase and I'm sure other people reading this thread would benefit as well. Asking lots of questions is definitely better than not asking them.

MeastroZI commented 8 months ago

@jviotti, can you explain this?


{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "type": "array",
  "prefixItems": [{ "type": "string" }, { "type": "string" }],
  "not": {
    "items": {
      "not": { "type": "string", "minLength": 3 }
    }
  },
  "unevaluatedItems": false
}

especially this part:


"not": {
    "items": {
      "not": { "type": "string", "minLength": 3 }
    }
}
My understanding is that it says there must not be any items in the array that are strings with a length less than 3, so the schema should only accept arrays where all elements have a minimum length of 3. However, it also seems to accept arrays like ["axd", "d"]. Could you clarify this?
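Working through the double negation by hand may help here. The sketch below translates this particular schema into plain JavaScript predicates (a reasoning aid, not a validator; the names are arbitrary). Because the inner items has no sibling prefixItems, it applies to every element, so the not branch amounts to "at least one element is a string of length 3 or more", much like contains. And since a passing not means its subschema failed, which discards that subschema's annotations, unevaluatedItems: false rejects anything beyond the two prefixItems.

```javascript
const isLongString = (x) => typeof x === "string" && x.length >= 3;

// not { items: { not: LONG } }  ==  not (every element is NOT LONG)
//                               ==  some element IS LONG  (like "contains")
const notBranch = (arr) => !arr.every((x) => !isLongString(x));

function validate(instance) {
  if (!Array.isArray(instance)) return false;                                // "type": "array"
  const prefixOk = instance.slice(0, 2).every((x) => typeof x === "string"); // prefixItems
  // Only prefixItems' annotations survive, so indices >= 2 are unevaluated.
  const unevaluatedOk = instance.length <= 2;                                // unevaluatedItems: false
  return prefixOk && notBranch(instance) && unevaluatedOk;
}

console.log(validate(["axd", "d"])); // true: "axd" is a string of length >= 3
console.log(validate(["ab", "d"]));  // false: no string of length >= 3
```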

suprith-hub commented 8 months ago

Also, the unevaluatedItems behaviour is a bit weird:

registerSchema({
    "$schema": "https://json-schema.org/draft/2020-12/schema",
    "$id": "http://example.com/lets_move",
    "type": "array",
    "prefixItems": [{
        "const": "aaa"
    }],
    "items": {
        "type": "string",
        "anyOf": [
            { "pattern": "^a" },
            { "pattern": "^b" }
        ]
    },
    "uniqueItems": true,
    "unevaluatedItems": {
        "type": "string",
        "pattern": "^y"
    }
})

For the instance ["aaa", "ya"], shouldn't "ya" go to unevaluatedItems and match "^y", producing true? Why does it give false here? In both examples, the presence of the items keyword is what makes it confusing.

MeastroZI commented 8 months ago

@Era-cell, unevaluatedItems only applies to array elements that have not been evaluated. Since you use items before unevaluatedItems, items makes every array element successfully evaluated, so no element is left unevaluated; that's why your instance doesn't validate. You have to apply the keyword logically: there is no point in adding unevaluatedItems if every element is already evaluated. 🙂 If I am wrong, please correct me.

suprith-hub commented 8 months ago

@Era-cell, unevaluatedItems only applies to array elements that have not been evaluated. Since you use items before unevaluatedItems, items makes every array element successfully evaluated, so no element is left unevaluated; that's why your instance doesn't validate. You have to apply the keyword logically: there is no point in adding unevaluatedItems if every element is already evaluated. 🙂 If I am wrong, please correct me.

But the order of keywords doesn't matter as per the docs, and the spec says: These instance items or properties may have been unsuccessfully evaluated against one or more adjacent keyword subschemas, such as when an assertion in a branch of an "anyOf" fails. Such failed evaluations are not considered to contribute to whether or not the item or property has been evaluated. Only successful evaluations are considered. In other words, only successful evaluations count as evaluated.

MeastroZI commented 8 months ago

@Era-cell, if you set unevaluatedItems to false in your schema and then run your instance, you will not get an error about unevaluated elements; you will get an error from the items keyword.

That means items takes care of every element not covered by prefixItems and doesn't let the flow reach the unevaluatedItems keyword.

Try it here: https://json-schema.hyperjump.io/
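The same reasoning can be checked with a hand translation of the registerSchema example above into plain predicates (again a reasoning aid, not a real validator; the names are arbitrary). Since items claims every index after prefixItems, an element such as "ya" is judged by items and fails there, and with items present no index is ever left over for unevaluatedItems.

```javascript
// prefixItems: [{ const: "aaa" }]
const prefixItems = [(x) => x === "aaa"];
// items: { type: "string", anyOf: [{ pattern: "^a" }, { pattern: "^b" }] }
const items = (x) => typeof x === "string" && (/^a/.test(x) || /^b/.test(x));
// unevaluatedItems: { type: "string", pattern: "^y" }
const unevaluatedItems = (x) => typeof x === "string" && /^y/.test(x);

function validate(instance, { useUnevaluated = true } = {}) {
  if (!Array.isArray(instance)) return false;
  const evaluated = new Set();
  for (let i = 0; i < instance.length; i++) {
    const check = i < prefixItems.length ? prefixItems[i] : items;
    if (!check(instance[i])) return false; // e.g. "ya" fails items: instance invalid
    evaluated.add(i);
  }
  if (new Set(instance).size !== instance.length) return false; // uniqueItems (primitives only)
  if (useUnevaluated) {
    // With items present, `evaluated` already holds every index,
    // so this loop can never reject anything.
    for (let i = 0; i < instance.length; i++) {
      if (!evaluated.has(i) && !unevaluatedItems(instance[i])) return false;
    }
  }
  return true;
}
```

Running validate(["aaa", "ya"]) returns false with or without the unevaluatedItems check, matching what the hyperjump playground reports.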

MeastroZI commented 8 months ago

And even if you remove the unevaluatedItems keyword, you will get the same error. Guess why!

Same thing: the items keyword takes care of every element not covered by prefixItems.

suprith-hub commented 8 months ago

And even if you remove the unevaluatedItems keyword, you will get the same error. Guess why!

Same thing: the items keyword takes care of every element not covered by prefixItems.

Yeah, this was my initial thought. But then, as per your assumption, the presence of the "items" keyword would never let any value be unevaluated.

suprith-hub commented 8 months ago

@Era-cell, unevaluatedItems only applies to array elements that have not been evaluated. Since you use items before unevaluatedItems, items makes every array element successfully evaluated, so no element is left unevaluated; that's why your instance doesn't validate. You have to apply the keyword logically: there is no point in adding unevaluatedItems if every element is already evaluated. 🙂 If I am wrong, please correct me.

But the order of keywords doesn't matter as per the docs, and the spec says: These instance items or properties may have been unsuccessfully evaluated against one or more adjacent keyword subschemas, such as when an assertion in a branch of an "anyOf" fails. Such failed evaluations are not considered to contribute to whether or not the item or property has been evaluated. Only successful evaluations are considered. In other words, only successful evaluations count as evaluated.

Is it possible to make this statement clearer? 😁

suprith-hub commented 8 months ago

{
    "$schema": "https://json-schema.org/draft/2020-12/schema",
    "$id": "http://example.com/lets_uneval",
    "type": "array",
    "prefixItems": [{
        "const": "aaa"
    }],
    "items": {
        "type": "string",
        "allOf": [
            { "pattern": "^a" },
            { "pattern": "^b" }
        ]
    },
    "uniqueItems": true,
    "unevaluatedItems": {
        "type": "string",
        "pattern": "^an"
    }
}

Now for ["aaa", "a", "bn", "an"], "an" should be left unevaluated because "a" took care of it. I expect the result to be true, but it's false. If even this counts as evaluated, can I get an example where "items" is present and some values are unevaluated?

MeastroZI commented 8 months ago

Just tell me one thing: is it possible for a string to start with "a" and simultaneously start with "b"? Because no such string exists, no element after the first can ever pass items, and that is why you are getting an error. Try this:

{
    "$schema": "https://json-schema.org/draft/2020-12/schema",
    "$id": "http://example.com/lets_uneval",
    "type": "array",
    "prefixItems": [{
        "const": "aaa"
    }],
    "items": {
        "type": "string",
        "allOf": [
            { "pattern": "^a" }, 
            { "pattern": "b$" }  
        ]
    },
    "uniqueItems": true,
    "unevaluatedItems": {
        "type": "string",
        "pattern": "^an"
    }
}

On this instance, ["aaa", "aab", "aaab"], it will give the result true. But if you add any string that does not start with "a" and end with "b", that element gets caught by the items keyword. As I said earlier, items checks all the elements not covered by prefixItems and does not let any element through to unevaluatedItems!

Please correct me if I am wrong 😺
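The reasoning above can be checked with a hand-written Python sketch of this particular schema (hypothetical helpers, not a general validator). Walking the indices shows that prefixItems claims index 0 and items claims every later index, so the list of leftovers handed to unevaluatedItems is always empty: a string like "an" fails at items and never gets that far.

```python
import re


def matches_items(value) -> bool:
    # "items": {"type": "string", "allOf": [{"pattern": "^a"}, {"pattern": "b$"}]}
    return (
        isinstance(value, str)
        and re.search("^a", value) is not None
        and re.search("b$", value) is not None
    )


def matches_unevaluated(value) -> bool:
    # "unevaluatedItems": {"type": "string", "pattern": "^an"}
    return isinstance(value, str) and re.search("^an", value) is not None


def validate(instance: list) -> tuple[bool, list[int]]:
    """Return (valid, indices left over for unevaluatedItems)."""
    # "uniqueItems": true (assumes hashable items such as strings)
    if len(instance) != len(set(instance)):
        return False, []
    evaluated = set()
    # "prefixItems": [{"const": "aaa"}] only constrains index 0
    if instance:
        if instance[0] != "aaa":
            return False, []
        evaluated.add(0)
    # "items" applies to every index after the prefix
    for i in range(1, len(instance)):
        if not matches_items(instance[i]):
            return False, []
        evaluated.add(i)
    leftovers = [i for i in range(len(instance)) if i not in evaluated]
    return all(matches_unevaluated(instance[i]) for i in leftovers), leftovers


print(validate(["aaa", "aab", "aaab"]))  # (True, [])
print(validate(["aaa", "aab", "an"]))    # (False, []): "an" is rejected by items
```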

suprith-hub commented 8 months ago

@jviotti, I have some more questions about Alterschema: Why are there rules from 2019 to 2019 and from 2020 to 2020? What is the need for these? Why did you choose JSON-e over JavaScript functions? Because it was more intuitive? Is there a need for an imperative DSL, or is a declarative DSL like OOP what you meant (which gives a higher level of abstraction)? Are you going to keep using Alterschema, or will it be abandoned?

jviotti commented 8 months ago

@MeastroZI

@jviotti, can you explain this?

{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "type": "array",
  "prefixItems": [{ "type": "string" }, { "type": "string" }],
  "not": {
    "items": {
      "not": { "type": "string", "minLength": 3 }
    }
  },
  "unevaluatedItems": false
}

My understanding is that it dictates that there must not be any items in the array that are strings with a length of less than 3. Therefore, the schema should only accept arrays where all elements have a minimum length of 3. However, it seems to also accept arrays like ["axd", "d"]. Could you clarify this?

That schema looks overly complicated. Maybe what you want is this instead?

{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "type": "array",
  "items": {
    "minLength": 3
  }
}
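The reason ["axd", "d"] passes is the double negation. Inside the outer not there is no prefixItems, so items applies to every element, and "not (every element fails P)" is, by De Morgan's law, "at least one element satisfies P", which is what contains expresses. Separately, annotations produced inside a not are discarded, so in the original schema unevaluatedItems: false still rejects anything after the two prefixItems entries. A small Python sketch of the equivalence (the predicate stands in for {"type": "string", "minLength": 3}; the helper names are made up), next to the all-elements reading of the suggested items schema:

```python
from typing import Any, Callable

Predicate = Callable[[Any], bool]


def long_string(value: Any) -> bool:
    # {"type": "string", "minLength": 3}; Python's len counts code points,
    # which is also how minLength measures strings
    return isinstance(value, str) and len(value) >= 3


def not_items_not(instance: list, p: Predicate) -> bool:
    # {"not": {"items": {"not": P}}}: it is NOT the case that every item fails P
    return not all(not p(item) for item in instance)


def contains(instance: list, p: Predicate) -> bool:
    # {"contains": P}: at least one item satisfies P
    return any(p(item) for item in instance)


def items_min_length(instance: list) -> bool:
    # {"items": {"minLength": 3}}: every string item is long enough;
    # minLength ignores non-strings
    return all(not isinstance(item, str) or len(item) >= 3 for item in instance)


for sample in (["axd", "d"], ["ab", "d"], [], ["axd", "xyz"]):
    assert not_items_not(sample, long_string) == contains(sample, long_string)
    print(sample, not_items_not(sample, long_string), items_min_length(sample))
```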

jviotti commented 8 months ago

@Era-cell

Also, the unevaluatedItems behaviour is a bit weird:

The unevaluatedItems behavior depends on other adjacent array-related keywords. As its name implies, unevaluatedItems will only kick in for array items that have not been evaluated by adjacent array keywords, so the presence of items and prefixItems will indeed affect its behavior.
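As a rough illustration of which indices are left for unevaluatedItems, here is a Python sketch with a hypothetical helper. It assumes the adjacent keywords all succeed and that no in-place applicators (allOf, anyOf, $ref, and so on) evaluate extra items; in 2020-12, contains also marks the indices it matched. With items present nothing is left over, while a schema without items leaves every index past prefixItems (and not matched by contains) to unevaluatedItems.

```python
from typing import Iterable


def unevaluated_indices(
    length: int,
    prefix_items: int,
    has_items: bool,
    contains_matches: Iterable[int] = (),
) -> list[int]:
    """Indices of an array of `length` items that unevaluatedItems will see.

    prefix_items: number of subschemas in prefixItems
    has_items: whether an adjacent items keyword is present
    contains_matches: indices that an adjacent contains matched
    """
    if has_items:
        return []  # items evaluates every index after the prefix
    matched = set(contains_matches)
    return [i for i in range(min(prefix_items, length), length) if i not in matched]


print(unevaluated_indices(4, 1, has_items=True))                         # []
print(unevaluated_indices(4, 1, has_items=False))                        # [1, 2, 3]
print(unevaluated_indices(4, 1, has_items=False, contains_matches=[2]))  # [1, 3]
```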

jviotti commented 8 months ago

@Era-cell

I have some more questions about Alterschema: Why are there rules from 2019 to 2019 and from 2020 to 2020? What is the need for these?

These perform simplifications within the same version to make it easier to process the other rules. For example, you could simplify the use of certain keywords in the input schema without changing the version, before you attempt to upgrade it.

Why did you choose JSON-e over JavaScript functions? Because it was more intuitive?

The whole point of this project is to make rule definitions programming-language agnostic. We don't want to just create an upgrade tool for JavaScript, but one that is embeddable and implementable in ANY language out there. That's why the rules are pure JSON.

Is there a need for an imperative DSL, or is a declarative DSL like OOP what you meant (which gives a higher level of abstraction)?

Not sure I follow this. Can you give me an example?

Are you going to keep using Alterschema, or will it be abandoned?

I will. The idea is for the JSON-based rules to be moved to the JSON Schema org while Alterschema is (one of many, potentially?) an implementation of the actual engine.

suprith-hub commented 8 months ago

@jviotti My question on this is: in the presence of the items keyword, wouldn't items evaluate each and every instance value, so that none of them is left unevaluated? (Can you give an example where, even in the presence of the "items" keyword, some values are left unevaluated?)

jviotti commented 8 months ago

In the presence of the items keyword, wouldn't items evaluate each and every instance value, so that none of them is left unevaluated?

Correct. Maybe this example helps clarify that: https://github.com/json-schema-org/JSON-Schema-Test-Suite/blob/main/tests/draft2020-12/unevaluatedItems.json#L64-L78

suprith-hub commented 8 months ago

Is there a need for an imperative DSL, or is a declarative DSL like OOP what you meant (which gives a higher level of abstraction)?

Not sure I follow this. Can you give me an example?

Like, do we need parsers, lexers, and a new grammar defining the language, OR an abstraction over JSON-e or JavaScript (or any other language, creating functions with arguments) itself?

jviotti commented 8 months ago

@Era-cell

Like, do we need parsers, lexers, and a new grammar defining the language, OR an abstraction over JSON-e or JavaScript (or any other language, creating functions with arguments) itself?

It should be all JSON-based. No need for a new grammar; just use JSON's grammar. But don't embed an actual programming language like JavaScript in the JSON. JSON-e is one valid way of doing it: it expresses the transformations purely using JSON.
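To illustrate the "rules are just JSON" idea, here is a Python sketch of a tiny interpreter for a made-up rule format. The when/then/rename format is hypothetical (it is neither Alterschema's nor JSON-e's), but because the rule itself is plain JSON, any language with a JSON parser could load and apply it. The example rule performs the 2019-09 to 2020-12 change where an array-form items becomes prefixItems and additionalItems becomes items. A real rule would also update $schema and adjust references, which this sketch does not do.

```python
import json

# Hypothetical declarative rule: pure JSON data, no embedded code.
RULE = json.loads("""
{
  "name": "items-array-to-prefixItems",
  "when": {"keyword": "items", "type": "array"},
  "then": [
    {"rename": {"from": "items", "to": "prefixItems"}},
    {"rename": {"from": "additionalItems", "to": "items"}}
  ]
}
""")

JSON_TYPES = {"array": list, "object": dict, "string": str}


def matches(condition: dict, schema: dict) -> bool:
    value = schema.get(condition["keyword"])
    return isinstance(value, JSON_TYPES[condition["type"]])


def apply_rule(rule: dict, node):
    """Return a transformed copy of `node`, applying `rule` bottom-up.

    Simplification: this walks every object, including non-schema values
    such as the contents of const or enum.
    """
    if isinstance(node, list):
        return [apply_rule(rule, item) for item in node]
    if not isinstance(node, dict):
        return node
    result = {key: apply_rule(rule, value) for key, value in node.items()}
    if matches(rule["when"], result):
        for step in rule["then"]:
            source, target = step["rename"]["from"], step["rename"]["to"]
            if source in result:
                result[target] = result.pop(source)
    return result


schema = {"items": [{"type": "string"}], "additionalItems": False}
print(apply_rule(RULE, schema))
# {'prefixItems': [{'type': 'string'}], 'items': False}
```

Keeping the interpreter this small is the point of the design: the logic lives in the JSON rule, and each language only needs a generic engine for the handful of operations the format defines.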

suprith-hub commented 8 months ago

Hi @jviotti, when the algorithm/DSL is included in the JSON Schema org, will access to external JSON Schema documents be provided? For example, take a reference like

"$ref":"other.json#/$defs/items/0"

whose schema resource isn't present in the document being altered. In that case, does the external schema document (an external resource) also need to be altered?

jviotti commented 8 months ago

Hi @Era-cell

whose schema resource isn't present in the document being altered. In that case, does the external schema document (an external resource) also need to be altered?

Great question! Yes, in both cases:

A JSON Schema is allowed to externally reference another JSON Schema that uses a different draft. For example, you can have a JSON Schema 2020-12 that externally references a JSON Schema Draft 4. So in that case, it is not strictly required to upgrade the other schema, and we can simply ignore it if we don't have access to it.
That said, while this cross-version referencing is supposed to work, I think many implementations out there don't properly support it, and the JSON Schema test suite doesn't cover it either. For these cases, what you can do is perform JSON Schema Bundling (https://json-schema.org/blog/posts/bundling-json-schema-compound-documents) before upgrading that schema. Bundling brings all externally referenced schemas into a single schema with nested schema resources, and then we upgrade them all together.
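For a rough picture of what bundling does before an upgrade, here is a Python sketch with hypothetical helpers. It assumes the external schemas are already available in an in-memory store keyed by absolute URI (fetching them is out of scope), that the root schema has an $id, and that every resource uses 2020-12 (older drafts spell the identifier keyword differently). Each referenced resource is embedded under $defs with its $id, so existing $ref values keep resolving unchanged; references the store cannot satisfy are simply left alone.

```python
import copy
from urllib.parse import urldefrag, urljoin


def find_refs(node, base: str):
    """Yield the absolute, fragment-less URI of every $ref under `node`.

    Simplification: nested $id values inside `node` are not used as new bases.
    """
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "$ref" and isinstance(value, str):
                yield urldefrag(urljoin(base, value)).url
            else:
                yield from find_refs(value, base)
    elif isinstance(node, list):
        for item in node:
            yield from find_refs(item, base)


def bundle(root: dict, store: dict) -> dict:
    """Embed every schema from `store` (absolute URI -> schema) that `root`
    references, directly or transitively, into a copy of root["$defs"].
    The $ref values are not rewritten: each embedded schema carries its $id,
    so they keep resolving to the same resource."""
    result = copy.deepcopy(root)
    root_uri = result["$id"]
    pending = list(find_refs(result, root_uri))
    while pending:
        uri = pending.pop()
        if uri == root_uri or uri in result.get("$defs", {}):
            continue  # local reference, or already embedded
        if uri not in store:
            continue  # not available: leave the reference unresolved
        embedded = copy.deepcopy(store[uri])
        embedded.setdefault("$id", uri)
        result.setdefault("$defs", {})[uri] = embedded
        pending.extend(find_refs(embedded, uri))
    return result


root = {
    "$schema": "https://json-schema.org/draft/2020-12/schema",
    "$id": "https://example.com/schemas/root",
    "$ref": "other.json#/$defs/name",
}
store = {
    "https://example.com/schemas/other.json": {
        "$schema": "https://json-schema.org/draft/2020-12/schema",
        "$defs": {"name": {"type": "string"}},
    }
}
print(sorted(bundle(root, store)["$defs"]))  # ['https://example.com/schemas/other.json']
```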

But in both cases, our upgrader shouldn't really mind. If it's passed a schema with unresolved remote references, it will do what it can, and if it's passed a bundled schema, it will transform the entire thing.
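One concrete piece of "doing what it can": when an upgrade moves a subschema, for example turning an array-form items into prefixItems, local JSON Pointer references into it must follow the move, while references into a remote document that is not available can only be left as they are. A Python sketch with a hypothetical helper, assuming a single schema resource (so every "#/..." pointer is relative to the document root) and ignoring JSON Pointer escaping and percent-encoding:

```python
def rewrite_local_refs(node, moves: dict):
    """Return a copy of `node` with local "#/..." refs updated for moved paths.

    `moves` maps old JSON Pointer prefixes to new ones, e.g.
    {"/items": "/prefixItems"}. Refs to other documents are left untouched,
    since we cannot know how (or whether) those documents were upgraded.
    """
    if isinstance(node, list):
        return [rewrite_local_refs(item, moves) for item in node]
    if not isinstance(node, dict):
        return node
    result = {}
    for key, value in node.items():
        if key == "$ref" and isinstance(value, str) and value.startswith("#/"):
            pointer = value[1:]
            for old, new in moves.items():
                # Match whole pointer segments: "/items/0" yes, "/itemsX" no
                if pointer == old or pointer.startswith(old + "/"):
                    pointer = new + pointer[len(old):]
                    break
            result[key] = "#" + pointer
        else:
            result[key] = rewrite_local_refs(value, moves)
    return result


schema = {
    "prefixItems": [{"type": "string"}],  # already moved from "items"
    "properties": {
        "first": {"$ref": "#/items/0"},
        "remote": {"$ref": "other.json#/items/0"},
    },
}
print(rewrite_local_refs(schema, {"/items": "/prefixItems"})["properties"])
# {'first': {'$ref': '#/prefixItems/0'}, 'remote': {'$ref': 'other.json#/items/0'}}
```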

MeastroZI commented 8 months ago

Hi @jviotti! I have one more question about bundling schemas. Can I assume that the name (key) of a schema in $defs will always be the $id of that schema, or can it be anything? For example, in this schema, the names of the embedded schemas under $defs are set to their $id:

{
  "$id": "https://jsonschema.dev/schemas/examples/non-negative-integer-bundle",
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "description": "Must be a non-negative integer",
  "$comment": "A JSON Schema Compound Document. Aka a bundled schema.",
  "$defs": {
    "https://jsonschema.dev/schemas/mixins/integer": {
      "$schema": "https://json-schema.org/draft/2020-12/schema",
      "$id": "https://jsonschema.dev/schemas/mixins/integer",
      "description": "Must be an integer",
      "type": "integer"
    },
    "https://jsonschema.dev/schemas/mixins/non-negative": {
      "$schema": "https://json-schema.org/draft/2020-12/schema",
      "$id": "https://jsonschema.dev/schemas/mixins/non-negative",
      "description": "Not allowed to be negative",
      "minimum": 0
    },
    "nonNegativeInteger": {
      "allOf": [
        {
          "$ref": "/schemas/mixins/integer"
        },
        {
          "$ref": "/schemas/mixins/non-negative"
        }
      ]
    }
  },
  "$ref": "#/$defs/nonNegativeInteger"
}
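As a point of reference for this question: in 2020-12, a $ref is resolved against the base URI established by $id, and the $defs key only determines where the subschema lives in the document. A small Python sketch (hypothetical helpers that ignore $anchor, JSON Pointer fragments, and the difference between schema and non-schema locations) indexes embedded resources by their $id and resolves "/schemas/mixins/integer" against the root's base URI; with arbitrary $defs keys it finds the same resource.

```python
from urllib.parse import urldefrag, urljoin


def index_resources(node, base: str = "", index=None) -> dict:
    """Map each schema resource's absolute URI (taken from $id) to its schema.

    Simplification: walks every object and ignores $anchor and older drafts' id.
    """
    if index is None:
        index = {}
    if isinstance(node, dict):
        if isinstance(node.get("$id"), str):
            base = urldefrag(urljoin(base, node["$id"])).url
            index[base] = node
        for value in node.values():
            index_resources(value, base, index)
    elif isinstance(node, list):
        for item in node:
            index_resources(item, base, index)
    return index


def resolve(root: dict, ref: str, base: str) -> dict:
    """Resolve a $ref without a fragment, relative to `base`."""
    return index_resources(root)[urldefrag(urljoin(base, ref)).url]


bundle = {
    "$id": "https://jsonschema.dev/schemas/examples/non-negative-integer-bundle",
    "$defs": {
        # Arbitrary keys: resolution goes through the embedded $id values.
        "first": {"$id": "https://jsonschema.dev/schemas/mixins/integer", "type": "integer"},
        "second": {"$id": "https://jsonschema.dev/schemas/mixins/non-negative", "minimum": 0},
    },
}
print(resolve(bundle, "/schemas/mixins/integer", bundle["$id"])["type"])  # integer
```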

json-schema-org / community

GSoC: Define upgrade/downgrade language agnostic declarative transformation rules for all JSON Schema dialects #599