w3c / pub-manifest

W3C Publication Manifest
https://w3c.github.io/pub-manifest
Other
7 stars 16 forks source link

schema: validate @context for language and dir #222

Closed marisademeglio closed 4 years ago

marisademeglio commented 4 years ago

Right now, no error is given for an invalid language or direction in the context property. E.g. this does not raise any errors:

"@context": [
        "https://schema.org",
        "https://www.w3.org/ns/pub-context",
        {
         "language": "mmm@@mmm",
         "direction": "xyz"
        }
    ],
mattgarrish commented 4 years ago

I don't believe the schemas are capable of validating this. The object with the language and direction doesn't have to be the third item in the array, so I don't think you can add a pattern that says there must be object(s) somewhere else in the array with one or the other of the properties, as you could conceivably run into things like this:

"@context": [
        "https://schema.org",
        "https://www.w3.org/ns/pub-context",
        "https://example.com/custom-context",
        {
         "language": "mmm@@mmm"
        },
        {
         "direction": "xyz"
        }
    ],
mattgarrish commented 4 years ago

Or maybe I have to correct myself. It looks like allOf can be added to the context to achieve this:

"allOf": [
    {
        "contains": {
            "type": "object",
            "properties": {
                "language": {
                    "$ref": "bcp.schema.json"
                }
            },
            "required": ["language"]
        }
    },
    {
        "contains": {
            "type": "object",
            "properties": {
                "direction": {
                    "type": "string",
                    "enum": ["ltr", "rtl"]
                }
            },
            "required": ["direction"]
        }
    }
],

I'll try to test it out more rigorously tomorrow, but seemed to be working with what I posted above.

iherman commented 4 years ago

I am no schema expert, very far of it, but my impression reading the spec is that your example does validate, but, e.g.,

"@context": [
        "https://schema.org",
        "https://www.w3.org/ns/pub-context",
        "https://example.com/custom-context",
        {
         "direction": "xyz"
        }
    ],

would not. allOf means that the instance validate against all elements in its array, and contains means that the instance must have at least one appearance of the schema therein...

mattgarrish commented 4 years ago

Ah, bugger. Forgot only the two context strings are required.

Right, and if we make it anyOf then if the object doesn't validate it won't register as an error because it's simply seen as not conforming to those possible definitions.

So, ya, probably back to this not being possible to express.

mattgarrish commented 4 years ago

Although, see if you can pop a hole in this one:

"@context": {
    "type": "array",
    "items": [
        {
            "const": "https://schema.org"
        },
        {
            "const": "https://www.w3.org/ns/pub-context"
        }
    ],
    "additionalItems": {
        "anyOf": [
            {
                "type": "string"
            },
            {
                "type": "array"
            },
            {
                "type": "object",
                "properties": {
                    "language": {
                        "$ref": "bcp.schema.json"
                    }
                },
                "required": ["language"]
            },
            {
                "type": "object",
                "properties": {
                    "direction": {
                        "type": "string",
                        "enum": ["ltr", "rtl"]
                    }
                },
                "required": ["direction"]
            },
            {
                "type": "object",
                "properties": {
                    "language": false,
                    "direction": false
                }
            }
        ]
    },
    "uniqueItems": true
},
marisademeglio commented 4 years ago

I tried it against some samples and I couldn't break it :)

mattgarrish commented 4 years ago

Cool! I think I'm getting my head around how to create more complex assertions in this language now.

I'm going to see if I can atomize the publication.schema.json file a little more tomorrow, though, so we don't have to copy these patterns around each time they change. Ideally, each property will ref its constraints unless they're unique to the format, like the conformsTo string.

iherman commented 4 years ago

Caveats: my knowledge of json schema is not really good...

"@context": {
  "type": "array",
  "items": [
      {
          "const": "https://schema.org"
      },
      {
          "const": "https://www.w3.org/ns/pub-context"
      }
  ],
  "additionalItems": {
      "anyOf": [
          {
              "type": "string"
          },
          {
              "type": "array"
          },

Why do we need this one? The context may include a reference to other context files (ie, strings), plus language and direction settings. What does this entail?

          {
              "type": "object",
              "properties": {
                  "language": {
                      "$ref": "bcp.schema.json"
                  }
              },
              "required": ["language"]
          },
          {
              "type": "object",
              "properties": {
                  "direction": {
                      "type": "string",
                      "enum": ["ltr", "rtl"]
                  }
              },
              "required": ["direction"]
          },
          {
              "type": "object",
              "properties": {
                  "language": false,
                  "direction": false
              }
          }

What is the role of this? Do we allow additional JSON-LD things? Reading the text in the spec we are only talking about language and directions as plus items...

(I am not against leaving this open, but I am not sure we are in line with the spec.)

      ]
  },
  "uniqueItems": true
},
mattgarrish commented 4 years ago

Why do we need this one? The context may include a reference to other context files (ie, strings), plus language and direction settings. What does this entail?

I wasn't sure about arrays, either, but the specification doesn't restrict what can go in the context and arrays can contain arrays. If we remove, are we causing any potential real-world issues?

"properties": {
  "language": false,
  "direction": false
}

What is the role of this?

The false declarations forbid those properties from being used in any other object in a non-conforming way. It's a json schema shorthand for combining "not" with "required" to disallow them.

So you can have an object with language that conforms to our definition, an object (potentially the same one) with direction that conforms to our definition, or else you get an error if they have a non-conforming value.

mattgarrish commented 4 years ago

but the specification doesn't restrict what can go in the context and arrays can contain arrays

Does JSON-LD 1.1 disallow arrays? I found this:

As the context above shows, the value of a term definition can either be a simple string, mapping the term to an IRI, or a JSON object.

But it's not normative.

iherman commented 4 years ago

but the specification doesn't restrict what can go in the context and arrays can contain arrays

Does JSON-LD 1.1 disallow arrays? I found this:

As the context above shows, the value of a term definition can either be a simple string, mapping the term to an IRI, or a JSON object.

But it's not normative.

What counts is https://www.w3.org/TR/json-ld11/#context-definitions (that is normative) and it defines a context as a map. Ie, you cannot just put there an array in the array...

mattgarrish commented 4 years ago

What counts is https://www.w3.org/TR/json-ld11/#context-definitions (that is normative) and it defines a context as a map. Ie, you cannot just put there an array in the array...

Ah, okay, I'll drop it then... :)

iherman commented 4 years ago

Why do we need this one? The context may include a reference to other context files (ie, strings), plus language and direction settings. What does this entail?

I wasn't sure about arrays, either, but the specification doesn't restrict what can go in the context and arrays can contain arrays. If we remove, are we causing any potential real-world issues?

"properties": {
    "language": false,
    "direction": false
}

What is the role of this?

The false declarations forbid those properties from being used in any other object in a non-conforming way. It's a json schema shorthand for combining "not" with "required" to disallow them.

So you can have an object with language that conforms to our definition, an object (potentially the same one) with direction that conforms to our definition, or else you get an error if they have a non-conforming value.

so what you mean is that the following is disallowed:

"@context" : [
        "https://schema.org",
        "https://www.w3.org/ns/pub-context",
       {
             "@language" : "fr"
       },
      {
            "term" : "https://example.org/term",
           "language" : "fr"
      }
]

This is certainly valid JSON-LD. But I did not see in https://www.w3.org/TR/pub-manifest/#manifest-context where we say that there can only be one language setting...

mattgarrish commented 4 years ago

where we say that there can only be one language setting...

You can have multiple language declarations so long as they all conform to our definition. anyOf applies to each additional item in the array. So long as the object's language/direction meets our conformance requirement, it will pass. If not, you're not allowed to have them in an object.

For example, in the following, only the last instance will result in an error since you can't have an object with an invalid language code:

"@context" : [
        "https://schema.org",
        "https://www.w3.org/ns/pub-context",
       {
             "language" : "fr"
       },
      {
            "term" : "https://example.org/term",
           "language" : "fr"
      },
      {
           "language": 1
      }
]
iherman commented 4 years ago

Ah! you said:

The false declarations forbid those properties from being used in any other object in a non-conforming way. It's a json schema shorthand for combining "not" with "required" to disallow them.

I did not understand the schema. We are o.k. then.

mattgarrish commented 4 years ago

I just pushed one more update, as the schema allowed for one property to be invalid if the other one was valid in a single object, since only one rule has to match. So this would be valid even though the language isn't:

{
    "direction": "rtl",
    "language": 1
}

To fix this, I've tweaked the rule so that either there is a valid language and no direction, a valid direction and no language, both a valid language and direction, or the object can't include either.

That should cover everything now...