google / schemarama

Schemarama is a project exploring standards-based validation for structured data, especially Schema.org.
Apache License 2.0
124 stars · 22 forks

Figure out a pattern for cutting down noisy validation errors #39

Status: Open · danbri opened this issue 2 years ago

danbri commented 2 years ago

I forget the terminology, but the AND/NOT/OR complexity in the shape below serves to suppress errors that are downstream of some more fundamental error. @ericprud et al. have plans for doing this within ShEx in a more standardized way, so that the actual intended shape content doesn't get lost in all the boolean trickery.
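A simplified sketch of that guard idiom, written in Python rather than ShEx so the error counts are easy to compare. Each `NOT { a [:T] } OR { ... }` group in the shape below reads as an implication: if the node is typed T, then check T's properties. When the type check fails, the guarded property checks are vacuously satisfied, so only the root-cause error surfaces. The node model, function names and error strings are hypothetical, and only the `:author` checks are modelled.

```python
from typing import Dict, List

# Hypothetical node model: property name -> list of values,
# with declared types stored under "@type".
Node = Dict[str, List[str]]

AUTHOR_TYPES = {"Person", "Organization"}


def check_author_naive(author: Node) -> List[str]:
    """Every constraint is ANDed, so one root cause yields several errors."""
    errors = []
    if not AUTHOR_TYPES & set(author.get("@type", [])):
        errors.append("author: expected type Person or Organization")
    if not (author.get("name") or author.get("url")):
        errors.append("author: missing name or url")
    return errors


def check_author_guarded(author: Node) -> List[str]:
    """Mirrors `{ a [...] } AND (NOT { a [...] } OR { ... })`.

    The property checks sit behind the type guard, i.e. they are evaluated
    as the implication (type_ok -> props_ok) == (not type_ok or props_ok),
    so a failed type check is reported on its own.
    """
    if not AUTHOR_TYPES & set(author.get("@type", [])):
        return ["author: expected type Person or Organization"]
    if not (author.get("name") or author.get("url")):
        return ["author: missing name or url"]
    return []
```

For an author typed `Thing`, the naive version reports two errors while the guarded one reports only the type mismatch, which is the actual root cause.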

```
PREFIX : <http://schema.org/>
PREFIX validate: <https://google.com/search/validation/valid>

<S1> {
  :url . + %validate:url{console.log('some url checking code here')%}
} AND {
  :datePublished . ? %validate:date-time{console.log('some datetime checking code here')%}
} AND {
  :claimReviewed .
} AND {
  :itemReviewed {
    a [:CreativeWork]
  } AND (
    NOT {
      a [:CreativeWork]
    } OR {
      :author (
        {
          a [:Organization]
        } OR {
          a [:Person]
        }
      ) AND (
        NOT (
          {
            a [:Organization]
          } OR {
            a [:Person]
          }
        ) OR {
          :name . ?
        }
      )?
    } AND {
      :datePublished . ? %validate:date-time{console.log('some datetime checking code here')%}
    }
  )?
} AND {
  :author (
    {
      a [:Organization]
    } OR {
      a [:Person]
    }
  ) AND (
    NOT (
      {
        a [:Organization]
      } OR {
        a [:Person]
      }
    ) OR (
      {
        :name .
      } OR {
        :url .
      }
    ) AND {
      :url . * %validate:url{console.log('some url checking code here')%}
    }
  )?
} AND {
  :reviewRating {
    a [:Rating]
  } AND (
    NOT {
      a [:Rating]
    } OR {
      :alternateName .
    } AND (
      (
        NOT {
          :name .
        } OR {
          :alternateName . ?
        }
      ) AND (
        NOT (
          NOT {
            :name .
          }
        ) OR {
          :alternateName . +
        }
      )
    ) AND NOT (
      {
        :alternateName .
      } AND {
        :name .
      }
    ) AND (
      NOT (
        (
          {
            :ratingValue .
          } OR {
            :bestRating .
          } OR {
            :worstRating .
          }
        ) AND NOT (
          {
            :ratingValue /-1/
          } AND {
            :bestRating /-1/
          } AND {
            :worstRating /-1/
          }
        )
      ) OR {
        :ratingValue /([0-9]+[\.,]?[0-9]*)\/([0-9]+[\.,]?[0-9]*)/ OR /([0-9]+[\.,]?[0-9]*)%/ OR /([0-9]+[\.,]?[0-9]*)/ +
      } AND (
        NOT {
          :ratingValue /([0-9]+[\.,]?[0-9]*)/ +
        } OR {

        } %validate:rating%
      )
    )
  )+
}
```
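The `:ratingValue` constraint near the end of the shape accepts three lexical forms. Below is a small Python sketch of what each alternative is meant to match, translated directly from the ShEx regexes (the ShExC `\/` escape becomes a plain `/`). One hedged caveat: ShEx pattern facets are, as far as I understand the spec, evaluated like XPath `fn:matches`, which is unanchored, so the third alternative on its own would likely accept any string containing a digit. The sketch uses `fullmatch` to keep the three forms distinct; `classify_rating` and the form labels are hypothetical names.

```python
import re
from typing import Optional

# The three :ratingValue alternatives, translated to Python regex syntax.
NUM = r"([0-9]+[.,]?[0-9]*)"
RATING_FORMS = [
    ("fraction", re.compile(NUM + "/" + NUM)),  # e.g. "4/5"
    ("percent", re.compile(NUM + "%")),         # e.g. "80%"
    ("number", re.compile(NUM)),                # e.g. "4.5" or "4,5"
]


def classify_rating(value: str) -> Optional[str]:
    """Return the first alternative the whole value matches, or None.

    fullmatch anchors each pattern at both ends; an unanchored search
    (fn:matches-style) would accept e.g. "4.5 stars" via the last pattern.
    """
    for name, pattern in RATING_FORMS:
        if pattern.fullmatch(value):
            return name
    return None
```

For example, `"4/5"` classifies as a fraction and `"80%"` as a percent, while `"4.5 stars"` is rejected by `fullmatch` even though `re.search` with the last pattern finds a match inside it.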
danbri commented 2 years ago

See https://github.com/shexSpec/shex/blob/master/status.md under "discriminators".

As described by @ericprud back in 2020:

> The idea is that if you're validating something as a CreativeWork and it has a type of Recipe but doesn't actually satisfy Recipe, ShEx won't drown you in errors about every kind of CreativeWork that it fails. Instead, it will just tell you why it doesn't satisfy Recipe (and yes, danbri, I know it's not "failing", but you get the idea). If y'all think the above description is missing some use case or nuance, let us know.

Draft spec and examples: https://hackmd.io/1fpnYHxoSYOQhvYxHXddjA
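The reporting behaviour described above can be sketched in Python as dispatching on the node's declared type: instead of trying every CreativeWork subtype shape and reporting every failure, validate only against the shape(s) the node claims via `@type`. This models the intended output only; the shape contents and helper names (`missing`, `SHAPES`, the two `validate_*` functions) are hypothetical, and the draft spec's discriminator syntax and semantics may well differ.

```python
from typing import Callable, Dict, List

Node = Dict[str, List[str]]
Checker = Callable[[Node], List[str]]


def missing(*props: str) -> Checker:
    """Build a checker that reports each absent property (hypothetical helper)."""
    def check(node: Node) -> List[str]:
        return [f"missing {p}" for p in props if not node.get(p)]
    return check


# Hypothetical, much-simplified shapes for a few CreativeWork subtypes.
SHAPES: Dict[str, Checker] = {
    "Recipe": missing("name", "recipeIngredient"),
    "Dataset": missing("name", "description"),
    "Article": missing("headline"),
}


def validate_without_discriminator(node: Node) -> Dict[str, List[str]]:
    """Try every subtype shape and report every failure (the noisy case)."""
    report: Dict[str, List[str]] = {}
    for type_name, check in SHAPES.items():
        errors = check(node)
        if errors:
            report[type_name] = errors
    return report


def validate_with_discriminator(node: Node) -> Dict[str, List[str]]:
    """Use the declared @type to pick the shape(s) and report only those."""
    report: Dict[str, List[str]] = {}
    for type_name in node.get("@type", []):
        check = SHAPES.get(type_name)
        if check is None:
            continue  # no shape known for this type
        errors = check(node)
        if errors:
            report[type_name] = errors
    return report
```

For a node typed `Recipe` that only has a `name`, the first function reports failures against Recipe, Dataset and Article, while the second reports only the missing `recipeIngredient`.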

danbri commented 1 year ago

Service-specific ShEx shape examples for Recipe and Dataset: https://github.com/google/schemarama/tree/main/demo/validation/shex/specific/ServiceB

danbri commented 1 year ago

We want the part of the ShEx shape that checks for the 'name' property to be able to link to something like https://developers.google.com/search/docs/advanced/structured-data/recipe#the_bit_of_the_docs_that_talks_about_name_property.

Realistically, this may initially need to be done via out-of-band information rather than assuming that all the ShEx from Google carries such details. Hence ShapePath / IDs being an issue: out-of-band links need a stable way to refer to individual parts of a shape.
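One way the out-of-band approach could look, sketched in Python: a side table, maintained separately from the ShEx, maps an identifier for a shape component to a documentation URL, and the validator's error records are annotated from it. The identifier format `RecipeShape/name` is a made-up placeholder (not ShapePath syntax), and `ValidationError`, `DOC_LINKS` and `attach_docs` are hypothetical names; the URL is the illustrative one from the comment above.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Out-of-band side table, kept separate from the ShEx itself:
# hypothetical shape-component ID -> documentation URL.
DOC_LINKS: Dict[str, str] = {
    "RecipeShape/name": "https://developers.google.com/search/docs/advanced/structured-data/recipe#the_bit_of_the_docs_that_talks_about_name_property",
}


@dataclass
class ValidationError:
    shape_path: str               # hypothetical ID of the failing constraint
    message: str
    doc_url: Optional[str] = None


def attach_docs(errors: List[ValidationError],
                links: Dict[str, str]) -> List[ValidationError]:
    """Return copies of the errors with doc_url filled in where a link is known."""
    return [ValidationError(e.shape_path, e.message, links.get(e.shape_path))
            for e in errors]
```

Errors whose component has no entry simply keep `doc_url=None`, so the ShEx itself never has to carry documentation links; the catch is that the IDs must stay stable as the shapes evolve.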