structured-data / linter

Structured Data linter
The Unlicense

No structured data detected #45

jaygray0919 closed this issue 6 years ago

jaygray0919 commented 6 years ago

We have a file that is harvested correctly by the Google Structured Data Testing Tool (GSDTT), but the Structured Data Linter (SDL) reports no structured data for it. The source file is here: https://afdsi.org/rdf_50/en-US/ and here is the GSDTT link: https://search.google.com/structured-data/testing-tool/u/0/#url=https%3A%2F%2Fafdsi.org%2Frdf_50%2Fen-US%2F

Should we reorganize the structured data so that SDL can process it? We'd like to embed a link on our pages that passes the page to SDL, so folks can better visualize the structure, as SDL makes possible.
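Embedding such a link mostly comes down to URL-encoding the page address, the same way the GSDTT link above encodes it. A minimal Ruby sketch, assuming the linter is reachable at http://linter.structured-data.org/ and accepts the page address in a url query parameter (both assumptions to confirm against the running service); linter_link is a hypothetical helper name:

```ruby
require "uri"

# Assumed base address of the linter; confirm before relying on it.
LINTER_BASE = "http://linter.structured-data.org/"

# Build a link that hands a page address to the linter, assuming the
# linter reads the page from a `url` query parameter.
# URI.encode_www_form percent-encodes ":" and "/" in the value.
def linter_link(page_url, base: LINTER_BASE)
  "#{base}?#{URI.encode_www_form(url: page_url)}"
end

puts linter_link("https://afdsi.org/rdf_50/en-US/")
```

The encoded value matches the url= fragment of the GSDTT link, so the same page address can be reused for both tools.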

/jay gray

gkellogg commented 6 years ago

This is an odd case: it takes a while to parse, and the back end is likely timing out. FWIW, I get the following lint messages when I run it locally:

    INFO Linted in 2.061921 seconds.
    Messages:
    statement  og:description
      Triple </tmp/jay_gray.html> <http://ogp.me/ns#description> "Okay, so maybe this is a bit more than fifty words, but the key points are pretty simple (and put into bold text for you manager–types who just want to get straight to the point). The Resource Description Framework, or “RDF”, is really two things."@en-us .
     is invalid
    statement  schema:url
      Triple <https://ontomatica.io/identifier/1217030000101210> <http://schema.org/url> <00https://ontomatica.io/media/1217030000101210TechnicalArticle_w738_h300.jpg> .
     is invalid
    property  schema:dateModified
      Object ""^^<http://schema.org/Date> not compatible with rangeIncludes (schema:Date,schema:DateTime)
    property  schema:isAccessibleForFree
      Object "yes" not compatible with rangeIncludes (schema:Boolean)

I suspect Google won't properly integrate statements made in numerous different script elements, so you might consider merging them into a single script element.
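Merging the script elements could look roughly like the Ruby sketch below. It assumes each script element holds a JSON-LD object and that they all share the same @context; merge_jsonld is a hypothetical helper, not part of the linter or any gem. Nodes are collected into one @graph array under a single @context, and inputs with different contexts are rejected rather than silently mixed.

```ruby
require "json"

# Hypothetical helper: combine several JSON-LD objects (for example the
# bodies of separate <script type="application/ld+json"> elements) into
# one document with a single @context and an @graph array.
def merge_jsonld(documents)
  contexts = documents.map { |doc| doc["@context"] }.uniq
  raise ArgumentError, "inputs use different @context values" if contexts.size > 1

  graph = documents.flat_map do |doc|
    body = doc.reject { |key, _| key == "@context" }
    # A document that is only an @graph contributes its nodes directly;
    # anything else becomes one node in the merged graph.
    body.keys == ["@graph"] ? body["@graph"] : [body]
  end

  { "@context" => contexts.first, "@graph" => graph }
end

# Illustrative inputs with made-up identifiers.
article = { "@context" => "http://schema.org", "@type" => "TechnicalArticle",
            "@id" => "https://example.org/article" }
org = { "@context" => "http://schema.org", "@type" => "Organization",
        "@id" => "https://example.org/org" }
puts JSON.pretty_generate(merge_jsonld([article, org]))
```

Keeping one shared @context is the simplest case; pages whose scripts use different contexts would need those contexts reconciled first.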

One optimization in the RDFa reader would be to cache the readers it creates for script elements. That would reduce the time spent instantiating a new reader for each element, and would also allow the contexts those readers load to be cached.

jaygray0919 commented 6 years ago

Actually, Google (GSDTT) integrates normalized scripts perfectly. We spend a lot of time normalizing these scripts for maximum reuse and then testing them on GSDTT.

WRT ogp.me: that's from <head>; we're not concerned about that error.

WRT <00https://ontomatica.io/media/1217030000101210: not sure how '<00https' got injected; will fix.

WRT 'Object ""^^<http://schema.org/Date>': we see this regularly on SDL and cannot figure out how to compose the information to pacify SDL. Since GSDTT accepts it, we ignore that SDL message.

WRT 'Object "yes"': will change to yes (no double quotes).
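Two of these fixes can be checked mechanically before publishing. The Ruby sketch below assumes the structured data is JSON-LD. It shows that an unquoted yes is not valid JSON at all (schema:Boolean values are normally written as the JSON literals true or false), and it flags url values, such as the 00https one in the lint output, that are not absolute http(s) IRIs. Both helper names are hypothetical.

```ruby
require "json"
require "uri"

# True if the text is syntactically valid JSON. A bare `yes` is not a
# JSON literal, so a value written that way fails to parse.
def parses_as_json?(text)
  JSON.parse(text)
  true
rescue JSON::ParserError
  false
end

# Rough check that a value is an absolute http or https IRI.
# URI::HTTPS is a subclass of URI::HTTP, so both schemes pass; a value
# like "00https://..." either fails to parse or is not an HTTP URI.
def absolute_http_url?(value)
  URI.parse(value).is_a?(URI::HTTP)
rescue URI::InvalidURIError
  false
end

puts parses_as_json?('{"isAccessibleForFree": yes}')
puts parses_as_json?('{"isAccessibleForFree": true}')
puts absolute_http_url?("00https://ontomatica.io/media/1217030000101210TechnicalArticle_w738_h300.jpg")
puts absolute_http_url?("https://ontomatica.io/media/1217030000101210TechnicalArticle_w738_h300.jpg")
```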

SDL is a fantastic tool and communicates the logical design of the linked data better than GSDTT. Should we install and run it on our own server (perhaps allocating more resources than you currently allocate) to visualize these large Linked Data sets?

/jay

gkellogg commented 6 years ago

""^^<http://schema.org/Date>

This is because the empty string is not a valid date. The linter (actually, rdf-reasoner) checks values asserted to be schema:Date against an ISO 8601 regular expression:

    ISO_8601 =  %r(^
      # Year
      ([\+-]?\d{4}(?!\d{2}\b))
      # Month
      ((-?)((0[1-9]|1[0-2])
            (\3([12]\d|0[1-9]|3[01]))?
          | W([0-4]\d|5[0-2])(-?[1-7])?
          | (00[1-9]|0[1-9]\d|[12]\d{2}|3([0-5]\d|6[1-6])))
          ([T\s]((([01]\d|2[0-3])((:?)[0-5]\d)?|24\:?00)
                 ([\.,]\d+(?!:))?)?
                (\17[0-5]\d([\.,]\d+)?)?
                ([zZ]|([\+-])([01]\d|2[0-3]):?([0-5]\d)?)?
          )?
      )?
    $)x.freeze

It also accepts appropriate XSD datatypes, but rejects any value that is empty.
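To see why the empty literal fails, the pattern quoted above can be exercised directly. This Ruby sketch copies the regular expression verbatim; plausible_schema_date? is a hypothetical helper, not part of rdf-reasoner, and it only checks the lexical form (it does not cover the XSD datatype handling mentioned above).

```ruby
# Verbatim copy of the ISO 8601 pattern quoted above.
ISO_8601 =  %r(^
  # Year
  ([\+-]?\d{4}(?!\d{2}\b))
  # Month
  ((-?)((0[1-9]|1[0-2])
        (\3([12]\d|0[1-9]|3[01]))?
      | W([0-4]\d|5[0-2])(-?[1-7])?
      | (00[1-9]|0[1-9]\d|[12]\d{2}|3([0-5]\d|6[1-6])))
      ([T\s]((([01]\d|2[0-3])((:?)[0-5]\d)?|24\:?00)
             ([\.,]\d+(?!:))?)?
            (\17[0-5]\d([\.,]\d+)?)?
            ([zZ]|([\+-])([01]\d|2[0-3]):?([0-5]\d)?)?
      )?
  )?
$)x.freeze

# Hypothetical helper: the year group requires four digits, so an empty
# string (or a word such as "yes") can never match.
def plausible_schema_date?(value)
  ISO_8601.match?(value)
end

["", "2018-06-01", "2018-06-01T12:30:00Z"].each do |value|
  puts "#{value.inspect}: #{plausible_schema_date?(value)}"
end
```

In other words, a dateModified value needs an actual date such as 2018-06-01; leaving the property out is preferable to emitting an empty string.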

The reason it was timing out is that the code wasn't caching the schema.org context between successive calls; that's since been fixed on the linter, and it should process your code faster now.
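The fix amounts to loading each remote context once and reusing it. A minimal Ruby sketch of that idea, keyed by context URL; ContextCache and its loader block are hypothetical stand-ins, not the linter's or the json-ld gem's actual API, and the cache is not thread-safe.

```ruby
# Memoizing cache for remote contexts, keyed by URL. The loader block is
# a stand-in for whatever actually fetches and parses a context.
class ContextCache
  attr_reader :loads

  def initialize(&loader)
    @loader = loader
    @cache = {}
    @loads = 0 # how many times the loader actually ran
  end

  # Return the cached context, invoking the loader only on a miss.
  def fetch(url)
    @cache.fetch(url) do
      @loads += 1
      @cache[url] = @loader.call(url)
    end
  end
end

cache = ContextCache.new { |url| { "@vocab" => url } }
3.times { cache.fetch("http://schema.org/") }
puts cache.loads
```

Three fetches of the same context trigger a single load, which is where the time savings on pages with many script elements would come from.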

Feel free to install your own instance, but if you want to include it in your Continuous Integration, you might use the rdf lint command-line executable, which is available when you install the linkeddata Ruby gem. It doesn't give you the nice output formatting, but it does the same integrity checking.
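A CI step could wrap that executable in a few lines of Ruby. This is a sketch under two assumptions to verify locally: that the invocation is rdf lint FILE, and that it exits non-zero when it finds problems. lint_files is a hypothetical helper, and the command is injectable so the wrapper can be tested without the gem installed.

```ruby
# Run a command-line linter once per file and report whether every run
# succeeded. Kernel#system returns true on exit status 0, false on a
# non-zero status, and nil if the command could not be run; the last two
# both count as failures here.
def lint_files(files, command: %w[rdf lint])
  failures = files.reject { |file| system(*command, file) }
  failures.each { |file| warn "lint failed: #{file}" }
  failures.empty?
end
```

A CI job might call lint_files(Dir["public/**/*.html"]) and fail the build when it returns false; the glob is only an example path.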

jaygray0919 commented 6 years ago

Looks very good. OK re @Date: no nulls! We actually like redirecting to SDL because it removes any suspicion that a Wizard of Oz is doing something behind the scenes on our server. SDL and GSDTT are effective complements. Thanks for your help here, Gregg. /jay