iaincollins / structured-data-testing-tool

A library and command line tool to help inspect and test for Structured Data.
https://www.npmjs.com/package/structured-data-testing-tool
ISC License

Handle when an itemProp contains multiple properties #4

Closed iaincollins closed 4 years ago

iaincollins commented 4 years ago

Neither the API nor the CLI finds properties in an itemprop attribute when it contains multiple properties.

In these examples from a NYT article, the datePublished and publisher properties are not found because they are combined with other properties in the same itemprop attribute:

<meta data-rh="true" property="article:published" itemprop="datePublished dateCreated" content="2019-07-21T09:00:06.000Z"/>
<span itemProp="publisher copyrightHolder provider sourceOrganization" itemscope="" itemType="http://schema.org/NewsMediaOrganization" itemID="https://www.nytimes.com">

See also this example from a Guardian article, where the image property is also not detected:

<figure itemprop="associatedMedia image" itemscope itemtype="http://schema.org/ImageObject" data-component="image" class="element element-image img--landscape  fig--narrow-caption fig--has-shares " data-media-id="f82028d62b1edd7417d7d3773c4abf0d4fa86174" id="img-3">
  <meta itemprop="url" content="https://i.guim.co.uk/img/media/f82028d62b1edd7417d7d3773c4abf0d4fa86174/0_272_6435_3861/master/6435.jpg?width=700&amp;quality=85&amp;auto=format&amp;fit=max&amp;s=016df6a3f33eabe3cbca39eb389a60fb">
</figure>

This is an edge case that passes in the Google Structured Data Testing Tool but fails in this tool. The bug is only known to occur when parsing HTML microdata, but it could potentially also be triggered by RDFa or JSON-LD markup.
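
For illustration, here is a minimal sketch of one way to handle this, assuming the fix is to split the itemprop value on whitespace and register the same value under each resulting name. The microdata specification treats itemprop as a set of space-separated tokens, which is why the examples above are valid markup. The helper names `splitItemprop` and `expandProperties` are hypothetical and are not part of this library's API or its actual fix.

```javascript
// Hypothetical helper (not part of structured-data-testing-tool):
// split an itemprop attribute value into individual property names.
function splitItemprop(value) {
  if (typeof value !== 'string') return []
  // Split on ASCII whitespace and drop empty tokens from
  // leading, trailing, or repeated whitespace.
  const names = value.split(/[ \t\n\f\r]+/).filter(Boolean)
  // Remove duplicates while preserving first-seen order.
  return [...new Set(names)]
}

// Hypothetical helper: record the same value under every property name
// listed in a single itemprop attribute.
function expandProperties(itemprop, value) {
  const result = {}
  for (const name of splitItemprop(itemprop)) {
    result[name] = value
  }
  return result
}

// Usage with the attribute values from the examples above.
const dates = expandProperties('datePublished dateCreated', '2019-07-21T09:00:06.000Z')
const publisherNames = splitItemprop('publisher copyrightHolder provider sourceOrganization')
```

With this approach, a test for datePublished would find the value even though the markup declares it together with dateCreated, and a test for publisher would match the first of the four names on the span.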