iaincollins / structured-data-testing-tool

A library and command line tool to help inspect and test for Structured Data.
https://www.npmjs.com/package/structured-data-testing-tool
ISC License

Not detecting schemas #43

Open TheUltimateCookie opened 2 years ago

TheUltimateCookie commented 2 years ago

I tried both the CLI and the programmatic API to check and validate the schemas on multiple WordPress sites, but the tests failed.

Sites

https://growth.cx has 4 schemas when checked manually (WebSite, ImageObject, WebPage, BreadcrumbList):

```sh
sdtt --url https://growth.cx --schemas "WebSite,ImageObject,WebPage,BreadcrumbList"
```

https://crawlq.ai has 5 schemas when checked manually (Organization, WebSite, ImageObject, WebPage, BreadcrumbList):

```sh
sdtt --url https://crawlq.ai --schemas "Organization,WebSite,ImageObject,WebPage,BreadcrumbList"
```

The schemas were generated by the Yoast SEO plugin.

Any advice?

TheUltimateCookie commented 2 years ago

Update

I found that the issue is caused by a difference in the structure of the schema markup generated by Yoast.

Structure A

```json
{
  "@context": "https://schema.org",
  "@graph": [
    {
      "@type": "WebSite",
      "@id": "http://growth.cx/#website",
      "url": "http://growth.cx/",
      "name": "Growth.CX Blog!",
      "description": "Just another WordPress site",
      "potentialAction": [
        {
          "@type": "SearchAction",
          "target": {
            "@type": "EntryPoint",
            "urlTemplate": "http://growth.cx/?s={search_term_string}"
          },
          "query-input": "required name=search_term_string"
        }
      ],
      "inLanguage": "en-US"
    },
    {
      ...
    },
    {
      ...
    }
  ]
}
```

Structure B

```json
[
  {
    "@context": "https://schema.org",
    "@type": "WebSite",
    "@id": "https://suitejar.com#website",
    "headline": "Suitejar",
    "name": "Suitejar",
    "description": "A simple yet powerful tool for Content Marketers & SEO Analysts to improve the content marketing efforts to bring in exponential organic growth.",
    "url": "https://suitejar.com",
    "potentialAction": {
      "@type": "SearchAction",
      "target": "https://suitejar.com/?s={search_term_string}",
      "query-input": "required name=search_term_string"
    }
  },
  {
    "@context": "https://schema.org",
    "@type": "Organization",
    "@id": "https://suitejar.com#Organization",
    "name": "Suitejar",
    "url": "https://suitejar.com",
    "sameAs": []
  }
]
```

With Structure A, the tool reports that no schemas were found. With Structure B, it finds all of them.
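To illustrate the difference: in Structure A the top-level object has no `@type` of its own and the entities sit inside `@graph`, while in Structure B every array element is itself an entity. The sketch below uses hypothetical helpers, `extractEntities` and `listTypes` (not part of structured-data-testing-tool), that normalise both shapes, plus a single top-level object, into a flat list and collect their types. The sample inputs are abridged versions of the two structures above.

```javascript
// Hypothetical helper, not part of structured-data-testing-tool.
// Normalises the JSON-LD shapes shown above into a flat array of entities.
function extractEntities(jsonld) {
  if (Array.isArray(jsonld)) {
    // Structure B: the top level is already an array of entities
    return jsonld.flatMap((item) => extractEntities(item));
  }
  if (jsonld && Array.isArray(jsonld["@graph"])) {
    // Structure A: the entities are nested under "@graph"
    return jsonld["@graph"];
  }
  // A single entity object (or nothing at all)
  return jsonld ? [jsonld] : [];
}

// Collects the @type of every entity; @type may be a string or an array.
function listTypes(jsonld) {
  return extractEntities(jsonld).flatMap((entity) => entity["@type"] ?? []);
}

// Abridged Structure A: only the WebSite entity from the example above
const structureA = {
  "@context": "https://schema.org",
  "@graph": [{ "@type": "WebSite", "@id": "http://growth.cx/#website" }],
};

// Abridged Structure B: the two suitejar.com entities
const structureB = [
  { "@context": "https://schema.org", "@type": "WebSite" },
  { "@context": "https://schema.org", "@type": "Organization" },
];

console.log(listTypes(structureA)); // [ 'WebSite' ]
console.log(listTypes(structureB)); // [ 'WebSite', 'Organization' ]
```

Looking only at the top-level `@type` finds nothing in Structure A, which matches the "no schemas found" result.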

There is a programmatic workaround using cheerio:

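```javascript
// Requires Node.js 18+ (for the global fetch) and the cheerio package
const cheerio = require("cheerio");
const { structuredDataTestString } = require("structured-data-testing-tool");

async function testSchemas(url) {
  const response = await fetch(url);
  const html = await response.text();
  const $ = cheerio.load(html);

  // A page may contain several JSON-LD blocks; calling .text() on all of them
  // at once concatenates them into invalid JSON, so parse each one separately
  const structuredDatas = [];
  $("script[type='application/ld+json']").each((_, el) => {
    const jsonld = JSON.parse($(el).text());
    if (jsonld["@graph"]) {
      // Structure A: the entities are nested under "@graph"
      structuredDatas.push(...jsonld["@graph"]);
    } else if (Array.isArray(jsonld)) {
      // Structure B: the top level is an array of entities
      structuredDatas.push(...jsonld);
    } else {
      // A single entity object
      structuredDatas.push(jsonld);
    }
  });

  // getting the schemas: the @type of each entity (@type may be an array)
  const schemas = structuredDatas.flatMap((item) => item["@type"] ?? []);

  // validation: the options object is the second argument of
  // structuredDataTestString, not of JSON.stringify (where it would be
  // treated as a replacer and silently ignored)
  const { passed, failed, warnings } = await structuredDataTestString(
    JSON.stringify(structuredDatas),
    { schemas }
  );
  return { passed, failed, warnings };
}
```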
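One caveat with this workaround: in JSON-LD, the entities inside `@graph` share the `@context` declared on the wrapper object, so once they are pulled out they no longer carry it. If the tool relies on `@context` to recognise schema.org types (an assumption, not verified against its source), a hypothetical helper such as `graphToArray` below (not part of the library) could rewrite Structure A into the Structure B shape, copying the context onto each entity, before the result is passed to `JSON.stringify`.

```javascript
// Hypothetical helper, not part of structured-data-testing-tool.
// Rewrites a Structure A document ({ "@context", "@graph": [...] })
// into the Structure B shape: an array of entities, each with its own "@context".
function graphToArray(jsonld) {
  const context = jsonld["@context"];
  return jsonld["@graph"].map((entity) =>
    // Keep an entity's own @context if it already declares one
    context !== undefined && entity["@context"] === undefined
      ? { "@context": context, ...entity }
      : entity
  );
}

// Abridged Structure A from the example above
const structureA = {
  "@context": "https://schema.org",
  "@graph": [{ "@type": "WebSite", "@id": "http://growth.cx/#website" }],
};

console.log(JSON.stringify(graphToArray(structureA)));
// [{"@context":"https://schema.org","@type":"WebSite","@id":"http://growth.cx/#website"}]
```

In the Structure A branch of the workaround, `structuredDatas.push(...graphToArray(jsonld))` would then take the place of pushing the raw `@graph` items.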

This works around the issue programmatically, but it does not help when using the CLI.