jkphl / micrometa

A meta parser for extracting micro information out of web documents, currently supporting Microformats 1+2, HTML Microdata, RDFa Lite 1.1, JSON-LD and Link Types, written in PHP
http://micrometa.jkphl.is
MIT License

JSONLD parsing failed; array was returned where itemtype was expected #43

Closed rvanlaak closed 4 years ago

rvanlaak commented 5 years ago

Parsing fails on JSONLD::parseNodeType:

Call to a member function getId() on array

Happens on parsing the following URL:

https://www.thelocal.es/20190227/why-is-veganism-on-the-rise-among-young-people-in-spain
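The message above means that getId() was called on a PHP array rather than on a single object. A minimal sketch of that failure mode, assuming PHP 7 or later; FakeType, firstTypeId() and describeFailure() are hypothetical stand-ins for illustration and are not part of micrometa:

```php
<?php
// Hypothetical stand-in for a JSON-LD type node; not a micrometa class.
final class FakeType
{
    public function getId(): string
    {
        return 'http://schema.org/Person';
    }
}

// Mirrors the failing pattern: the caller assumes a single object.
function firstTypeId($itemType): string
{
    return $itemType->getId();
}

// Returns "<class>: <message>" for whatever the call throws,
// or the type id itself when the call succeeds.
function describeFailure($itemType): string
{
    try {
        return firstTypeId($itemType);
    } catch (\Throwable $e) {
        return get_class($e) . ': ' . $e->getMessage();
    }
}

// Passing an array where one object is expected triggers the failure.
echo describeFailure([new FakeType(), new FakeType()]), PHP_EOL;
```

On PHP 7 and later the failure surfaces as an \Error, which a catch (\Exception $e) block does not see; catching \Throwable covers both.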

rvanlaak commented 5 years ago

We tested a fix that always returns the first element whenever an array is detected, but that seems to have side effects.

@jkphl any idea what would be going on here?

Sarke commented 5 years ago

I made this modification for now, since I don't really care about multiple types on this.

protected function parseNodeType(NodeInterface $node)
{
    /** @var Node $itemType */
    $itemType = $node->getType();
    // Workaround: when several types are present, keep only the first one
    if (is_array($itemType)) {
        $itemType = reset($itemType);
    }

    return $itemType ? [$this->vocabularyCache->expandIRI($itemType->getId())] : [];
}

rvanlaak commented 5 years ago

We tried something similar, but suspected it introduced a regression. Could you create a PR to see whether all tests pass?

When using that snippet, we get a server error (an exception that cannot be caught) when parsing the following URL: https://www.adamenfroy.com/how-to-make-money-on-youtube

As you can see, apparently the Yoast SEO plugin allows putting two types on the graph objects.

<script type='application/ld+json' class='yoast-schema-graph yoast-schema-graph--main'>{
  "@context": "https://schema.org",
  "@graph": [
    {
      "@type": [
        "Person",
        "Organization"
      ],
      "@id": "https://www.adamenfroy.com/#/schema/person/a11af0a57e4b9a6236175e856a9df265",
      "name": "Adam Enfroy",
      "image": {
        "@type": "ImageObject",
        "@id": "https://www.adamenfroy.com/#personlogo",
        "url": "https://www.adamenfroy.com/wp-content/uploads/Adam_Enfroy_Sidebar.jpg",
        "width": 360,
        "height": 360,
        "caption": "Adam Enfroy"
      },
      "logo": {
        "@id": "https://www.adamenfroy.com/#personlogo"
      },
      "description": "Full-time blogger and affiliate marketing expert. Join me and 150,000 monthly readers here, on <b><a href=\"http://www.adamenfroy.com\">AdamEnfroy.com</a></b> to learn how to <a href=\"http://www.adamenfroy.com/how-to-start-a-blog\">scale your blog like a startup</a> and <a href=\"http://www.adamenfroy.com/how-to-make-money-online\">make money online</a> faster.\r\n\r\n<b><a href=\"http://www.adamenfroy.com/resources\">Get Started with My Recommended Resources</a>.</b>",
      "sameAs": [
        "https://www.facebook.com/adamenfroydotcom",
        "https://www.instagram.com/adamenfroy",
        "https://www.linkedin.com/in/adamenfroy",
        "https://twitter.com/adamenfroy",
        "https://www.youtube.com/adamenfroy"
      ]
    },
    {
      "@type": "WebSite",
      "@id": "https://www.adamenfroy.com/#website",
      "url": "https://www.adamenfroy.com/",
      "name": "Adam Enfroy",
      "publisher": {
        "@id": "https://www.adamenfroy.com/#/schema/person/a11af0a57e4b9a6236175e856a9df265"
      },
      "potentialAction": {
        "@type": "SearchAction",
        "target": "https://www.adamenfroy.com/?s={search_term_string}",
        "query-input": "required name=search_term_string"
      }
    },
    {
      "@type": "ImageObject",
      "@id": "https://www.adamenfroy.com/how-to-make-money-on-youtube#primaryimage",
      "url": "https://www.adamenfroy.com/wp-content/uploads/How-to-Make-Money-on-YouTube.jpg",
      "width": 800,
      "height": 500,
      "caption": "How to Make Money on YouTube"
    },
    {
      "@type": "WebPage",
      "@id": "https://www.adamenfroy.com/how-to-make-money-on-youtube#webpage",
      "url": "https://www.adamenfroy.com/how-to-make-money-on-youtube",
      "inLanguage": "en-US",
      "name": "7 Best Ways How to Make Money on Youtube in 2020",
      "isPartOf": {
        "@id": "https://www.adamenfroy.com/#website"
      },
      "primaryImageOfPage": {
        "@id": "https://www.adamenfroy.com/how-to-make-money-on-youtube#primaryimage"
      },
      "datePublished": "2019-11-17T00:41:20+00:00",
      "dateModified": "2019-11-16T18:41:20-06:00",
      "description": "Here are the top 7 ways to make money on YouTube with ads, affiliate marketing, sponsored products, sending users to your blog and more.",
      "breadcrumb": {
        "@id": "https://www.adamenfroy.com/how-to-make-money-on-youtube#breadcrumb"
      }
    },
    {
      "@type": "BreadcrumbList",
      "@id": "https://www.adamenfroy.com/how-to-make-money-on-youtube#breadcrumb",
      "itemListElement": [
        {
          "@type": "ListItem",
          "position": 1,
          "item": {
            "@type": "WebPage",
            "@id": "https://www.adamenfroy.com/",
            "url": "https://www.adamenfroy.com/",
            "name": "Home"
          }
        },
        {
          "@type": "ListItem",
          "position": 2,
          "item": {
            "@type": "WebPage",
            "@id": "https://www.adamenfroy.com/blog",
            "url": "https://www.adamenfroy.com/blog",
            "name": "Blog"
          }
        },
        {
          "@type": "ListItem",
          "position": 3,
          "item": {
            "@type": "WebPage",
            "@id": "https://www.adamenfroy.com/make-money-online",
            "url": "https://www.adamenfroy.com/make-money-online",
            "name": "Make Money Online"
          }
        },
        {
          "@type": "ListItem",
          "position": 4,
          "item": {
            "@type": "WebPage",
            "@id": "https://www.adamenfroy.com/how-to-make-money-on-youtube",
            "url": "https://www.adamenfroy.com/how-to-make-money-on-youtube",
            "name": "7 Best Ways How to Make Money on Youtube in 2020"
          }
        }
      ]
    },
    {
      "@type": "Article",
      "@id": "https://www.adamenfroy.com/how-to-make-money-on-youtube#article",
      "isPartOf": {
        "@id": "https://www.adamenfroy.com/how-to-make-money-on-youtube#webpage"
      },
      "author": {
        "@id": "https://www.adamenfroy.com/#/schema/person/a11af0a57e4b9a6236175e856a9df265"
      },
      "headline": "7 Best Ways How to Make Money on Youtube in 2020",
      "datePublished": "2019-11-17T00:41:20+00:00",
      "dateModified": "2019-11-16T18:41:20-06:00",
      "commentCount": "17",
      "mainEntityOfPage": {
        "@id": "https://www.adamenfroy.com/how-to-make-money-on-youtube#webpage"
      },
      "publisher": {
        "@id": "https://www.adamenfroy.com/#/schema/person/a11af0a57e4b9a6236175e856a9df265"
      },
      "image": {
        "@id": "https://www.adamenfroy.com/how-to-make-money-on-youtube#primaryimage"
      },
      "keywords": "video marketing,YouTube",
      "articleSection": "Make Money Online"
    }
  ]
}</script>

rvanlaak commented 5 years ago

@Sarke thank you for your snippet!

It made me think about what could go wrong, and after some more debugging we found out that parseNodeType does not properly handle all possible return values of $node->getType(), which can also be null or an array of NodeInterface instances.

We had over 5,000 events in Sentry already. PR #46 fixes that.
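To illustrate the three return shapes described above (null, a single node, or an array of nodes), here is a rough sketch of normalizing them into a list of type IDs. This is not the code from PR #46; TypeNode, SimpleType and collectTypeIds() are hypothetical names, and the expandIRI() step from the original method is left out:

```php
<?php
// Hypothetical stand-ins; the real parser works with NodeInterface objects.
interface TypeNode
{
    public function getId(): string;
}

final class SimpleType implements TypeNode
{
    private $id;

    public function __construct(string $id)
    {
        $this->id = $id;
    }

    public function getId(): string
    {
        return $this->id;
    }
}

/**
 * Normalize null, a single type node, or an array of type nodes
 * into a flat list of type IDs.
 */
function collectTypeIds($itemType): array
{
    if ($itemType === null) {
        return [];
    }

    // Wrap a single node so both cases share one code path.
    $types = is_array($itemType) ? $itemType : [$itemType];

    $ids = [];
    foreach ($types as $type) {
        // Skip anything that is not a type node instead of failing on it.
        if ($type instanceof TypeNode) {
            $ids[] = $type->getId();
        }
    }

    return $ids;
}
```

Unlike the reset() workaround, this keeps every type instead of only the first one, which matters for nodes that declare more than one @type.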

Sarke commented 5 years ago

Thanks @rvanlaak!

As for the Yoast SEO issue, I used the snippet in combination with the JsonLDFilteredParser you posted in the other issue: https://github.com/jkphl/micrometa/issues/16#issuecomment-550226719

rvanlaak commented 5 years ago

To clarify: this is not a problem caused by the Yoast plugin, as it is perfectly valid for a node to have more than one @type:

"@type": [
    "Person",
    "Organization"
 ],
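
Since JSON-LD allows @type to be either a single string or an array of strings, a consumer of the raw JSON has to accept both forms. A minimal sketch using plain json_decode(), independent of micrometa; typesOf() is a hypothetical helper and the JSON is a shortened version of the graph quoted earlier:

```php
<?php
// Return the @type of a decoded JSON-LD node as a list of strings,
// whether it was written as a single string or as an array.
function typesOf(array $node): array
{
    $type = $node['@type'] ?? [];

    return is_array($type) ? array_values($type) : [$type];
}

$json = '{"@graph":[{"@type":["Person","Organization"],"name":"Adam Enfroy"},{"@type":"WebSite"}]}';
$doc = json_decode($json, true);

foreach ($doc['@graph'] as $node) {
    // Prints "Person, Organization" and then "WebSite".
    echo implode(', ', typesOf($node)), PHP_EOL;
}
```

A node without any @type yields an empty list here, so callers do not need a separate null check.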