jekyll / jekyll-seo-tag

A Jekyll plugin to add metadata tags for search engines and social networks to better index and display your site's content.
https://jekyll.github.io/jekyll-seo-tag
MIT License
1.66k stars 294 forks source link

This way to a passing Rich Results logo test #429

Open benjithaimmortal opened 3 years ago

benjithaimmortal commented 3 years ago

A forward

@ashmaroli Thank you for your patience as I got enough Ruby under my belt to even get this far. I started using pry to debug the code in-stream, and it allowed me to even test the output as it was rendered in rspec. This is now passing the test on Google, and passing rspec and rubocop. I don't think my code is great, but it was an attempt. My first! Let's change any/all of it in the hopes that the overall contribution helps this cool open source project. --Benji


TL;DR

  1. We need a "url" field inside of "@type" => "Organization"
  2. We need "@type" => "Organization" to always be separate from "@type" => "WebPage"
  3. We need to get it out of "publisher"

We're trying to get a set logo to pass a Rich Results test, and there are a few problems.

Google may have changed this, but at the moment it isn't reading things inside of the "publisher" key. As I mentioned in my open issue we need to find a way to modify the hash to remove it from "publisher" (which isn't a type property on schema.org anyway).

My proposal: let's take a look at Yoast, because they do this a lot

Yoast likes to make tons of different @type attributes, and house them as an array of objects inside of @graph. That's what I'm finding on schema.org, too. So now we want things to look like this:

{
  "@context": "https://schema.org",
  "@graph": [
    { "@type": "Organization", },
    { "@type": "WebSite", },
    { "@type": "ImageObject", },
    { "@type": "WebPage", },
  ]
}

I'd rather manipulate the hash than metaprogram

def to_json
  # this was what happened before
  graph = to_h.reject { |_k, v| v.nil? }

  # assign publisher and remove it from the array
  # (because I don't know the meta-programming involved in instantiating it)
  publisher = graph["publisher"] || nil
  graph.delete("publisher")

  updated_graph = {
    "@context" => "https://schema.org",
    "@graph"   => [graph],
  }
  if publisher
    # .push({"something" => blah}) === << {"something" => blah}
    updated_graph["@graph"] << publisher
  end

  # .to_json for the win
  updated_graph.to_json
end

Results of the change

Original Output:

{
  "@type": "WebPage",
  "publisher": {
    "@type": "Organization",
    "logo": { "@type": "ImageObject", "url": "http://example.invalid/logo.png" }
  },
  "url": "http://example.invalid/page.html",
  "@context": "https://schema.org"
}

New Output (confirmed on Rich Results Test, when you change the canonical url to a valid one):

{
  "@context": "https://schema.org",
  "@graph": [
    {
      "@type": "WebPage",
      "url": "http://example.invalid/page.html",
      "@context": "https://schema.org"
    },
    {
      "@type": "Organization",
      "url": "http://example.invalid/page.html",
      "logo": {
        "@type": "ImageObject",
        "url": "http://example.invalid/logo.png"
      }
    }
  ]
}
DarkWiiPlayer commented 3 years ago

If I read this correctly, this seems to be solving a different problem than #428, although there seems to be some overlap and both PRs modify the to_json method (though this should be trivial to merge).

@benjithaimmortal could you have a quick look at #428 and confirm whether these are two distinct problems?


Also, I've tried to do some quick googling on the @graph thing to see if it would make sense to update my own pull request to also allow modifying that array, but I couldn't find anything about it. Is that a JSON-LD feature, or something specific to Googles Structured Data?

benjithaimmortal commented 3 years ago

@DarkWiiPlayer you're right! We're approaching the same point in the code but doing very different things. I think my results should precede yours [in the code logic, that is], so you can modify things after the correct formatting to the JSON-LD has been applied.

@graph is the way mainstream tools like Yoast combine multiple Structured Data results into a single page (for example, making two @types for the same page). I haven't seen another method that works quite like it. See this blog heading 'Creating a node array using @graph' for a bit more on how it's working.

DarkWiiPlayer commented 3 years ago

I think my results should precede yours [in the code logic, that is], so you can modify things after the correct formatting to the JSON-LD has been applied.

@benjithaimmortal Yes, absolutely. However, I'm wondering how that would work with the way my PR currently works where you build a new tree of json data that gets merged into the autogenerated one because now that includes an array, so one has to throw in a numeric index.

For example, if you have a @type: Webpage and a @type: Organization and want to change the url for the Webpage, you'd have to index @graph with [1], which seems quite arbitrary and brittle.

I could add a bunch more logic to filter subtrees by their @type so you can have one override for Webpage types and a different one for Organization types, but that'd be a lot of complexity for such a simple feature.

benjithaimmortal commented 3 years ago

@DarkWiiPlayer yeppers. I encountered the same when I was building mine. IMO the need is a simple hash from the yaml settings, and it's overcomplicated by the Ruby magic inside of a new Jekyll instance. Perhaps it could be extracted entirely? I'd love to see something that simply builds the JSON out of config.yml and plops it in the document header.

benjithaimmortal commented 3 years ago

If you're worried about the array itself, perhaps we could make a hash with @type as a key and just strip it from the final JSON? These feel like decisions that are above my head though.

benjithaimmortal commented 3 years ago

Excellent! I'll get to it and modify this ASAP. Thank you all for your comments and help.