rust-lang / mdBook

Create book from markdown files. Like Gitbook but implemented in Rust
https://rust-lang.github.io/mdBook/
Mozilla Public License 2.0
17.84k stars 1.62k forks

Fix text from menu bar in Google Search #2397

Open vklachkov opened 4 months ago

vklachkov commented 4 months ago

I noticed that for some mdBooks, Google's search results show the list of themes and the title of the book in the snippet. It's a small thing, but an eyesore.

My guess is that Google treats #menu-bar as part of the page content and indexes it. To prevent this, I replaced the menu bar's tag with <header>.
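
For illustration, the change described above might look roughly like the following. This is a simplified sketch, not the actual diff from this PR: only the #menu-bar id comes from the discussion, and the original `<div>` element and the placeholder comments are assumptions.

```html
<!-- Before (assumed): the menu bar is a generic container. -->
<div id="menu-bar">
  <!-- theme picker, book title, buttons (placeholder content) -->
</div>

<!-- After: same id, but a semantic <header> element. -->
<header id="menu-bar">
  <!-- theme picker, book title, buttons (placeholder content) -->
</header>
```

Keeping the id unchanged means existing CSS and JavaScript selectors such as `#menu-bar` would continue to match.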

ehuss commented 4 months ago

Thanks! Seems like it would be great to fix this. Do you happen to know if there is a way to test how Google generates the snippet? Or do you have any links to information about whether or not this will make a difference? All I could find is https://developers.google.com/search/docs/appearance/snippet?hl=en, which only really mentions adding a description.

vklachkov commented 4 months ago

@ehuss I also looked for information, but didn't find anything. Experienced web developers suggested that semantic markup is very important and that Google somehow takes it into account when indexing.

jsha commented 4 months ago

Thanks for pointing this out @vklachkov! I could be wrong, but I don't think a <header> tag is necessarily excluded from snippets. There are two paths that I think will work:

  1. Put the data-nosnippet attribute on the element that contains the navbar: https://developers.google.com/search/docs/crawling-indexing/robots-meta-tag#data-nosnippet-attr
  2. Wrap the navbar in a <nav> element, which is what rustdoc does. I can't find documentation that Google will definitely skip it for snippets, but semantically that makes sense. Also it may help with screen readers: https://developer.mozilla.org/en-US/docs/Web/HTML/Element/nav#usage_notes
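
The two options above could be sketched as follows. This is a hedged illustration rather than mdBook's actual template: only the #menu-bar id comes from this thread, and the `<div>` container and placeholder comments are assumptions. Google's documentation describes data-nosnippet as supported on `<span>`, `<div>`, and `<section>` elements, so the first variant keeps a `<div>`.

```html
<!-- Option 1: keep the container and opt its text out of snippets. -->
<div id="menu-bar" data-nosnippet>
  <!-- theme picker, book title, buttons (placeholder content) -->
</div>

<!-- Option 2: mark the bar as a navigation landmark. -->
<nav id="menu-bar">
  <!-- theme picker, book title, buttons (placeholder content) -->
</nav>
```

Only option 1 is documented by Google as affecting snippets; option 2 relies on the crawler honoring the semantics of `<nav>`.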

vklachkov commented 3 months ago

Sorry for the long wait, @jsha.

Although both <nav> and data-nosnippet would probably work, I don't think either is quite right semantically.

To quote MDN:

The <header> element can define a global site header <...>. It usually includes a logo, company name, search feature <...>. It is generally located at the top of the page.

I think the <header> tag would be more semantically correct. But if you insist, I can replace it with <nav>, as in rustdoc.

jsha commented 3 months ago

That's interesting @vklachkov! Also from MDN, in the historical note section:

The <header> element originally existed at the very beginning of HTML for headings. It is seen in the very first website. At some point, headings became <h1> through <h6>, allowing <header> to be free to fill a different role.

And indeed I was thinking of <header> in that old-school role.

I'm not a maintainer of mdBook, so I don't insist either way, and I don't have an opinion as to whether <nav> or (the modern semantics of) <header> is more semantically correct. There's no documentation of specific tags being skipped for Google snippets, other than the data-nosnippet attribute.

I suppose my recommendation to the mdBook team would be to go ahead and accept this as harmless and easy, and if it doesn't do the job @vklachkov could submit a followup PR with data-nosnippet or <nav>. :D