velsa / notehost

Free Hosting for Notion Sites!
MIT License
MetaRewriter is no longer correctly replacing title and description #36

Open aaccioly opened 4 months ago

aaccioly commented 4 months ago

Meta tags are currently receiving default values from Notion, e.g.:

<meta name="twitter:site" content="@NotionHQ">
<meta name="twitter:title" content="Notion – The all-in-one workspace for your notes, tasks, wikis, and databases.">
<meta name="twitter:description"
  content="A new tool that blends your everyday work apps into one. It's the all-in-one workspace for you and your team">
<!--- ...  -->
<meta property="og:title" content="Notion – The all-in-one workspace for your notes, tasks, wikis, and databases.">
<meta property="og:description"
  content="A new tool that blends your everyday work apps into one. It's the all-in-one workspace for you and your team">

Which leads to SEO issues:

image

The problem is likely happening at: https://github.com/velsa/notehost/blob/d227b8c2fbef34644cc337af9f327a40f41da289/src/rewriters/meta-rewriter.ts#L23-L24

Which gets used at:

https://github.com/velsa/notehost/blob/d227b8c2fbef34644cc337af9f327a40f41da289/src/rewriters/meta-rewriter.ts#L29-L35

And:

https://github.com/velsa/notehost/blob/d227b8c2fbef34644cc337af9f327a40f41da289/src/rewriters/meta-rewriter.ts#L41-L50

A quick and dirty workaround is to use siteName and siteDescription instead of content for all pages. A more flexible solution would be to parameterise slugs with optional custom titles and descriptions, falling back to siteName and siteDescription:

  siteName: 'My Notion Website',
  siteDescription: 'Build your own website with Notion. This is a demo site.',
  // Should replace twitter:site
  siteTwitter: '@MyHandle',

  slugToPage: {
    '': 'NOTION_PAGE_ID',
    about: 'NOTION_PAGE_ID',
    // Both pages above will fall back to siteName + siteDescription
    contact: {
      id: 'NOTION_PAGE_ID',
      title: 'Contact form',
      description: 'Bla bla bla',
    },
  },
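
The fallback proposed above could be sketched roughly as follows. This is a minimal illustration, not notehost's actual implementation: the `PageMetadata`, `SlugEntry`, `SiteConfig`, `ResolvedMeta`, and `resolveMeta` names are hypothetical and only mirror the shape of the config in this comment.

```typescript
// Hypothetical types mirroring the proposed config; not notehost's real API.
interface PageMetadata {
  id: string
  title?: string
  description?: string
}

// A slug maps either to a bare Notion page ID or to an object with overrides.
type SlugEntry = string | PageMetadata

interface SiteConfig {
  siteName: string
  siteDescription: string
  siteTwitter?: string
  slugToPage: Record<string, SlugEntry>
}

interface ResolvedMeta {
  pageId: string
  title: string
  description: string
}

// Resolve the title and description for a slug, falling back to the
// site-wide values when the slug has no custom metadata.
function resolveMeta(config: SiteConfig, slug: string): ResolvedMeta | undefined {
  if (!Object.prototype.hasOwnProperty.call(config.slugToPage, slug)) {
    return undefined
  }
  const entry = config.slugToPage[slug]
  if (entry === undefined) {
    return undefined
  }
  const page: PageMetadata = typeof entry === 'string' ? { id: entry } : entry
  return {
    pageId: page.id,
    title: page.title ?? config.siteName,
    description: page.description ?? config.siteDescription,
  }
}

const exampleConfig: SiteConfig = {
  siteName: 'My Notion Website',
  siteDescription: 'Build your own website with Notion. This is a demo site.',
  siteTwitter: '@MyHandle',
  slugToPage: {
    '': 'NOTION_PAGE_ID',
    about: 'NOTION_PAGE_ID',
    contact: {
      id: 'NOTION_PAGE_ID',
      title: 'Contact form',
      description: 'Bla bla bla',
    },
  },
}
```

With this sketch, `resolveMeta(exampleConfig, 'contact')` yields the custom title and description, while `resolveMeta(exampleConfig, 'about')` yields the site-wide values.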

Bonus, we should probably prepend https:// to the domain at:

https://github.com/velsa/notehost/blob/d227b8c2fbef34644cc337af9f327a40f41da289/src/rewriters/meta-rewriter.ts#L52-L54
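
The linked lines are not reproduced here, so the following is only a sketch of the idea, assuming the domain is available as a plain string such as `example.com`. The `withHttps` helper name is hypothetical.

```typescript
// Hypothetical helper: turn a bare domain into an absolute https URL.
function withHttps(domain: string): string {
  // Leave values that already carry an http or https scheme untouched.
  if (/^https?:\/\//i.test(domain)) {
    return domain
  }
  return `https://${domain}`
}
```

For example, `withHttps('example.com')` returns `'https://example.com'`, and a value that already starts with `https://` is returned unchanged.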

aaccioly commented 4 months ago

Since the 2024-06-25 update (https://www.notion.so/releases/2024-06-25), when a page is configured to be "Discoverable on the web", Notion now sets:

  1. Titles ending in "| Notion"
  2. Clean descriptions taken from the first text block on the page
  3. A "good" default image (e.g., page cover)
  4. og:site_name ending in "on Notion"
  5. No longer sets twitter:site to @NotionHQ

So, most of the problems mentioned above are now fixed on Notion's side. Nevertheless, customising meta tags is a premium feature, so I think we can still make good use of this PR. I'll update it to fall back to the old behaviour of extracting content from Notion when pageMetadata is not specified.

velsa commented 4 months ago

Thanks for the pull request!

Merged it with a small refactoring.

Now version 1.0.31 is the latest one containing your PR.

aaccioly commented 4 months ago

Hi @velsa,

Many thanks for merging and refactoring the PR, as well as for the amazing software.

I've updated my site to use the latest version and can confirm that custom metadata is still reported correctly when visiting the site with the Googlebot and Bingbot user agents, i.e.:

Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) [Chrome/W.X.Y.Z](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers#user_agent_version) Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)

and

Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm) Chrome/

Apparently, https://velsa.net/ is reporting a 404 when using one of the agents. Have you toggled the "Discoverable on the web" option off by any chance?

Finally, a minor detail: given Notion's update mentioned in https://github.com/velsa/notehost/issues/36#issuecomment-2195560337, I don't think that lines 98 and 100 below are necessary anymore: https://github.com/velsa/notehost/blob/811088548ab07daf71e78e507fcfc02c03fa0d9b/src/rewriters/meta-rewriter.ts#L96-L101

Line 99 is still required for the title, and if you ever decide to change the code to not replace og:site_name unconditionally, you may also want to replace ' on Notion'.
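
As a rough sketch of that clean-up (not the code at the linked lines, which is not reproduced here), the two Notion suffixes could be stripped like this. The `stripSuffix`, `cleanTitle`, and `cleanSiteName` names are hypothetical, and the exact suffix strings are assumptions based on the list in my earlier comment.

```typescript
// Hypothetical helper: remove a trailing suffix if present, then trim
// any whitespace left at the end.
function stripSuffix(value: string, suffix: string): string {
  if (suffix.length === 0 || !value.endsWith(suffix)) {
    return value
  }
  return value.slice(0, -suffix.length).trimEnd()
}

// Titles now end in "| Notion" (assumed exact suffix).
function cleanTitle(title: string): string {
  return stripSuffix(title, '| Notion')
}

// og:site_name now ends in " on Notion" (assumed exact suffix).
function cleanSiteName(siteName: string): string {
  return stripSuffix(siteName, ' on Notion')
}
```

A value without the suffix passes through unchanged, so applying these unconditionally should be harmless for pages that already have custom metadata.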