jekyll / jekyll-seo-tag

A Jekyll plugin to add metadata tags for search engines and social networks to better index and display your site's content.
https://jekyll.github.io/jekyll-seo-tag
MIT License

Account for double slash (//) from baseURL #200

Closed 0xdevalias closed 7 years ago

0xdevalias commented 7 years ago

To get my site working correctly on GitHub Pages with a CNAME set, I need to set my baseUrl to /; otherwise it tries to use the full path (which then breaks because of my CNAME).

In doing so, all of my website's SEO links now contain //.

<!-- Begin Jekyll SEO tag v2.2.1 -->
<title>/dev/alias – Hack. Dev. Transcend. | Follow me into the rabbit hole that is my mind and learn about topics including.. security, technology, efficiency, biohacking, health, personal growth and probably a whole lot more.</title>
<meta property="og:title" content="/dev/alias – Hack. Dev. Transcend." />
<meta property="og:locale" content="en_GB" />
<meta name="description" content="Follow me into the rabbit hole that is my mind and learn about topics including.. security, technology, efficiency, biohacking, health, personal growth and probably a whole lot more." />
<meta property="og:description" content="Follow me into the rabbit hole that is my mind and learn about topics including.. security, technology, efficiency, biohacking, health, personal growth and probably a whole lot more." />
<link rel="canonical" href="http://devalias.net//" />
<meta property="og:url" content="http://devalias.net//" />
<meta property="og:site_name" content="/dev/alias – Hack. Dev. Transcend." />
<link rel="next" href="http://devalias.net//2/">
<meta name="twitter:card" content="summary" />
<meta name="twitter:site" content="@_devalias" />
<script type="application/ld+json">
{"@context":"http://schema.org","@type":"WebSite","name":"Glenn &#39;devalias&#39; Grant","headline":"/dev/alias – Hack. Dev. Transcend.","description":"Follow me into the rabbit hole that is my mind and learn about topics including.. security, technology, efficiency, biohacking, health, personal growth and probably a whole lot more.","publisher":{"@type":"Organization","logo":{"@type":"ImageObject","url":"http://devalias.net//assets/images/cover4.jpg"}},"sameAs":["https://twitter.com/_devalias","https://www.linkedin.com/in/glenn-devalias-grant/","https://github.com/alias1","https://keybase.io/devalias"],"url":"http://devalias.net//"}</script>
<!-- End Jekyll SEO tag -->

For my own links, I wrote a small filter called chomp_slash that I append to the end; it strips the trailing / from a string (so I can add it back manually with my path).

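module Jekyll
  # Liquid filter that strips one trailing "/" from a string.
  module ChompSlashFilter
    def chomp_slash(input)
      input.chomp('/')
    end
  end
end

# Make the filter available in all Liquid templates.
Liquid::Template.register_filter(Jekyll::ChompSlashFilter)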
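
As a rough illustration of what the filter does, here is a sketch that leaves out the Liquid registration so it runs with plain Ruby. The module name ChompSlashDemo and the helper object are made up for this example and are not part of the original site. Note that String#chomp with an argument removes at most one trailing occurrence of that suffix.

```ruby
# Hypothetical stand-alone copy of the filter, without the Liquid dependency.
module ChompSlashDemo
  def chomp_slash(input)
    # String#chomp("/") drops a single trailing "/" if one is present.
    input.to_s.chomp("/")
  end
end

helper = Object.new.extend(ChompSlashDemo)

puts helper.chomp_slash("http://devalias.net/")   # http://devalias.net
puts helper.chomp_slash("http://devalias.net//")  # http://devalias.net/
puts helper.chomp_slash("http://devalias.net")    # http://devalias.net
```

Because only one slash is removed per call, a value that already ends in // still ends in / afterwards.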

Would it be possible to get similar functionality within Jekyll SEO so it creates my links properly?

pathawks commented 7 years ago

I don't understand; why do you need to set baseUrl to "/"?

That is very much not the design of baseUrl, so chances are high the root problem is elsewhere.

http://blog.parkermoore.de/2014/04/27/clearing-up-confusion-around-baseurl/ https://byparker.com/blog/2014/clearing-up-confusion-around-baseurl/

0xdevalias commented 7 years ago

Because when I deploy to GitHub pages, if I haven't set it, or I have set it to be empty, GH in all its wisdom takes over and sets it for me, to something akin to "/alias1/devalias.net/" or similar. Which is definitely not going to work when my site is hosted on the domain devalias.net.

pathawks commented 7 years ago

It sounds like your GitHub Pages site is not properly configured.

https://help.github.com/articles/adding-or-removing-a-custom-domain-for-your-github-pages-site/

0xdevalias commented 7 years ago

My page is properly set up, and accessible, and when I use a baseURL of "/" everything works fine.


Due to https://github.com/github/pages-gem/issues/359 or maybe https://github.com/jekyll/github-metadata/issues/87 or something similar, when the baseUrl is empty GH takes over. I'll intentionally re-break it so I can get you the exact copy/paste:

I have also got this setup, which doesn't prevent it:

# _config.yml
baseurl: ""

Executing 'touch .nojekyll'...
                    done in 0.006 seconds.
Executing 'git add -A'...
                    done in 0.42 seconds.
Executing 'git commit -m "Update site: `date`"'...
                    done in 1.024 seconds.
Executing 'git push origin gh-pages'...
Counting objects: 1278, done.
Delta compression using up to 8 threads.
Compressing objects: 100% (1145/1145), done.
Writing objects: 100% (1278/1278), 326.91 KiB | 0 bytes/s, done.
Total 1278 (delta 638), reused 0 (delta 0)
remote: Resolving deltas: 100% (638/638), completed with 230 local objects.
To github.com:alias1/devalias.net.git
   fbc7359..c5a2a46  gh-pages -> gh-pages
                    done in 8.023 seconds.

<!-- Begin Jekyll SEO tag v2.2.1 -->
<title>/dev/alias – Hack. Dev. Transcend. | Follow me into the rabbit hole that is my mind and learn about topics including.. security, technology, efficiency, biohacking, health, personal growth and probably a whole lot more.</title>
<meta property="og:title" content="/dev/alias – Hack. Dev. Transcend." />
<meta property="og:locale" content="en_GB" />
<meta name="description" content="Follow me into the rabbit hole that is my mind and learn about topics including.. security, technology, efficiency, biohacking, health, personal growth and probably a whole lot more." />
<meta property="og:description" content="Follow me into the rabbit hole that is my mind and learn about topics including.. security, technology, efficiency, biohacking, health, personal growth and probably a whole lot more." />
<link rel="canonical" href="http://devalias.net/pages/alias1/devalias.net/" />
<meta property="og:url" content="http://devalias.net/pages/alias1/devalias.net/" />
<meta property="og:site_name" content="/dev/alias – Hack. Dev. Transcend." />
<link rel="next" href="http://devalias.net/pages/alias1/devalias.net/2/">
<meta name="twitter:card" content="summary" />
<meta name="twitter:site" content="@_devalias" />
<script type="application/ld+json">
{"@context":"http://schema.org","@type":"WebSite","name":"Glenn &#39;devalias&#39; Grant","headline":"/dev/alias – Hack. Dev. Transcend.","description":"Follow me into the rabbit hole that is my mind and learn about topics including.. security, technology, efficiency, biohacking, health, personal growth and probably a whole lot more.","publisher":{"@type":"Organization","logo":{"@type":"ImageObject","url":"http://devalias.net/pages/alias1/devalias.net/assets/images/cover4.jpg"}},"sameAs":["https://twitter.com/_devalias","https://www.linkedin.com/in/glenn-devalias-grant/","https://github.com/alias1","https://keybase.io/devalias"],"url":"http://devalias.net/pages/alias1/devalias.net/"}</script>
<!-- End Jekyll SEO tag -->


When I set baseUrl explicitly back to /, everything works fine again (as it has for months if not years now), except for the // issue. Since that comes from this plugin rather than my own code, it's not something I can correct myself.

A link in https://github.com/github/pages-gem/issues/359#issuecomment-263375327 leads to this snippet of code, showing that it will take over when baseUrl is blank.

pathawks commented 7 years ago

You should open this issue over at pages-gem. This definitely sounds like a real problem.

0xdevalias commented 7 years ago

That issue is in the pages-gem; I've commented there explicitly as well. According to articles like the following, using the / in the baseurl is 'bad form':

Jekyll 3.4 added support to prevent certain double slash cases, which implies it's a common issue:

Jekyll now prevents double forward slash errors. In this case Jekyll will not append a forward slash to url: because the baseurl: input already contains ‘/’.

url: "http://example.com"
baseurl: "/blog"
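
To make the idea concrete, here is a minimal sketch of joining url, baseurl, and a page path so that exactly one slash separates the parts. It is not Jekyll's actual implementation; join_site_url is a hypothetical helper written purely for illustration.

```ruby
# Hypothetical helper: joins site url, baseurl and a page path without
# producing "//" between them. Not taken from Jekyll or jekyll-seo-tag.
def join_site_url(url, baseurl, path = "/")
  # Strip leading/trailing slashes from baseurl and path, drop empty parts.
  parts = [baseurl, path]
          .map { |part| part.to_s.gsub(%r{\A/+|/+\z}, "") }
          .reject(&:empty?)

  # Strip trailing slashes from the site url, then join with single slashes.
  joined = ([url.to_s.sub(%r{/+\z}, "")] + parts).join("/")

  # Keep a trailing slash when the page path had one (directory-style URLs).
  path.to_s.end_with?("/") ? "#{joined}/" : joined
end

puts join_site_url("http://example.com", "/blog", "/")  # http://example.com/blog/
puts join_site_url("http://devalias.net", "/", "/2/")   # http://devalias.net/2/
puts join_site_url("http://devalias.net", "/", "/")     # http://devalias.net/
```

With this kind of normalisation, a baseurl of "", "/" or "/blog" all produce well-formed links.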

It seems like it wouldn't take much of a change to ensure this is never an issue in this plugin. I'm not familiar with the code base, but a regex that replaces any repeated slashes with a single slash wherever URLs are concatenated would solve it.
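
A sketch of that regex idea, assuming absolute URLs with a scheme such as http://. The name collapse_slashes is hypothetical and not part of jekyll-seo-tag. The negative lookbehind leaves the // that follows the scheme's colon untouched.

```ruby
# Hypothetical helper: collapse runs of "/" into one, except the "//"
# that directly follows the scheme's ":" (as in "http://").
def collapse_slashes(url)
  url.to_s.gsub(%r{(?<!:)//+}, "/")
end

puts collapse_slashes("http://devalias.net//")    # http://devalias.net/
puts collapse_slashes("http://devalias.net//2/")  # http://devalias.net/2/
puts collapse_slashes("http://devalias.net//assets/images/cover4.jpg")
# http://devalias.net/assets/images/cover4.jpg
```

This is deliberately naive: it would also collapse a protocol-relative URL such as //example.com and any // that appears inside a query string, so a real fix would likely need to be more careful.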

It looks like 3.3 brought new filters that might address the 'double slash' problem if used, though I haven't looked too deeply, and they still seem to think that setting baseurl to / is a bad idea:

It seems like such a small problem to persist so long and cause so many issues :(