jekyll / jekyll

:globe_with_meridians: Jekyll is a blog-aware static site generator in Ruby
https://jekyllrb.com
MIT License

Encoding html entities? #2026

Closed mhulse closed 10 years ago

mhulse commented 10 years ago

Here's the scenario:

<meta property="og:title" content="{{ page.title }} | {{ site.title }}">

With the code above, I run into problems if the title contains double quotes: the first double quote in the title closes the content attribute early.
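To see why the attribute breaks, here is a minimal Ruby sketch that mimics the template with plain string interpolation (no escaping), then reads the content value back roughly the way an HTML parser would, stopping at the next double quote. The site title and the regex stand-in for a parser are assumptions for illustration only.

```ruby
require 'cgi'

page_title = 'The "European" Marge Simpson'
site_title = 'My Site' # hypothetical site title

# What the template produces: raw interpolation, no escaping.
naive = %(<meta property="og:title" content="#{page_title} | #{site_title}">)

# A double-quoted attribute value runs up to the next double quote,
# so this is roughly what a browser keeps as the content value.
naive_value = naive[/content="([^"]*)"/, 1]

# Escaping the title first keeps the whole string inside the attribute.
escaped = %(<meta property="og:title" content="#{CGI.escapeHTML(page_title)} | #{site_title}">)
escaped_value = CGI.unescapeHTML(escaped[/content="([^"]*)"/, 1])

p naive_value   # => "The "
p escaped_value # => "The \"European\" Marge Simpson | My Site"
```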

The fix I've been using is to:

 <meta property="og:title" content="{{ page.title | markdownify | strip_html | trim }} | {{ site.title }}">

The above works, but it just seems a little verbose.

I tried the Liquid escape_once filter, with no results.
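For comparison, the Liquid docs describe escape_once as escaping a string without changing entities that are already escaped. A minimal Ruby sketch of those semantics (a hypothetical helper, not Liquid's own code) escapes &, <, >, " and ' but copies anything that already looks like an entity through untouched, so running it twice changes nothing:

```ruby
require 'cgi'

# Matches either an existing entity (&amp;, &#8220;, &#x201D;, ...)
# or a single character that needs escaping.
ENTITY_OR_SPECIAL = /&(?:[a-zA-Z][a-zA-Z0-9]*|#\d+|#[xX]\h+);|[&<>"']/

# Hypothetical helper: escape special characters, keep existing entities.
def escape_once_sketch(input)
  input.to_s.gsub(ENTITY_OR_SPECIAL) do |match|
    # Entities are longer than one character; pass them through as-is.
    match.length > 1 ? match : CGI.escapeHTML(match)
  end
end

puts escape_once_sketch('Tom &amp; "Jerry"')
# => Tom &amp; &quot;Jerry&quot;
```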

I've been tempted to use the XML escape filters, but this isn't XML.

How do the pros handle this?

Is there any way to automatically have all of my titles converted to entities without me having to do anything on the template level?

Thanks!

doktorbro commented 10 years ago

> How do the pros handle this?

The pros never put typewriter double quotes in titles. If you write in American English, use curly quotation marks (“”).

The second option is to create a Liquid include that contains your filter procedures.

{% include title.liquid text=page.title %}

mhulse commented 10 years ago

> The pros never put typewriter double quotes in titles. If you write in American English, use curly quotation marks (“”).

Interesting! Thanks for the tip. :+1:

I guess the only drawback is that the curly quotes are not easy to access. But for titles and decks, it's not like I'm writing a story, so remembering to use curly quotes isn't so bad, I guess.

This seems like the simplest option, especially when compared to adding filters to anywhere that I output titles/decks and other front matter.

> {% include title.liquid text=page.title %}

Ah, that would work! Thanks for the tip. :smile:

Question: Is there an advantage to using a .liquid extension vs. other extensions (like .html)?

> that contains your filter procedures.

Just to clarify, is using markdownify | strip_html the best way to get html entities (I have smart quotes turned on)? Am I overlooking a filter that would do this without having to call markdownify?

doktorbro commented 10 years ago

> I guess the only drawback is that the curly quotes are not easy to access.

If you use a Mac or a Linux distribution, curly quotes are a simple key combination. Check out http://www.iawriter.com/ for all Apple devices.

The only advantage of using a .liquid extension is that your text editor recognizes the format. Jekyll doesn't do any magic based on an include's file extension.

Smart quotes are a feature of your Markdown processor. So yes, you must at least markdownify your text to convert straight quotes to curly ones.

parkr commented 10 years ago

Seems like we could add a filter here, anyway: escape_double_quotes.
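One way the escape_double_quotes idea could look as a plugin filter, sketched in Ruby. The module and filter names are hypothetical (nothing in this thread says such a filter exists in Jekyll), and it deliberately touches nothing but the double quote character:

```ruby
# Hypothetical filter sketching the escape_double_quotes idea.
module EscapeDoubleQuotes
  def escape_double_quotes(input)
    # to_s guards against nil (e.g. a page without a title).
    input.to_s.gsub('"', '&quot;')
  end
end

# Inside a Jekyll plugin you would register it with:
#   Liquid::Template.register_filter(EscapeDoubleQuotes)
# Plain Ruby check, outside Jekyll:
helper = Object.new.extend(EscapeDoubleQuotes)
puts helper.escape_double_quotes('The "European" Marge Simpson')
# => The &quot;European&quot; Marge Simpson
```

Because it leaves &, < and > alone, it is narrower than a full HTML escape such as the CGI-based plugin in the next comment.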

mhulse commented 10 years ago

Not sure why I did not try this before, but this:

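# e.g. _plugins/entities.rb (Jekyll loads Ruby plugins from _plugins/)
require 'cgi'

# Usage: {{ page.title | encode }}
module Entities

  # Escape &, <, > and " (plus ' on Ruby 2.0+) as HTML entities.
  # to_s guards against nil, e.g. a page with no title.
  def encode(input)
    CGI.escapeHTML(input.to_s)
  end

  # Reverse of encode: turn entities such as &quot; back into characters.
  def decode(input)
    CGI.unescapeHTML(input.to_s)
  end

  Liquid::Template.register_filter self

end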

... will convert:

The "European" Marge Simpson

... to:

The &quot;European&quot; Marge Simpson

Whereas this:

The “European” Marge Simpson

(using smart quotes) will pass through unaltered.
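The behaviour described above can be checked in plain Ruby, without Jekyll. CGI.escapeHTML only rewrites &, <, >, " and ', so typographic quotes are ordinary characters to it, and CGI.unescapeHTML undoes the escaping:

```ruby
require 'cgi'

straight = 'The "European" Marge Simpson'
curly    = 'The “European” Marge Simpson'

encoded = CGI.escapeHTML(straight)
p encoded                               # => "The &quot;European&quot; Marge Simpson"
p CGI.escapeHTML(curly) == curly        # => true (curly quotes pass through)
p CGI.unescapeHTML(encoded) == straight # => true (decode undoes encode)
```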

I suppose the solution above isn't bad.

Also, now that I've experimented with smart quotes, using OPTION + [ to get “ and OPTION + SHIFT + [ to get ” is actually pretty easy ... and the end quote looks better (stating the obvious here).

> Seems like we could add a filter here, anyway: escape_double_quotes.

That might be useful for peeps that can't use plugins. On one hand, using smart quotes is not a bad solution, but there might be cases where one would want to actually use entities.

For now, I think I'm sold on smart quotes (thanks for the tip, @penibelst).

> If you use a Mac or a Linux distribution, curly quotes are a simple key combination. Check out http://www.iawriter.com/ for all Apple devices.

Nice! Looks cool!

I currently use Mou for my Markdown writing. Unfortunately, it doesn't do GHFM (no named fenced code blocks, to name one thing).

To get iOS support, I've used Elements, which stores MD files on DropBox. Then I can use Mou to edit. It's not the greatest flow though, as Mou, again, doesn't use GHFM.

Recently, I've been testing Marked to preview my MD as I edit it inside of ST3 (using dual monitors); Marked supports something close to GHFM out of the box, but it's kind of a strange workflow to use two different apps. That's why I like Mou (in-app split screen is nice).

I'd love to find an in-app split screen MD editor that supports all of the features of GHFM. Actually, it would be super bad ass to find something that supported all the Markdown processors that Jekyll can use. Icing on the cake would be multi-device support and cloud storage (or Dropbox at bare minimum).

Lol, with all that said, thanks for the tip on iA Writer! I'm gonna buy/play with that today. :+1:

> The only advantage of using a .liquid extension is that your text editor recognizes the format. Jekyll doesn't do any magic based on an include's file extension.

Ahhhh, that makes sense. What editor do you use? For me, I use ST3. I've installed a Jekyll package recently, though I'm not sure if it adds anything in terms of syntax highlighting and such. I use .html (and .md for posts) extensions for everything, and my YAML front matter is usually not colorized very well. I'd love to improve upon that.

Now I'm wondering how I could improve my ST3 Jekyll experience ... I'll have to try that .liquid extension to see if it helps.

> Smart quotes are a feature of your Markdown processor. So yes, you must at least markdownify your text to convert straight quotes to curly ones.

Interesting. CGI.escapeHTML() works OK for straight conversions. It would be cool to figure out a way to get encoded smart quotes automatically (vs. just straight conversion).

Maybe HTMLEntities would do that?

I suppose it would be easy enough to write a filter plugin that would find/replace straight quotes for curly.
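Such a filter might look like the Ruby sketch below. The module name, filter name and the open/close heuristic are all assumptions: a quote at the start of the text or after whitespace or an opening bracket is treated as opening, anything else as closing (which also turns apostrophes into ’). Real smart-quote implementations in Markdown processors handle far more edge cases.

```ruby
# Hypothetical filter: straight quotes to curly quotes, using a simple
# "opening if at start or after whitespace/opening bracket" heuristic.
module CurlyQuotes
  def curly_quotes(input)
    input.to_s
         .gsub(/(\A|[\s(\[{])"/) { "#{$1}“" } # opening double quotes
         .gsub('"', '”')                      # remaining ones close
         .gsub(/(\A|[\s(\[{])'/) { "#{$1}‘" } # opening single quotes
         .gsub("'", '’')                      # closing quotes and apostrophes
  end
end

# In _plugins/ you would add: Liquid::Template.register_filter(CurlyQuotes)
helper = Object.new.extend(CurlyQuotes)
puts helper.curly_quotes(%q(The "European" Marge Simpson))
# => The “European” Marge Simpson
puts helper.curly_quotes(%q(I don't know about "bad" typography))
# => I don’t know about “bad” typography
```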

Hrmmm, the more I think about it, typing smart quotes when I write the headlines/decks (in the front matter) is probably the way to go. :laughing:

For body text, I have turned on smart quotes, so I don't need to worry about it when writing longer posts.

doktorbro commented 10 years ago

HTML allows both double quotes and single quotes around attribute values.

<meta property="og:title" content="{{ page.title }}">
<!-- is equal with -->
<meta property='og:title' content='{{ page.title }}'>
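Switching the delimiter moves the problem rather than removing it: a single-quoted attribute breaks on a straight apostrophe instead. A minimal Ruby sketch, assuming Ruby 2.0 or later (where CGI.escapeHTML also escapes the apostrophe), showing that an escaped value fits inside either style of attribute:

```ruby
require 'cgi'

title = %q(I don't know about "bad" typography)

# Escaping both quote characters makes the value safe inside either
# double- or single-quoted attributes.
safe = CGI.escapeHTML(title)
puts safe
# => I don&#39;t know about &quot;bad&quot; typography

puts %(<meta property="og:title" content="#{safe}">)
puts %(<meta property='og:title' content='#{safe}'>)
```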

@parkr Do you want to catch all the possible combinations of bad typography inside the title?

# wrong single quote
title: "I don't know about bad typography"
# wrong double quotes
title: 'I know about "bad" typography'

If people use Markdown in Jekyll, they often forget that only the {{ content }} variable gets converted by default. Everything else must be markdownified by filters. This is how it works. Maybe the documentation should stress this fact.

Escaping would misdirect writers to bad typography. It’s a pitfall.

mhulse commented 10 years ago

> HTML allows both double quotes and single quotes around attribute values.

Good point. I think I'm one of those oddballs that always likes to use double quotes on the outside for HTML. Old habits die hard I guess.

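I'm pretty sure that in HTML5 the quotes are optional, so that's another option for folks, though only when the value contains no spaces, quotes, =, <, > or backticks, which rules out most titles. Again though, I prefer to quote my attrs using doubles (I'm old-school like that, I suppose) :laughing: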

Good tips though! Using singles on the outside would save me from some headache if I forgot to type a smart quote.

> If people use Markdown in Jekyll, they often forget that only the {{ content }} variable gets converted by default. Everything else must be markdownified by filters. This is how it works. Maybe the documentation should stress this fact.

That's definitely a good point to stress. I mean, I didn't notice until I viewed my source (though, I had suspected this might happen based upon my experience using other systems).

> Escaping would misdirect writers to bad typography. It’s a pitfall.

So, to clarify, you're suggesting one use typographer's quotes? Like I said, I think I'll head that route now that you've kicked me in the right direction here.

parkr commented 10 years ago

@mhulse xml_escape is definitely what you wanted, then. Guess it's just poorly named, haha.

Closing out.

doktorbro commented 10 years ago

Welcome to the world of typography. Straight quotes were never part of any living language. Open a book and you will never find a straight quote. Yes, please always use typographer's quotes and typographer's apostrophes. The Register-Guard is a great example of how to use them right.

mhulse commented 10 years ago

Thanks guys!

> @mhulse xml_escape is definitely what you wanted, then.

The "xml" in the name threw me off! :)

Would love to see an example on the page:

[Screenshot of the Jekyll documentation page, taken 2014-02-09 at 3:48 PM]

... it's the only "escape" filter on that page that doesn't have an example.

> Welcome to the world of typography. Straight quotes were never part of any living language. Open a book and you will never find a straight quote. Yes, please always use typographer's quotes and typographer's apostrophes.

Will do! Thanks so much for the pro advice!

> The Register-Guard is a great example of how to use them right.

It's funny, all of my newspaper colleagues use smart quotes. I've always preferred straight ones; I think it comes from my coding background (I'm always using single/double straight quotes for coding). It doesn't feel natural to use the curly versions. :laughing:

At the RG, we use Adobe InCopy/InDesign, and the option to use typographer's quotes is turned on by default. I don't do any writing, but Adobe makes it easy for folks. :+1:

Anyway, thanks again for all the help folks! It's much appreciated!

adammichaelwood commented 9 years ago

Is there a way to make the Markdown processor convert quotes and apostrophes to HTML entities such as:

&lsquo;
&rdquo;
etc.

I'd prefer my output text be that, rather than UTF-8 encoded curly quotes, because of reasons. I wouldn't even mind doing it myself before rendering (I have a little bash script that uses sed to replace quotes with the HTML entities), but then the processor converts them BACK into single-character curly quotes. Very annoying.

kleinfreund commented 9 years ago

> because of reasons.

Why? There is really no need to do that unless you’re using & inside HTML attributes (encode it as &amp; then).

doktorbro commented 9 years ago

@adammichaelwood You can replace the quotes in the HTML layout.

index.md

---
layout: default
---

Adam said: "I'd prefer my output text be that, because of reasons."

_layouts/default.html

{{ content | replace: "’", "&#8217;" | replace: "“", "&#8220;" | replace: "”", "&#8221;" }}

_site/index.html

<p>Adam said: &#8220;I&#8217;d prefer my output text be that, because of reasons.&#8221;</p>

There is no markdown processor working on HTML layouts.
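If you'd rather not repeat that replace chain in every layout, the same substitution can live in a plugin filter. A minimal Ruby sketch; the module and filter names are hypothetical, and unlike the chain above it also maps the left single quote ‘ to &#8216;:

```ruby
# Hypothetical Jekyll filter: swap typographic quotes for numeric references.
module NumericQuotes
  QUOTE_ENTITIES = {
    '‘' => '&#8216;', '’' => '&#8217;',
    '“' => '&#8220;', '”' => '&#8221;'
  }.freeze

  def numeric_quotes(input)
    # String#gsub with a hash replaces each match with its hash value.
    input.to_s.gsub(/[‘’“”]/, QUOTE_ENTITIES)
  end
end

# In _plugins/, register with: Liquid::Template.register_filter(NumericQuotes)
helper = Object.new.extend(NumericQuotes)
puts helper.numeric_quotes('<p>Adam said: “I’d prefer my output text be that, because of reasons.”</p>')
# => <p>Adam said: &#8220;I&#8217;d prefer my output text be that, because of reasons.&#8221;</p>
```

In a layout it would then read {{ content | numeric_quotes }}. Like the replace chain, it only touches the four quote characters; dashes and other typographic characters pass through unchanged.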

adammichaelwood commented 9 years ago

Thanks.

Why would I want to do that? Because content shows up in a lot of places other than the one place I put it with a proper UTF-8 declaration: feeds, Twitter, emails, etc. I see jumbled characters on things ALL THE TIME, and it seems to me the easiest way to avoid that is to use the HTML ampersand codes, which work pretty much everywhere without any problems.

Thanks for the info on doing the replacement inside the layout.

I was somewhat confused by:

> There is no markdown processor working on HTML layouts.

I think my real question is: is there a way to "fix" the behavior of the Markdown processor? That it converts straight quotes to curly ones (in UTF-8) is great and sensible. But isn't there an option to have it spit out HTML entities instead? And if I try to bypass that by putting HTML entities into my Markdown content (which I don't WANT to do; it was just a thing I tried), it converts those too, instead of leaving them alone.

Well - anyway - I think I can solve my issue now. Thanks all. Sorry for cluttering up a gh issue thread with personal problems...

doktorbro commented 9 years ago

Which markdown processor are you using?

adammichaelwood commented 9 years ago

kramdown

doktorbro commented 9 years ago

@adammichaelwood Kramdown has an option for exactly what you need: entity_output. The default value is as_char. That’s why you get chars now. You must change it to numeric.

kramdown:
  entity_output: numeric

Read more about available options in the excellent documentation.
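For text that never goes through kramdown (front matter values printed directly in a layout, for example), a post-processing step can produce the same kind of output. A minimal Ruby sketch with a hypothetical helper name; it converts every non-ASCII character, not only quotes, to the decimal &#NNNN; form used in the layout example earlier in the thread:

```ruby
# Hypothetical helper: replace every non-ASCII character with a decimal
# numeric character reference, leaving ASCII text untouched.
def to_numeric_refs(text)
  text.to_s.gsub(/[^\x00-\x7F]/) { |ch| format('&#%d;', ch.ord) }
end

puts to_numeric_refs('Adam said: “I’d prefer that.”')
# => Adam said: &#8220;I&#8217;d prefer that.&#8221;
```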

adammichaelwood commented 9 years ago

Hey that works! Cool thanks.

Another question, if no one minds:

Is there a strong reason NOT to prefer HTML entities for this sort of thing? My reason, as I said, was more universal compatibility, as I don't always know where my stuff will show up, and seeing goofy messed-up UTF characters in feeds and so forth annoys me. But am I now in danger of the same problem, seeing &#8221; in weird places I shouldn't?

I ask because I don't understand why HTML entity output isn't the default for the processor. (Robustness principle and all.)

parkr commented 9 years ago

> I don't understand why HTML entity output isn't the default for the processor.

We don't write any of the processors ourselves, so whichever one you're using, you'd have to bring this up with them. We have the escaping filters for Liquid in the case that < and > are possibly in the text (useful for making a given block of text a textnode within an XML element, for example).

adammichaelwood commented 9 years ago

Yes, sorry - that was more of a philosophical question, based on the reaction I got above to my desire to output html entities. @kleinfreund seemed to think it was a silly thing to want to do.

kleinfreund commented 9 years ago

I did not say it was particularly silly. However, I think characters like , “”, „“, , , etc. should work everywhere without having a weird output. The only place we see something weird while encoding them would be the code. You can do either way and should be fine whatever you choose. The important thing is setting utf-8 in your HTML.

adammichaelwood commented 9 years ago

Thanks. (I was probably projecting. My dev skills are sketchy, so I'm a little insecure.)

Anyway --- the answer above re: output options in kramdown was exactly the answer I needed. So thanks!