innoq / statuses

Apache License 2.0

HTML escaping in atom feed #110

Open pschirmacher opened 9 years ago

pschirmacher commented 9 years ago

This looks weird:

@<a href='/statuses/updates?author=st'&gt;st</a> /me is in love the simple things! But there's an issue with placing the cursor in the input field on iOS - the text doesn't scroll!

https://github.com/innoq/statuses/blob/master/src/statuses/views/atom.clj#L26 ?

aheusingfeld commented 9 years ago

As written in https://github.com/innoq/naveed/issues/13#issuecomment-63609225, AFAIR an Atom feed is supposed to contain XML entities. In this case feedworker should behave like an RSS reader and decode them.
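To illustrate the consumer side, here is a minimal Python sketch of how a feed reader could get at the markup. It is only an illustration, not feedworker's code: the sample entry is hypothetical and modelled on the text quoted in this issue. The XML parser resolves the predefined XML entities (including &apos;) on its own, so the element text comes back as plain HTML markup.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "{http://www.w3.org/2005/Atom}"

# Hypothetical entry, modelled on the escaped content quoted in this issue.
entry_xml = (
    '<entry xmlns="http://www.w3.org/2005/Atom">'
    '<content type="html">'
    "@&lt;a href=&apos;/statuses/updates?author=st&apos;&gt;st&lt;/a&gt; hi"
    "</content>"
    "</entry>"
)

entry = ET.fromstring(entry_xml)
content = entry.find(ATOM_NS + "content")

# The parser has already decoded the XML layer of escaping.
content_type = content.get("type")
html_markup = content.text
```

With type="html", the decoded text is HTML markup that the reader can then render or process further.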

pschirmacher commented 9 years ago

Good point.

Statuses sends content with type HTML: <content type="html">@&lt;a href=&apos;/statuses/updates?author=st&apos;&gt;st&lt;/a&gt; 😊 /me is in love the simple things! But there&amp;apos;s an issue

For HTML escaping, it uses this function: https://github.com/weavejester/hiccup/blob/master/src/hiccup/util.clj#L55 which encodes ' as &apos;. Apparently, &apos; is not defined in HTML 4 and e.g. commons-lang3 does not unescape it. Not sure if this warrants any change in statuses, just wanted to mention it.

I'll adapt the feed processor accordingly.
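As a rough sketch of what such an adaptation could look like (in Python for brevity, not the actual feed processor code): once the XML layer has been decoded, any remaining &apos; in the text can be resolved with an unescaper that knows the HTML5 entity table. The sample string is a hypothetical excerpt based on the snippet above.

```python
import html

# Text as it might look after the XML layer has been decoded once
# (hypothetical sample based on the snippet quoted above).
once_decoded = "But there&apos;s an issue"

# html.unescape uses the HTML5 entity table, which includes &apos;.
plain = html.unescape(once_decoded)
```

An unescaper limited to the HTML 4 entity set would leave &apos; untouched here, which matches the commons-lang3 behaviour described above.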

mvitz commented 9 years ago

Have we agreed that escaping is the right thing to do? If yes, I will close this issue as won't fix.

pschirmacher commented 9 years ago

IMHO escaping is the right thing to do. The only question is whether or not to encode ' as &apos;.

mvitz commented 9 years ago

Text escaping in XML is defined here: http://www.w3.org/TR/xml/#syntax

aheusingfeld commented 9 years ago

The only question is whether or not to encode ' as &apos;.

FWIW I noticed that these chars are also encoded in the HTML the app returns! :(

mvitz commented 9 years ago

It seems that escaping ' and " is allowed in HTML5 (see: http://www.w3.org/International/questions/qa-escapes and http://www.tutorialspoint.com/html5/html5_entities.htm). However, they note that &apos; is not supported in HTML 4 and older browsers. Maybe we should just escape ' as &#39;?
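A minimal sketch of that idea, in Python for illustration only (statuses uses hiccup's escaping function, and this is not its code; the name escape_html is hypothetical): escape the apostrophe with the numeric reference &#39;, which does not depend on a named entity being defined.

```python
import html


def escape_html(text: str) -> str:
    """Escape text for HTML, writing apostrophes as &#39; instead of &apos;."""
    return (
        text.replace("&", "&amp;")  # must run first so later entities stay intact
        .replace("<", "&lt;")
        .replace(">", "&gt;")
        .replace('"', "&quot;")
        .replace("'", "&#39;")
    )


escaped = escape_html("there's an <issue>")

# Round trip: a standard HTML unescaper restores the original text.
restored = html.unescape(escaped)
```

Numeric character references are part of both HTML 4 and XML, so consumers that do not know &apos; should still decode this form.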