silverstripe / silverstripe-framework

Silverstripe Framework, the MVC framework that powers Silverstripe CMS
https://www.silverstripe.org
BSD 3-Clause "New" or "Revised" License
721 stars 821 forks source link

[2013-02-26] Convert methods not doing their job #1690

Closed silverstripe-issues closed 7 years ago

silverstripe-issues commented 11 years ago

created by: @xini (florian.thoma) created at: 2013-02-26 original ticket: http://open.silverstripe.org/ticket/8286


Convert::raw2att() and Convert::raw2htmlatt() both fall back to Convert::raw2xml(). This is not correct since tags are allowed in XML but not in attributes.

Convert::raw2xml() on the other hand does not replace newlines with
which it should.

tractorcow commented 11 years ago

<br /> isn't valid XML either, but it is valid HTML :) How about a raw2html?

simonwelsh commented 11 years ago

<br /> is valid XML, <br> isn't.

The change to Convert::raw2xml() causes a regression from 2.4 in the way a Text with newlines is outputted.

tractorcow commented 11 years ago

I stand corrected.

(edit: I've clarified the point I was trying to make in my response below; I still could be wrong, but I hope it's explained better. ^^)

willmorgan commented 11 years ago

FWIW I tested the three functions in HTML, XML, and attributes, and couldn't see any problem. All tags are escaped as I would expect them to be.

Am I being dense though?

xini commented 11 years ago

As I tried to explain initially there are two problems with this class.

  1. raw2att() and raw2htmlatt() both fall back to raw2xml(). If you have a text with xml tags (i.e. 'this is bold') and insert it as a HTML attribute converted with the above methods you end up with tags inside the attribute:
<p class="this is <b>bold</b>">text</p>

I believe this is why in SS3 they introduced the new method raw2htmlname() which again should do the same as raw2att() and raw2htmlatt(). All three methods (raw2att, raw2htmlatt and raw2htmlname) should do the same and should remove all tags as well as new lines. You could even argue that they should replace spaces by hyphens/underlines to get correct HTML attributes.

  1. raw2xml() on the other hand should not only do the htmlspecialchars() but also replace new lines by
    -tags. Imagine a text with several lines (i.e. from a text area) that you want to be displayed on a website. There the new lines would have to be converted to HTML.
tractorcow commented 11 years ago

Isn't this how it should work?:

input: here is <b>some</b> text\n& "here is another line"

I think the above distinguishes each type:

tractorcow commented 11 years ago

I think my complaint is that raw2html doesn't actually exist, and although raw2xml is being used instead, it would benefit to have raw2html to properly generate <br /> when newlines are encountered.

xini commented 11 years ago

Well, the behaviour you described might me the "correct" behaviour, but I don't see a usecase for it. Who wants a string like here is &lt;b&gt;some&lt;/b&gt; text\n&amp; &quot;here is another line&quot; in i.e. an alt attribute of an image? I would rather have it cleaned up. And what's the raw2att for? And for your issue, yes raw2html doesn't exist. But why was raw2xml changed to not replacing the new lines? If it would work as it did in ss2.4 raw2html would not be needed. In general, I think there are just too many methods that do the same and therefore are not needed. Am I wrong?

tractorcow commented 11 years ago

It was just a hypothetical example.

A real life example is someone typing the above string explicitly into a user-contact form. We don't want raw HTML being rendered in the email sent to the admin (as it's a scripting vulnerability, potential to embed nefarious scripts, etc), but we may want the admins to see what was typed as visually identical to the input. That's a pretty valid use case if ever there was one.

Another situation is where user content can be, for whatever reason, encoded into XML, transported via HTTP, and decoded on another server (such as a webservice might behave). We don't want literal
in text nodes there, because it will need to be decoded again into the original format.

I think the correct thing to do is to make sure the code behaves properly as per standards (and I'm talking about proper iso/w3 standards), and let the users determine what tools are best to use.

Addressing your specific questions, and as a disclaimer these are all off the top of my head and I could be entirely wrong:

Edit: Yes, I just noticed raw2xml also escapes quotes. It doesn't HAVE to, but it does. This allows raw2att to call it directly without additional escaping. As noted above, however, that doesn't mean that the function will still behave the same in the future.

hafriedlander commented 11 years ago

A couple of naming clarifications: Convert is perhaps a bad name for this class - it should possibly be Escape, and "raw" doesn't mean "text, where newlines are important" - it means "no idea"

The purpose of Convert::raw2xml is to make something insertable into an XML document (at the element scope) in a safe manner - it's not supposed to turn text into xml, which is why it doesn't convert newlines into <br /> tags.

There are probably too many functions in Convert. One reason is because they don't have well defined behaviour except their implementation, so changing that implementation will probably cause regressions (including possible security regressions).

Additionally however none of the Convert functions are context or structure sensitive (on purpose) - they don't know what you're converting, or where you're putting the result, which is why there's different converters for XML element scope, attribute name scope and attribute value scope.

xini commented 11 years ago

@tractorcow:

@hafriedlander:

tractorcow commented 11 years ago

Correcty me if I'm wrong, but if you need a literal <br /> in an XML string, the closest you could get is <div><![CDATA[Text with a <br />a newline]]></div> right?

I'll let Hamish further this discussion, as I think I'm probably going to end up repeating myself!

Hope I could be of help Xini.

xini commented 11 years ago

That's the commit that changed the methods to do the same: https://github.com/silverstripe/sapphire/commit/c7a98407b1f8b46224c3615edfbfae4d29ffb4e0

xini commented 11 years ago

@tractorcow ah, yes within a XML text node you would of course have to cdata the br tag. But the string itself is still valid XML.

hafriedlander commented 11 years ago

If you need the literal characters <br /> in XML, you shouldn't be using Convert::raw2xml at all, maybe?

I'm going on holiday soon, so can't really focus on this sorry, but the new Convert methods make sense to me. I think it's a documentation / expectation / naming issue, not a bug in the Convert methods themselves.

tractorcow commented 11 years ago

Hope it's somewhere warm :)

tractorcow commented 11 years ago

I think, overall, the Convert class could be better defined, it's currently working as intended. I agree that there could be the potential to add a method that generates newlines for HTML type output, but that's outside the scope of this issue.

simonwelsh commented 11 years ago

There is a regression here caused by the change to raw2xml()

2.4: DBField::create('Text', "Some string\nwith newline")->forTemplate() -> "Some string<br />\nwith newline"

3.0: DBField::create_field('Text', "Some string\nwith newline")->forTemplate() -> "Some string\nwith newline"

Which means you have to pre-process a Text field before you can send it to the template if you want the newlines to be outputted. Correcting that is entirely within the scope of the latter half of this issue.

Also, <div>Text with a <br />a newline</div> is perfectly valid XML. CDATA is used to prevent the <br /> from becoming a node with two string node siblings.

tractorcow commented 11 years ago

Yeah, you're still right. :) I need to do better research.

hafriedlander commented 11 years ago

@simonwelsh - I don't think that's an error. It's a change in behaviour compared to 2.4, but to me the 3.0 behaviour is better. You can't assume that whitespace is important in the value passed to raw2xml.

simonwelsh commented 11 years ago

@hafriedlander But it is a fairly major change to the way Text fields are handled. I've split that out into a separate issue

xini commented 11 years ago

Whether the behavior is better in 3 than 2.4 I don't know. But fact is that three methods do exactly the same which doesn't really make sense. And for html attributes whitespace is important, or not? And if you pass something to raw2xml I would expect the value to be transformed to xml, including the br tags. If you want to use the value in a xml text node, I would expect something like a raw4xmltext method doing that. Am I wrong?

tractorcow commented 7 years ago

I'm going to close this discussion as I think it's run it's course with no agreed change.

Agreed behaviour is that conversion of newlines to
tag is not the role of raw2xml. Prior discussed regressions seem to have been resolved with https://github.com/silverstripe/silverstripe-framework/issues/1942.

StringField perform nl2br in forTemplate(), which is the accepted method of rendering non-html text within a html template.

Original complaint about att() encode methods were incorrect; Any content that's escaped adequately is acceptable in a property. :)