Closed silverstripe-issues closed 7 years ago
<br /> isn't valid XML either, but it is valid HTML :) How about a raw2html?
<br /> is valid XML, <br> isn't.
The change to Convert::raw2xml() causes a regression from 2.4 in the way a Text with newlines is outputted.
I stand corrected.
(edit: I've clarified the point I was trying to make in my response below; I still could be wrong, but I hope it's explained better. ^^)
FWIW I tested the three functions in HTML, XML, and attributes, and couldn't see any problem. All tags are escaped as I would expect them to be.
Am I being dense though?
As I tried to explain initially there are two problems with this class.
<p class="this is <b>bold</b>">text</p>
I believe this is why in SS3 they introduced the new method raw2htmlname() which again should do the same as raw2att() and raw2htmlatt(). All three methods (raw2att, raw2htmlatt and raw2htmlname) should do the same and should remove all tags as well as new lines. You could even argue that they should replace spaces by hyphens/underlines to get correct HTML attributes.
Isn't this how it should work?:
input: here is <b>some</b> text\n& "here is another line"
here is <b>some</b> text\n& "here is another line"
here is <b>some</b> text<br />& "here is another line"
here is <b>some</b> text\n& "here is another line"
hereisbsomebtexthereisanotherline
I think the above distinguishes each type:
I think my complaint is that raw2html doesn't actually exist, and although raw2xml is being used instead, it would benefit to have raw2html to properly generate <br />
when newlines are encountered.
Well, the behaviour you described might me the "correct" behaviour, but I don't see a usecase for it. Who wants a string like here is <b>some</b> text\n& "here is another line"
in i.e. an alt attribute of an image? I would rather have it cleaned up.
And what's the raw2att for?
And for your issue, yes raw2html doesn't exist. But why was raw2xml changed to not replacing the new lines? If it would work as it did in ss2.4 raw2html would not be needed.
In general, I think there are just too many methods that do the same and therefore are not needed. Am I wrong?
It was just a hypothetical example.
A real life example is someone typing the above string explicitly into a user-contact form. We don't want raw HTML being rendered in the email sent to the admin (as it's a scripting vulnerability, potential to embed nefarious scripts, etc), but we may want the admins to see what was typed as visually identical to the input. That's a pretty valid use case if ever there was one.
Another situation is where user content can be, for whatever reason, encoded into XML, transported via HTTP, and decoded on another server (such as a webservice might behave). We don't want literal
in text nodes there, because it will need to be decoded again into the original format.
I think the correct
thing to do is to make sure the code behaves properly as per standards (and I'm talking about proper iso/w3 standards), and let the users determine what tools are best to use.
Addressing your specific questions, and as a disclaimer these are all off the top of my head and I could be entirely wrong:
<div>Text with<br />a newline</div>
isn't a valid xml block, although it's a valid html one. The newline character \n
is valid in XML, however, so that's what should be used. Just because it did otherwise in 2.4 doesn't mean it's correct.Edit: Yes, I just noticed raw2xml also escapes quotes. It doesn't HAVE to, but it does. This allows raw2att to call it directly without additional escaping. As noted above, however, that doesn't mean that the function will still behave the same in the future.
A couple of naming clarifications: Convert
is perhaps a bad name for this class - it should possibly be Escape
, and "raw" doesn't mean "text, where newlines are important" - it means "no idea"
The purpose of Convert::raw2xml
is to make something insertable into an XML document (at the element scope) in a safe manner - it's not supposed to turn text into xml, which is why it doesn't convert newlines into <br />
tags.
There are probably too many functions in Convert
. One reason is because they don't have well defined behaviour except their implementation, so changing that implementation will probably cause regressions (including possible security regressions).
Additionally however none of the Convert
functions are context or structure sensitive (on purpose) - they don't know what you're converting, or where you're putting the result, which is why there's different converters for XML element scope, attribute name scope and attribute value scope.
@tractorcow:
@hafriedlander:
Correcty me if I'm wrong, but if you need a literal <br />
in an XML string, the closest you could get is <div><![CDATA[Text with a <br />a newline]]></div>
right?
I'll let Hamish further this discussion, as I think I'm probably going to end up repeating myself!
Hope I could be of help Xini.
That's the commit that changed the methods to do the same: https://github.com/silverstripe/sapphire/commit/c7a98407b1f8b46224c3615edfbfae4d29ffb4e0
@tractorcow ah, yes within a XML text node you would of course have to cdata the br tag. But the string itself is still valid XML.
If you need the literal characters <br />
in XML, you shouldn't be using Convert::raw2xml at all, maybe?
I'm going on holiday soon, so can't really focus on this sorry, but the new Convert methods make sense to me. I think it's a documentation / expectation / naming issue, not a bug in the Convert methods themselves.
Hope it's somewhere warm :)
I think, overall, the Convert class could be better defined, it's currently working as intended. I agree that there could be the potential to add a method that generates newlines for HTML type output, but that's outside the scope of this issue.
There is a regression here caused by the change to raw2xml()
2.4: DBField::create('Text', "Some string\nwith newline")->forTemplate() -> "Some string<br />\nwith newline"
3.0: DBField::create_field('Text', "Some string\nwith newline")->forTemplate() -> "Some string\nwith newline"
Which means you have to pre-process a Text field before you can send it to the template if you want the newlines to be outputted. Correcting that is entirely within the scope of the latter half of this issue.
Also, <div>Text with a <br />a newline</div>
is perfectly valid XML. CDATA is used to prevent the <br /> from becoming a node with two string node siblings.
Yeah, you're still right. :) I need to do better research.
@simonwelsh - I don't think that's an error. It's a change in behaviour compared to 2.4, but to me the 3.0 behaviour is better. You can't assume that whitespace is important in the value passed to raw2xml.
@hafriedlander But it is a fairly major change to the way Text fields are handled. I've split that out into a separate issue
Whether the behavior is better in 3 than 2.4 I don't know. But fact is that three methods do exactly the same which doesn't really make sense. And for html attributes whitespace is important, or not? And if you pass something to raw2xml I would expect the value to be transformed to xml, including the br tags. If you want to use the value in a xml text node, I would expect something like a raw4xmltext method doing that. Am I wrong?
I'm going to close this discussion as I think it's run it's course with no agreed change.
Agreed behaviour is that conversion of newlines to
tag is not the role of raw2xml. Prior discussed regressions seem to have been resolved with https://github.com/silverstripe/silverstripe-framework/issues/1942.
StringField perform nl2br in forTemplate(), which is the accepted method of rendering non-html text within a html template.
Original complaint about att() encode methods were incorrect; Any content that's escaped adequately is acceptable in a property. :)
created by: @xini (florian.thoma) created at: 2013-02-26 original ticket: http://open.silverstripe.org/ticket/8286
Convert::raw2att() and Convert::raw2htmlatt() both fall back to Convert::raw2xml(). This is not correct since tags are allowed in XML but not in attributes.
Convert::raw2xml() on the other hand does not replace newlines with
which it should.