prismicio-community / php-kit

Community maintained development kit for Prismic and the PHP language
https://prismic.io
Other
108 stars 82 forks source link

Allow HTML serializer to also work on spans within blocks #106

Closed tremby closed 9 years ago

tremby commented 9 years ago

The link resolver is now passed to the HTML serializer, as the third parameter. This allows HTML serializers to be a little more flexible and reusable.

The type of the HTML serializer is now explicitly described as "callable", since this shows that it can be not only a lambda but also any other kind of callable like a string function name, an array of class name and method or object and method or an object which implements the __invoke method.

The HTML serializer is now passed to the method which turns spans within blocks into HTML as well as the method which turns blocks into HTML. This allows finer control.

However, since internally blocks and spans are currently turned into HTML quite differently (blocks with string manipulation and spans with the DOM) an implementation of HTML serializer needs to handle each type differently. In particular, it must return a string of HTML if the $element passed to it is an instance of BlockInterface and a DOMElement if $element is an instance of SpanInterface. As before, it can also return null to skip this element and defer to the built-in serialization routines.

Some documentation has been added to this effect.

tremby commented 9 years ago

Obviously it's not ideal that the HTML serializer needs to work in two different ways. One solution would be to change over to using DOM across the board to build HTML. If that sounds good I may find some time to implement it.

It would additionally be useful to expose the built-in serializer (perhaps as its own class with an __invoke method) so a custom serializer can call the built-in one and then modify its output, rather than only having the choice to serialize the HTML from scratch or use the built-in one as is, as is currently the case.

tremby commented 9 years ago

Any word?

erwan commented 9 years ago

Hi Bart,

Yes, we need to have the HTMLSerializer work with spans too. That's how it works in other kits. I've recently made a few bug fixes to the way serialization works in the PHP kit, so I need to adapt the fix to have spans support in the HTML Serializer.

erwan commented 9 years ago

That's fixed in 1.5.8 - thank you for your help!

tremby commented 9 years ago

I guess I didn't stress it before, but one of the important things in this PR was that the link resolver was passed to the HTML serializer. I just made a new PR for that, #108.

I'm surprised you moved away from using the DOM, since you're potentially introducing non-escaping errors. For instance if a special character eg < or " is in a label, it won't be escaped for the class attribute and could cause problems. (Or an ampersand, though most browsers will handle these ok.) The DOM code wasn't exactly friendly, but should have been pretty rock-solid -- less rickety than string manipulation, at least.

erwan commented 9 years ago

Hi,

The DOM code wasn't really rock-solid in the sense that generating the HTML for StructuredText (in particular the spans from their position) is tricky if you consider they can be nested. While the DOM gives the guarantee that the generated HTML is valid, it doesn't guarantee that it is functionally correct, e.g. doesn't skip a span.

Overall as you can see in the code the algorithm is a bit complex, and it's easier to maintain if we have roughly the same algorithm in all languages.

tremby commented 9 years ago

Can you show me an example of where it didn't work? Is there a new test case which the old code failed, for example? I'd like to see where I went wrong. I thought I covered the case of nested spans. On 20 May 2015 00:46, "Erwan Loisant" notifications@github.com wrote:

Hi,

The DOM code wasn't really rock-solid in the sense that generating the HTML for StructuredText (in particular the spans from their position) is tricky if you consider they can be nested. While the DOM gives the guarantee that the generated HTML is valid, it doesn't guarantee that it is functionally correct, e.g. doesn't skip a span.

Overall as you can see in the code the algorithm is a bit complex, and it's easier to maintain if we have roughly the same algorithm in all languages.

— Reply to this email directly or view it on GitHub https://github.com/prismicio/php-kit/pull/106#issuecomment-103798583.

erwan commented 9 years ago

We had cases where the spans positions got out of sync, depending on how your spans were organized..

erwan commented 9 years ago

It wasn't your pull request, but the code we had in our repository that was using DOM.

tremby commented 9 years ago

Yes, I wrote that code, or at least the original version of it. Can you give an example case? On 20 May 2015 01:07, "Erwan Loisant" notifications@github.com wrote:

It wasn't your pull request, but the code we had in our repository that was using DOM.

— Reply to this email directly or view it on GitHub https://github.com/prismicio/php-kit/pull/106#issuecomment-103803409.

erwan commented 9 years ago

I believe it was on this content: http://pastebin.com/vyg2ixhs

tremby commented 9 years ago

Thanks; I'll post-mortem my code tomorrow. On 20 May 2015 01:20, "Erwan Loisant" notifications@github.com wrote:

I believe it was on this content: http://pastebin.com/vyg2ixhs

— Reply to this email directly or view it on GitHub https://github.com/prismicio/php-kit/pull/106#issuecomment-103805541.

tremby commented 9 years ago

Actually, you broke it in https://github.com/prismicio/php-kit/commit/1e52533d880f705a3ba06c0a26e8c03b1629b41d -- spans lined up perfectly before that commit, in the JSON you linked, and not after.

In that commit you reused some of my DOM code but made one subtle error: you changed a call to htmlspecialchars (old line https://github.com/prismicio/php-kit/commit/1e52533d880f705a3ba06c0a26e8c03b1629b41d#diff-afb4af17c86cb37f05bd052b76265f67L345) to htmlentities (new line https://github.com/prismicio/php-kit/commit/1e52533d880f705a3ba06c0a26e8c03b1629b41d#diff-afb4af17c86cb37f05bd052b76265f67R317).

When you escape more than the bare minimum (htmlspecialchars escapes much less by default than htmlentities) and pass into createElement, it automatically makes not just one textNode inside as the rest of my code was expecting, but also an entity reference node for each entity. My code didn't support looking through these. I should have clarified that in the original code or, better, explicitly made and inserted the text nodes (since doing it that way the escaping is automatic).

erwan commented 9 years ago

Thank you for finding where the bug came from!