strictdoc-project / strictdoc

Software for technical documentation and requirements management.
https://strictdoc.readthedocs.io/en/stable/
Other
151 stars 24 forks source link

Use Jinja2 autoescaping #1921

Closed haxtibal closed 2 months ago

haxtibal commented 2 months ago

This enables Jinja2 autoescaping everywhere, except for the diff screen which relies on pre generated HTML. The MR is not meant to be merged as-is.

It fixes the issues observed in #1920. Most integration tests pass. Tests fail for UI forms, presumably because they call html.escape manually and now are escaped twice.

Jinja2 recommneds autoescaping:

In future versions of Jinja we might enable autoescaping by default for security reasons. As such you are encouraged to explicitly configure autoescaping now instead of relying on the default.

Templates access way more functions and variables than just enumerate_all_fields(). All need to be escaped. This does it automatically without risk of forgetting some places.

This MR changes some formerly explicit escaping:

Considerations about what to escape

TODO:

haxtibal commented 2 months ago

Integration tests fail because the recently released filecheck 1.0.0 doesn't support --dump-input. This is a regression unrelated to this PR.

stanislaw commented 2 months ago

Integration tests fail because the recently released filecheck 1.0.0 https://github.com/AntonLydike/filecheck/issues/19. This is a regression unrelated to this PR.

Sorry for the inconvenience.

We are in the process of transferring the filecheck project to a group of developers who want to actively maintain and develop their own version of filecheck instead of FileCheck.py that was previously developed by me. We were too quick to release a 1.0.0, and it doesn't have the dump-input implemented yet.

I have reported this here: https://github.com/AntonLydike/filecheck/issues/19#issuecomment-2241237118.

haxtibal commented 2 months ago

The latest commit "Mark all template.render() output as Markup" is to be discussed: We have to flag every programmatically generated HTML string as "safe", otherwise autoescaping would wrongly process them. This notably applies to all template.render() calls. I hoped Jinja2 would automatically mark the render output as safe, but seemingly not. So I tried to centralize Markup wrapping a bit. The result is a systematic but big diff. Would it be ok for you?

I asked Jinja2 folks whether such a feature could be implemented, see pallets/jinja#2003.

stanislaw commented 2 months ago

The latest commit "Mark all template.render() output as Markup" is to be discussed: We have to flag every programmatically generated HTML string as "safe", otherwise autoescaping would wrongly process them. This notably applies to all template.render() calls. I hoped Jinja2 would automatically mark the render output as safe, but seemingly not. So I tried to centralize Markup wrapping a bit. The result is a systematic but big diff. Would it be ok for you?

I like the idea of render_template_as_markup because it also reduces the boilerplate everywhere.

I commented two times, asking about the possible redundancy of Markup / autoescape=True. I would like you to confirm that both mechanisms have to be used by us (Markup and autospace=True).

I asked Jinja2 folks whether such a feature could be implemented, see pallets/jinja#2003.

I could not understand what you are referring to in your example there:

The Template knows inner is created with autoescape=True. Couldn't it mark the result of render automatically as safe?

How can it make the result of render automatically if the result is a simple str object?

>>> inner = Template('<b>{{ first_name }}</b> {{ last_name }}', autoescape=True).render(first_name='John', last_name="Doe")
>>> print(type(inner))
<class 'str'>

While reviewing your PR, I considered an alternative approach that could be possible instead of doing the escaping with Jinja.

I recently encountered several implicit conversions with Jinja, when an object is implicitly called __str__ and so something incorrect is rendered to Jinja without me or the integration tests noticing this magic stuff.

An alternative could be that the whole StrictDoc uses a wrapper of a str class, e.g. sdoc_str, that would disable the __str__ method with an explicit assert. Instead the safe version of a string would be implemented as Markup(self) and only this method would be called in each and every Jinja template.

With this approach, the diff could be even bigger but in return we would get a very good control over the strings that we really want to use everywhere. It is in the same direction like the code of field_escaped_value/field_unescaped_value that you are removing now but this escaping stuff would be inside the sdoc_str.

I have confidence that this approach could work just fine but I don't have enough data to compare the advantages/disadvantages of using Jinja's escaping magic vs packing that explicitly in sdoc_str.

The sdoc_str approach would fit the current philosophy of SDoc of doing as many things explicit as possible. The Jinja autoescaping stuff is a different approach but also a capable one.

Another argument for maintaining sdoc_str would be that the escaping business is really core to what StrictDoc does, so keeping this function at the core of all string types could make sense.

And finally there was another source for me thinking in the direction of custom string wrappers. StrictDoc has many cases where we want to deal with a non-empty string and we never want an empty string. Python doesnot support this out of the box, so having a class such as sdoc_ne_str or something like that would simplify code in many places. Maybe it is not too bad of an idea to combine the "non-emptiness" and "markup safety" in one or in a family of str-based wrappers.

EDITED:

These are only thoughts. Let's move forward with Jinja's autoescape, safe, Markup. The alternative approach could work but let's keep it aside.

haxtibal commented 2 months ago

EDITED:

These are only thoughts. Let's move forward with Jinja's autoescape, safe, Markup. The alternative approach could work but let's keep it aside.

Sorry for not commenting on your alternative approach yet. It sounds interesting. I was quite focused here and wanted to get autoescaping in a workable state first. Will comment soon and suggest to hop back to #1920 for discussion to not interleave the autoescaping approach with the sdoc_str approach too much.

stanislaw commented 2 months ago

Thanks a lot for contributing this!