strictdoc-project / strictdoc

Software for technical documentation and requirements management.
https://strictdoc.readthedocs.io/en/stable/
Other
152 stars 24 forks source link

Epic: Correct escaping in HTML output #1920

Open haxtibal opened 3 months ago

haxtibal commented 3 months ago

To reproduce, generate an html export of the following document

[DOCUMENT]
TITLE: Written <b>bold</b>

[SECTION]
TITLE: Some text is <invisible>

[REQUIREMENT]
UID: REQ-1
TITLE: <script>alert(1)</script>
STATEMENT: >>>
We can also execute scripts.
<<<

[/SECTION]

I assume this is a bug, not a feature...

html.escape is already used in some places. I'll have a look and try to catch more cases.

stanislaw commented 3 months ago

Thanks for reporting this. It is a lack of feature 😄

The html.escape was implemented for multiline fields but not for single-line. Let's implement it for single-line as well.

haxtibal commented 3 months ago

The html.escape was implemented for multiline fields

IIUC multiline was implemented for the UI forms in #1018. Multiline in static views (like components/requirement/index.jinja) is escaped too, but for a different reason: We pipe it through docutils, which does escaping under the hood.

I thought about when and where is the best place to do the escaping:

I'd probably start by trying automatic escaping and see how it works out. Do you have an opinion?

stanislaw commented 3 months ago

backend.sdoc (SDocNodeField.enumerate_all_fields_escaped exists right now) is central, but IMO it's wrong because it adds HTML specific stuff to a core component

Sorry for the delays with my responses, I am running low on my free time.

I think I would prefer the enumerate_all_fields_escaped option for the time being to have it central like you said. Adding the HTML-specific stuff to a core component should be ok because currently there is no intermediate layer between SDoc classes and Jinja. I started introducing the view object classes to be that kind of interface to Jinja but in this specific example with escaping it will probably not help. We could revisit this later if a stronger need in an intermediate layer would emerge.

If we used enumerate_all_fields_escaped everywhere, could the change be contained in only a small number of places?

haxtibal commented 3 months ago

backend.sdoc (SDocNodeField.enumerate_all_fields_escaped exists right now) is central, but IMO it's wrong because it adds HTML specific stuff to a core component

Sorry for the delays with my responses, I am running low on my free time.

No hurry. Same here.

I think I would prefer the enumerate_all_fields_escaped option

I did some trials with Jinja2 autoescaping already yesterday and wanted to present it first before discarding the idea. Therefore #1921. If you don't agree with the reasoning just close that MR and let's continue discussion here.

stanislaw commented 2 months ago

@haxtibal

I have added a few TODOs here but let's consider them a very low priority. You/I don't have to rush into completing them all as long as your use case is working.

I would just like to have more test coverage to make sure we didn't miss any screens just in case. StrictDoc has many such low-priority tickets which all would love attention, so let's move according to what stands out most.

stanislaw commented 1 day ago

It turns out that the search feature has regressed due to the autoescaping: http://127.0.0.1:5111/search?q=node.contains(&quot;TBD&quot;). We may be missing an end-to-end test to catch this.

haxtibal commented 1 day ago

It turns out that the search feature has regressed due to the autoescaping: http://127.0.0.1:5111/search?q=node.contains(&quot;TBD&quot;). We may be missing an end-to-end test to catch this.

Will have a look, hopefully tomorrow.

haxtibal commented 1 day ago

Hm, can't reproduce... When typing node.contains("TBD") into the search form, flask receives

INFO:     127.0.0.1:46104 - "GET /search?q=node.contains%28%22TBD%22%29 HTTP/1.1" 200 OK

This looks OK and works. How did you test it?

stanislaw commented 6 hours ago

Good to know that the search screen itself is not affected.

My testing was around the links that are on the statistics screen: http://127.0.0.1:5111/project_statistics.html. The legend has escaped/broken hrefs.