tweaselORG / meta

(Currently) only used for the issue tracker.

Complaint generator #41

Open baltpeter opened 7 months ago

baltpeter commented 7 months ago

How do we go from a HAR file and the output of TrackHAR run on that HAR to a complaint?

baltpeter commented 7 months ago

One of the first issues I'm running into is the question of what output formats we want to produce, and, relatedly, which library/software we want to use to produce those.

Obviously, plain text is not enough as an output format. The DPAs typically only support a very limited set of formats for attachments. I think it's pretty safe to say that we will need to at least be able to produce PDFs.

Do we want to allow users to edit the generated complaint? That obviously wouldn't be possible with PDFs. We could also output ODT (or DOCX, I guess…), which the user could easily edit and convert to PDF themselves. But I would guess that users wanting to edit the complaint is rather unusual, and thus making them convert the ODT manually would be quite (unnecessarily) annoying. So, we would need to produce both ODT and PDF. And obviously, the PDF should be the same as when the user manually saves the ODT to PDF.

But that means we would have to run LibreOffice. I'm pretty sure that there is even an official CLI for this purpose, but a) for various reasons I would much rather be able to run this client-side, and b) I'm concerned about the security implications of running LibreOffice on at least partially user-influenced files on our servers. That seems like a bad idea.

And another thing to consider: I would like to have a separate HAR "renderer" library that can generate a nice human-readable document from a HAR file. That should also be able to at least output PDF files. But I think it would also be nice to have an HTML output option so we could offer an online HAR viewer. For this use case, having ODT as the "input format", from which the other formats are generated, really doesn't make sense.

baltpeter commented 7 months ago

Oh, and another potentially annoying constraint: We need quite advanced reference features that probably aren't supported by too many solutions. We definitely need hierarchically numbered section headings that we can reference elsewhere in the text. And I would definitely also like to have margin/paragraph numbers that we can reference elsewhere in the same way.

baltpeter commented 7 months ago

Options I'm considering:

Typst

Carbone

Pandoc

PSPDFKit

Misc NPM libraries

baltpeter commented 7 months ago

Based on that, I don't really think that there's any solution that ticks all of our boxes. I think Typst looks like the best option. I guess we'll have to do without an ODT export (at least for now). And I guess having a more focused HAR to PDF library might even be nicer than an online HAR viewer that also happens to export to PDF, especially since it's not exactly hard to find other HAR viewers. And rendering a HAR in the browser and rendering it for PDF are actually quite different use cases, which should probably display everything differently.

baltpeter commented 7 months ago

I expected paragraph numbering to be really easy to implement in Typst (just set up a counter and a #show rule on par), but alas it isn't. That causes an infinite recursion and is a known bug (https://old.reddit.com/r/typst/comments/16ltmtx/documentwide_enumeration/k1gzzzr/?context=3, https://github.com/typst/typst/issues/229, https://github.com/typst/typst/issues/519).

That basically only leaves us with two options: continue without paragraph numbering and hope that this is fixed before we actually need the exported PDFs, or wrap every paragraph in a custom function. (It would also be possible to use a #set rule on enum instead, like in https://github.com/typst/typst/discussions/2506, but the difference in syntax isn't much of an improvement considering we're generating the Typst code anyway, and it would also mean that we couldn't use actual numbered lists anywhere in the document.) Quite unfortunate. :/
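To illustrate the "wrap every paragraph in a custom function" option: since the Typst code is generated anyway, the wrapping could happen in the generator. The following is only a minimal sketch, not the code from the actual generator. The Typst function name `numbered-par` and the helper `wrapParagraphs` are hypothetical; the function is assumed to be defined in the Typst template, where it would step a counter and print the number in the margin.

```typescript
// Hypothetical name of a Typst function that the template would have to define
// itself (e.g. stepping a counter and printing the number next to the body).
const PARAGRAPH_FUNCTION = "numbered-par";

/**
 * Wraps each non-empty paragraph in a call to the (assumed) Typst function,
 * so the document body becomes a sequence of `#numbered-par[...]` calls.
 *
 * The paragraph texts are assumed to already be valid, escaped Typst markup;
 * in particular, unbalanced square brackets would break the content block.
 */
function wrapParagraphs(paragraphs: string[]): string {
  return paragraphs
    .map((paragraph) => paragraph.trim())
    .filter((paragraph) => paragraph.length > 0)
    .map((paragraph) => `#${PARAGRAPH_FUNCTION}[${paragraph}]`)
    .join("\n\n");
}

// Usage: two paragraphs become two function calls separated by a blank line.
const typstBody = wrapParagraphs(["First paragraph.", "Second paragraph."]);
console.log(typstBody);
```

Whether this is worth it compared to waiting for a fix in Typst is the trade-off described above; the sketch only shows that the extra syntax is cheap to produce when the Typst source is generated.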

baltpeter commented 7 months ago

Progress is being made. I'm starting with the technical reports. Here's an example of what we can generate already: PDF

Code in https://github.com/tweaselORG/complaint-generator/pull/1.

baltpeter commented 7 months ago

We also need to deal with escaping. I had already discussed that a bit in https://github.com/tweaselORG/meta/issues/42#issuecomment-1838486416.

Here, things are a bit more complicated since a) I would really like to enable autoescaping and b) we have values both in plain text and in code blocks that need to be escaped.

Nunjucks does have an autoescape feature, but it is only designed for HTML templates and thus of no help to us (which is why I had initially disabled it). From looking at the code, the escaping is hardcoded and there is no way to replace it with a custom escaping function. But manually escaping each value is cumbersome and error-prone. So I decided to patch Nunjucks using patch-package.

I implemented that in https://github.com/tweaselORG/complaint-generator/pull/1/commits/3fad5cb192b1f4893e0b274d205474b879f37e38 with a slightly more robust version than what I had initially for our HAR renderer.
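For illustration, a custom escaping function for values that end up in Typst markup could look roughly like the sketch below. This is not the implementation from the linked commit. The function name `escapeTypstMarkup` and the character list are assumptions; the sketch relies on Typst treating a backslash in front of one of these punctuation characters as a literal character, and it deliberately does not cover values placed inside code blocks, which need separate handling.

```typescript
// Punctuation that can carry special meaning in Typst markup. This list is an
// assumption made for this sketch and is intentionally on the cautious side.
const TYPST_MARKUP_SPECIAL = /[\\#*_`$<>@\[\]=\-+\/~'"]/g;

/**
 * Escapes a template value for use in Typst markup by prefixing every
 * potentially special character with a backslash.
 * Non-string values are converted with String() first.
 */
function escapeTypstMarkup(value: unknown): string {
  return String(value).replace(TYPST_MARKUP_SPECIAL, (character) => `\\${character}`);
}

// Usage: a value containing characters that would otherwise be interpreted as markup.
console.log(escapeTypstMarkup("user_id=#42 *not bold*"));
```

An autoescape hook would then call a function like this for every interpolated value instead of the HTML escaping that Nunjucks hardcodes.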

baltpeter commented 7 months ago

We now also have controller notices: PDF

(oops, looks like I forgot to post this yesterday? o.o)

baltpeter commented 3 months ago

After a lot of work, we now have: complaints!

PDF