This leverages the changes from #1920 to allow passing an option to the model -> HTML -> DOCX pipeline to say "please insert annotations and notes adjacent to their insertion points instead of at the end of the chapter." This is necessary for PagedJS to lay out footnotes from the correct position.
This also finishes earlier work to fork the PDF-centric layout away from reading mode:
- The PDF endpoint now has its own URL instead of being a query param with reading mode.
- The PDF output page doesn't need any JS code besides PagedJS itself and a small init script, so I moved pagedjs out of `package.json` entirely and just vendored it as a normal static asset. (Since the Playwright browser will never have cache or state there's no reason to worry about cache-busting or similar complexities.)
- The annotations serialization code now accepts a new `export_option` to introduce annotations in the way that PagedJS expects.

The actual Django view and output templates are the same in reading mode and PDF view, though I think the templates should probably also be split eventually, as they don't share much code anymore.
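To illustrate the difference between the two annotation placements, here is a minimal sketch, not the code in this PR. `Note`, `render_chapter`, and the `inline_notes` flag are hypothetical stand-ins for the real serializer and its `export_option`, and the `footnote` class name is an assumption about what the PDF stylesheet would target.

```python
from dataclasses import dataclass
from html import escape


@dataclass(frozen=True)
class Note:
    """Hypothetical stand-in for an annotation or note."""
    offset: int  # character offset in the chapter text where the note attaches
    body: str


def render_chapter(text: str, notes: list[Note], inline_notes: bool = False) -> str:
    """Render chapter text with notes inline or collected at the end.

    inline_notes=True emits each note body at its insertion point, which is
    what a paginating layout engine needs in order to place the footnote on
    the same page as its reference. inline_notes=False emits a numbered
    marker and gathers the bodies in a list after the chapter text.
    """
    parts: list[str] = []
    endnotes: list[str] = []
    cursor = 0
    for number, note in enumerate(sorted(notes, key=lambda n: n.offset), start=1):
        # Emit the text between the previous note and this one.
        parts.append(escape(text[cursor:note.offset]))
        cursor = note.offset
        if inline_notes:
            # Assumed class name; the stylesheet would turn this into a footnote.
            parts.append(f'<span class="footnote">{escape(note.body)}</span>')
        else:
            parts.append(f"<sup>{number}</sup>")
            endnotes.append(f"<li>{escape(note.body)}</li>")
    parts.append(escape(text[cursor:]))
    html = "<p>" + "".join(parts) + "</p>"
    if endnotes:
        html += "<ol>" + "".join(endnotes) + "</ol>"
    return html
```

Calling `render_chapter("Hello world", [Note(5, "greeting")], inline_notes=True)` puts the note body right after "Hello", while the default collects it in a trailing list.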
Performance
For small casebooks (under 500 pages), PDFs render on my laptop in under 15 seconds. For very long casebooks the story... isn't great, but rendering does at least finish, which it effectively never did with client-side annotation rendering.
For a very long (1,200-page) casebook, the DOCX pipeline takes 15 seconds on my laptop and the PDF one takes a bit over a minute. I don't love it! It may be worse in staging/prod, since the process looks mostly CPU-bound.
Almost all of that time is in PagedJS itself segmenting the pages and producing nice artifacts like real footnotes.
CSS Paged Media does not support footnotes as a first-class feature, and I think with potentially multiple footnotes per page it'll be difficult to get something more lightweight that is also resilient to lots of different markup. I'll probably timebox some experiments though! If that doesn't work out, I think we could do a lot by eagerly caching PDF exports, since in the vast majority of cases they won't need to be generated on demand.
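As a rough idea of what caching could look like, here is a minimal sketch that keys each PDF on a hash of the HTML it was rendered from, so an unchanged casebook never pays the PagedJS cost twice. `cached_pdf`, the cache directory layout, and the `render` callable are all hypothetical; nothing like this exists in the PR.

```python
import hashlib
from pathlib import Path
from typing import Callable


def cached_pdf(html: str, cache_dir: Path, render: Callable[[str], bytes]) -> Path:
    """Return the path to a PDF for this HTML, rendering only on a cache miss.

    `render` is a hypothetical callable that wraps the slow step
    (for example the headless-browser run) and returns the PDF bytes.
    """
    # Identical HTML always maps to the same file name.
    digest = hashlib.sha256(html.encode("utf-8")).hexdigest()
    target = cache_dir / f"{digest}.pdf"
    if not target.exists():
        cache_dir.mkdir(parents=True, exist_ok=True)
        # Write to a temporary name first so readers never see a partial file.
        tmp = target.with_suffix(".tmp")
        tmp.write_bytes(render(html))
        tmp.replace(target)
    return target
```

Running the same function from a background job whenever a casebook is saved would make the export eager rather than on-demand, with the request path only ever reading the cached file.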
Example