sasozivanovic / memoize

A cross-format package for externalization of graphics and memoization of compilation results in general
LaTeX Project Public License v1.3c

Any possibility of using memoize across multiple pages in the future? #30

Open romain-noel opened 1 month ago

romain-noel commented 1 month ago

From section 3.1 of the documentation, I can read clearly that "an extern can’t be broken across lines or pages". However, I think that having such a possibility would be a very powerful feature. For example, it could be used to externalize long multi-page tables. Or even better, when writing books, already stabilized chapters could be externalized to save compilation time (while still preserving internal references, counters, etc.).

So, to put it simply as a question: is there any chance that future versions of memoize will allow an extern to be broken across lines or pages?

sasozivanovic commented 1 month ago

While an extern cannot be broken across lines or pages by definition (an extern is just a picture, after all), Memoize actually supports producing multiple externs per piece of memoized code. However, producing multiple externs per memo heavily depends on the use case, in particular on the implementation details of the memoized code.

You mention long tables. I have been thinking about those. Section 4.4.3 presents a general approach that I believe could work for multi-page tables. However, I don't think that it makes sense for Memoize to support all the various multi-page table packages (longtable, supertabular, etc.); maintaining such support would be a major undertaking. I rather believe that the functionality should be implemented in the multi-page table package itself: the package should be aware of Memoize and provide a memoization driver which can produce multiple externs per table. It is in order to make this possible that much of Memoize's internals are publicly accessible and extensively documented.

The idea about externalizing entire chapters is similar to the idea of externalizing Beamer frames. Neither of these can work with Memoize as it is, because Memoize puts externs on the page rather than including an extern as a page. However, this is an issue that I could (and I am willing to) address in a future release. For Beamer, some further support from the Beamer side would certainly be required to have this work, but support for chapters might belong to Memoize itself. I'll have to think about this more ... preserving all the various references, counters and such automatically is not trivial ... if possible at all.
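The difference between putting a PDF on the page and including it as a page can be illustrated with standard packages. This is only a sketch of the distinction, not of how Memoize works internally: it uses `graphicx` and `pdfpages`, and it assumes that the sample file `example-image-a.pdf` (shipped with the `mwe` package in common TeX distributions) can be found.

```latex
\documentclass{article}
\usepackage{graphicx}  % \includegraphics: the PDF becomes a box in the text
\usepackage{pdfpages}  % \includepdf: the PDF becomes a page of its own
\begin{document}

% "On the page": the PDF behaves like any other box in the paragraph.
Text before. \includegraphics[width=3cm]{example-image-a} Text after.

% "As a page": the PDF is inserted as a separate page of the document.
\includepdf[pages=1]{example-image-a.pdf}

\end{document}
```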

cfr42 commented 4 weeks ago

I am not at all sure whether I should say this or not. It is certainly not my place to tell you what to do with your software and nobody asked for my opinion. So by all means tell me to disappear.

I think it's worth thinking about whether turning tables into images is something you want memoize to do. For various reasons, I have to say I think it's not at all a good idea. I don't think it is a good way to speed up compilation, even if it has that effect.

First, a minor concern which is not actually an objection. Using memoize may actually increase compilation time in complex, but common, scenarios. Suppose I do the following:

  1. I have a document with a very long longtable and a dozen forest trees.
  2. I externalise the table and the trees with memoize.
  3. Probably these get externalised gradually as I add and edit things.
  4. Now I'm finished I disable memoize.
  5. On my next compilation, the trees are compiled successfully, but my table isn't. I may need multiple further runs before my table is compiled successfully. Each of those further runs requires the dozen trees to be compiled anew.

For comparison:

  1. I have a document with a very long longtable and a dozen forest trees.
  2. I externalise only the trees with memoize.
  3. Probably these get externalised gradually as I add and edit things. My table gets compiled every time and every time information about column widths etc. is written to the .aux. Provided I compile a few times during the writing process, the final widths will stabilise in the .aux before I'm finished. (If not, I can compile a couple of times to check the table at the end.)
  4. Now I'm finished I disable memoize.
  5. On my next compilation, the trees are compiled successfully, as is my table.
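The second scenario might look roughly like the following. This is a minimal sketch under some assumptions: that `forest` environments are memoized automatically once Memoize is loaded, that `longtable` is left alone by default, that the package loading order shown is acceptable, and that the `disable` key is the right way to switch Memoize off for the final run. Depending on the TeX distribution, extern extraction may need additional setup.

```latex
\documentclass{article}
\usepackage{memoize}    % assumption: forest environments are memoized automatically
\usepackage{forest}
\usepackage{longtable}
% \mmzset{disable}      % assumption: uncomment for the final run to switch Memoize off
\begin{document}

% Memoized: compiled once, then included as an extern on later runs.
\begin{forest}
  [S [NP [she]] [VP [V [reads]] [NP [books]]]]
\end{forest}

% Not memoized: longtable is typeset on every run and keeps
% updating its column widths in the .aux file.
\begin{longtable}{ll}
  Key & Value \\
  \endhead
  alpha & 1 \\
  beta  & 2 \\
\end{longtable}

\end{document}
```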

It is not at all obvious that externalising longtables, for example, is helpful in the way that externalising, say, forests is. The only way around this that I can see would be to record the column widths etc. which longtable calculates over a series of runs and then feed them back pre-digested to longtable on the final run.

But nothing I've said so far constitutes a reason not to provide the option. Certainly there are tabular packages which provide far costlier environments in terms of compilation time. (tabularray, for example.) And surely there are documents which would benefit from externalising such tables during development.

Second, however, my major concern isn't whether it would actually reduce compilation time etc., but whether the convenience of decreased compilation time is worth the cost of reducing it.

There are serious issues concerning accessibility here. You may tell people they should disable memoize for the final run, but how many will actually do that? I certainly don't. People with free accounts on Overleaf (assuming Overleaf will support memoize at some point) almost certainly won't.

Most people won't read the advice, won't follow it and won't remember to do it. I'm prepared to bet very few will do all three.

If externalised images end up in the final version, you get a larger PDF, but that's about it. If externalised text (including tables etc.) ends up in the final version, you get a larger PDF which contains large chunks of unnecessarily inaccessible content.

Of course, people can already use it for tabulars which fit on a single page. But there's a difference between providing something which can only be used to do something problematic and providing something which can be used to do something problematic. Right now, I struggle to see a non-problematic use for multi-page externs which would be worth the price.

Third, I think there must be better options for multi-page tabulars and tabulars in general e.g. write the contents of boxes out as text and read them back in pre-digested. That is, externalise as text rather than images [1]. That's not what memoize does, but it shouldn't be terribly surprising if externalising everything as images isn't the best way to externalise everything.

[1] I don't know if this is actually possible, but it seems prima facie plausible given my limited knowledge of TeX. If I can put stuff into a box and then unbox it, without having to re-typeset it, surely I can write the box out to file and read it back without having to re-typeset it?
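The "put stuff into a box and reuse it" part of this idea can be sketched with standard LaTeX save boxes. The box name `\tablebox` is hypothetical. The sketch only shows reuse within a single run, where the box lives in TeX's memory; whether such a box can be written to a file and read back in a later run is exactly the open question here.

```latex
\documentclass{article}
\newsavebox{\tablebox}  % hypothetical box name
\begin{document}

% Typeset the tabular once and store the result in the box.
\savebox{\tablebox}{%
  \begin{tabular}{ll}
    alpha & 1 \\
    beta  & 2 \\
  \end{tabular}}

% Reuse the stored box twice; the tabular is not typeset again.
First copy: \usebox{\tablebox}

Second copy: \usebox{\tablebox}

\end{document}
```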

sasozivanovic commented 3 weeks ago

Thank you Clea, these are all very good points!

About Overleaf. I realize now that Overleaf does not have the necessary Perl and Python modules installed. I'll ask them to install them. However, I just tested to confirm that the TeX-based extraction method (\usepackage[extract=tex]{memoize}) works.
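For reference, a minimal document using the TeX-based extraction method mentioned above might look like this. It is a sketch that assumes `tikzpicture` environments are memoized automatically once Memoize is loaded; the picture itself is arbitrary.

```latex
\documentclass{article}
\usepackage[extract=tex]{memoize}  % TeX-based extraction: no Perl or Python needed
\usepackage{tikz}
\begin{document}

% Assumption: this picture is memoized automatically and reused on later runs.
\begin{tikzpicture}
  \draw (0,0) -- (2,1) node[right] {a simple picture};
\end{tikzpicture}

\end{document}
```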

The accessibility issue you mention is a big issue in my opinion, as well, already with TikZ pictures and especially Forest trees. I actually have a vision to provide something like what you describe for tables by storing the low-level PGF commands into the memo. I hope I can get to that in some reasonable time. (As far as I know, saving the content of boxes as text files is impossible.)

cfr42 commented 3 weeks ago

> As far as I know, saving the content of boxes as text files is impossible.

How odd: you can show the content in the log/on the terminal, but can't write it to a different file. I just assumed you could do the one if you could do the other.

sasozivanovic commented 3 weeks ago

> How odd: you can show the content in the log/on the terminal, but can't write it to a different file. I just assumed you could do the one if you could do the other.

As I said, as far as I know ... don't take my word for it!

> you can show the content in the log/on the terminal

Do you mean by \tracingoutput=1? I always thought that this is merely a diagnostic tool, not that it outputs a full representation of the shipped boxes.

cfr42 commented 3 weeks ago

> > you can show the content in the log/on the terminal
>
> Do you mean by \tracingoutput=1? I always thought that this is merely a diagnostic tool, not that it outputs a full representation of the shipped boxes.

I was thinking of the way l3build regression tests work. The vast majority output and compare text. Sometimes that's unformatted text, sometimes the contents of pages, sometimes the contents of boxes. How much gets shown obviously depends on things like the value of \tracingoutput. I guess I was (very vaguely) wondering if there was a way to capture the information TeX would need to typeset the box in pre-digested form.
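The kind of box dump referred to here can be produced with TeX's own diagnostic primitives. The following is a sketch with a hypothetical box name `\demobox`; it is not how l3build is actually set up. Note that `\showbox` pauses like an error in interactive mode, so running with `-interaction=nonstopmode` is more convenient. The output is a human-readable description in the log, and nothing in this example shows that it could be read back in.

```latex
\documentclass{article}
\newsavebox{\demobox}   % hypothetical box name
\begin{document}
\savebox{\demobox}{ab}

\showboxdepth=1000      % how many levels of nesting to show
\showboxbreadth=1000    % how many items to show per level
\tracingonline=1        % print to the terminal as well as to the log
\showbox\demobox        % dump a textual description of the box

\tracingoutput=1        % additionally log each page as it is shipped out
\usebox{\demobox}

\end{document}
```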

sasozivanovic commented 3 weeks ago

I honestly don't know, but I guess there is not, as those are just log files, afaik.

Perhaps the route to go is to have a better mechanism for embedding PDFs ... at the end of the day, a PDF is there precisely to store the content of boxes. I would say that the fact that an embedded PDF behaves essentially like a picture is a consequence of the embedding procedure. If a PDF page could be "unboxed", accessibility issues might dissolve. But then again, given the complexities of the PDF format I know very little about, I'm probably mightily oversimplifying here ...