olsak / OpTeX

OpTeX - LuaTeX format with extended Plain TeX macros
http://petr.olsak.net/optex/

Better detection of "another run of TeX is needed" #64

Closed vlasakm closed 3 years ago

vlasakm commented 3 years ago

Currently there is a basic check of the \_unresolvedrefs counter in \_byehook that emits a warning at the end of the run (if there really are any unresolved references). Warnings are also emitted by the macros which try to use data from the .ref file (e.g. \maketoc -- data unavailable, TeX me again).

A very minor issue is that incrementing a counter in expandable macros is possible only with \immediateassignment.

The bigger issue is that the "unresolved references" check doesn't catch "not up to date" references! The data written to the .ref file may have completely changed from the previous run (e.g. in the first run the section titles may get page numbers that are correct for a table of contents spanning a single page, but the table of contents may turn out to be longer in the next run, when all the section headings are known).
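The table of contents scenario can be sketched as a toy model in Python. This is only an illustration under invented assumptions: one section per page, 40 hypothetical labels, 30 entries per table of contents page, and a dictionary standing in for the .ref file. None of these names or numbers come from OpTeX.

```python
import hashlib
import json

SECTIONS = [f"sec{i}" for i in range(1, 41)]  # 40 hypothetical section labels
ENTRIES_PER_TOC_PAGE = 30                     # assumed ToC capacity per page


def typeset(ref):
    """One toy 'TeX run': read the old ref data, return (new_ref, unresolved)."""
    # The ToC lists only entries known from the previous run;
    # an empty ToC is assumed to occupy one page anyway.
    toc_pages = max(1, -(-len(ref) // ENTRIES_PER_TOC_PAGE))  # ceiling division
    unresolved = sum(1 for label in SECTIONS if label not in ref)
    # Each section gets its own page, placed after the ToC pages.
    new_ref = {label: toc_pages + 1 + i for i, label in enumerate(SECTIONS)}
    return new_ref, unresolved


def digest(ref):
    """MD5 of a canonical serialization of the toy ref data."""
    return hashlib.md5(json.dumps(ref, sort_keys=True).encode()).hexdigest()


def simulate(max_runs=10):
    """Repeat runs until the ref data stops changing; log each run."""
    ref, log = {}, []
    for run in range(1, max_runs + 1):
        new_ref, unresolved = typeset(ref)
        changed = digest(new_ref) != digest(ref)
        log.append((run, unresolved, changed))
        ref = new_ref
        if not changed:
            break
    return log


for run, unresolved, changed in simulate():
    print(f"run {run}: unresolved={unresolved}, ref changed={changed}")
```

In this model the second run has no unresolved references, yet the data it writes still differs from the data it read, so a check based only on the counter would stop one run too early.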

I don't know how thoroughly LaTeX itself handles this (and what is left to external scripts), but I was pleasantly surprised that PGF/TikZ handles this very nicely and ensures that a LaTeX warning is emitted if its written auxiliary information is different from the previous run.

Maybe this is best left to external scripts, but I think that then the \_unresolvedrefs warning is confusing: the document isn't necessarily up to date when the warning disappears.

I propose either removing the warning or doing a more thorough detection of a changed .ref file (e.g. using an MD5 hash). I personally prefer the second option.

olsak commented 3 years ago

This is a balance between simplicity and universality. I prefer simplicity in this issue. I can add an OpTeX trick similar to OPmac trick 0045 http://petr.olsak.net/opmac-tricks.html#refconsistent But more tricks are needed here because the .ref file is read and removed in \everyjob, so the last line of the .ref file must be something like \ea\def \ea\REFcontents \ea{\input\jobname.ref}. But this is not exactly right; we must overcome the error "File ended while scanning def".

vlasakm commented 3 years ago

I was thinking more about something like this:

https://github.com/olsak/OpTeX/compare/master...vlasakm:md5ref

I.e. hash the ref file before and after the run. Yes, in LuaTeX there is no \pdfmdfivesum, so a bit of Lua is needed, but the "function" is not that long.
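A minimal sketch of the before/after comparison, written in Python purely for illustration. The linked branch does this in Lua inside LuaTeX, and the names `ref_digest` and `needs_rerun` are hypothetical, not part of OpTeX.

```python
import hashlib
from pathlib import Path
from typing import Optional


def ref_digest(path: Path) -> Optional[str]:
    """Return the MD5 hex digest of the file, or None if it does not exist."""
    if not path.is_file():
        return None  # first run: there is no auxiliary file yet
    return hashlib.md5(path.read_bytes()).hexdigest()


def needs_rerun(before: Optional[str], after: Optional[str]) -> bool:
    """Another run is needed when the written data differs from the data read."""
    return before != after
```

The digest would be taken once when the auxiliary file is read at the start of the run and once after it has been rewritten at the end; a missing file on the first run simply compares as "changed".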

I can turn this proof of concept into a pull request if you want, but these are the points that have to be sorted out:

By the way, the problem is real, and can be very subtle. For example, for my thesis 3 runs are needed, but seemingly 2 do just fine. The difference between the real labels and ?? can accumulate and result in a different page break (and a wrong reference from the table of tables).

olsak commented 3 years ago

Thank you for your idea. I tried to implement it in my last commits. I kept \_unresolvedrefs and I removed kpse.find_file (because the Lua code hangs if the file does not exist: kpse.find_file returns nil, but io.open does not accept nil as input).

vlasakm commented 3 years ago

@olsak Is there any situation where \_unresolvedrefs catches an inconsistent situation which the checksum doesn't?

For backwards compatibility with foreign macros keeping the counter is fine, but I still wouldn't increment it in OpTeX itself.

I don't like emitting two different warnings. Again, for backwards compatibility, the old message text can be kept and emitted by the new check.

Also isn't OpTeX trick 0032 obsolete now? It would at least be nice if it didn't hardcode the contents of \byehook...

I hoped that with the 100 % correct check (which costs some code), other things could be simplified in return.

olsak commented 3 years ago

I like two different warnings. If the warning "rerun to get references right" occurs repeatedly, it means that some reference has no destination. If the warning "ref file changed" occurs repeatedly, it means that there is a very rare problem where re-typesetting with the new page info (or other info) yields the old page info, and re-typesetting with the old page info yields the new page info. I cannot give a simple example now. I only remember that I solved such a specific problem many years ago when I typeset very complicated legal texts (similar to a collection of laws) with many footnotes and multiple columns. The solution required manual intervention.

You are right, OpTeX trick 0032 is obsolete. I'll remove it.
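The rare oscillating case can be modelled with a toy Python sketch. Everything here is an assumption made for illustration: `toy_typeset` is a hypothetical layout function that flips a label between pages 7 and 8, and `run_until_stable` is an invented driver, not something OpTeX provides.

```python
import hashlib


def digest(data: str) -> str:
    """MD5 hex digest of the toy ref data."""
    return hashlib.md5(data.encode()).hexdigest()


def toy_typeset(ref: str) -> str:
    """Hypothetical layout that flips between two page assignments."""
    # Reading 'page=7' pushes the label to page 8;
    # reading anything else puts it on page 7.
    return "page=8" if ref == "page=7" else "page=7"


def run_until_stable(typeset, ref="", max_runs=10):
    """Return (status, runs), status being 'stable', 'cycle' or 'gave up'."""
    seen = {digest(ref)}
    for run in range(1, max_runs + 1):
        new_ref = typeset(ref)
        if new_ref == ref:
            return "stable", run  # written data equals the data read
        if digest(new_ref) in seen:
            return "cycle", run   # an earlier state came back: more runs will not help
        seen.add(digest(new_ref))
        ref = new_ref
    return "gave up", max_runs


print(run_until_stable(toy_typeset))
```

A checksum comparison alone would report "ref file changed" on every run of such a document; remembering earlier digests, as in this sketch, is one way an external script could tell a cycle apart from slow convergence.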

vlasakm commented 3 years ago

As far as I am concerned the second case is covered by the recent changes (as you say, the new warning will occur repeatedly).

But in the first case there is already a warning for that ("WARNING l.XXX: label [XXX] unknown. Try to TeX me again."). It is more specific and is displayed anyway.

olsak commented 3 years ago

OK. The warning generated when \_unresolvedrefs is positive has been removed.