leo-colisson / robust-externalize

A LaTeX library to cache pictures (including tikz, python code, and more) in a robust, customizable, and pure way.

v2.1: xargs command assumes GNU xargs with flag -a which BSD xargs (macOS) does not have #8

Closed kiryph closed 8 months ago

kiryph commented 8 months ago

Running version v2.1 with

\robExtConfigure{compile in parallel} % requires GNU xargs in version 2.1

on macOS Ventura, I get the following error:

Package robExt Warning: Warning: Compiling all missing figures in parallel
(robExt)                with "xargs -t -a
(robExt)                'robust-ext-gap-robExt-compile-missing-figures.sh' -I
(robExt)                "{}" -P 16 sh -c '{}'". You need to rerun LaTeX to
(robExt)                include them.

xargs -t -a 'robust-ext-gap-robExt-compile-missing-figures.sh' -I "{}" -P 16 sh
 -c '{}'xargs: illegal option -- a
usage: xargs [-0opt] [-E eofstr] [-I replstr [-R replacements] [-S replsize]]
             [-J replstr] [-L number] [-n number [-x]] [-P maxprocs]
             [-s size] [utility [argument ...]]
system returned with code 256
 robustExternalize/robExt-61C7AE1033A75BA026C5A1D0868BF8B2

One can avoid the "illegal option -- a" error as follows:

<'robust-ext-gap-robExt-compile-missing-figures.sh' xargs -I '{}' -t -P 0 sh -c '{}'
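The redirect form can be tried in isolation. The following is only a minimal sketch: `cmds.txt` is a hypothetical stand-in for the generated compile-missing-figures.sh file, and `-P` is assumed to be available (it is not in POSIX, but both GNU and BSD xargs accept it).

```shell
# Sketch: feed a command list to xargs via stdin redirection
# instead of the GNU-only `-a FILE` option.
# `cmds.txt` is a hypothetical stand-in for the generated command file.
workdir=$(mktemp -d)
cat > "$workdir/cmds.txt" <<'EOF'
echo one
echo two
echo three
EOF

# Each input line replaces {} and is run by `sh -c`.
# -P 2 allows two jobs at once, so the output order is not guaranteed.
<"$workdir/cmds.txt" xargs -I '{}' -P 2 sh -c '{}' > "$workdir/out.txt"

# Sort only to get a stable order for display.
result=$(sort "$workdir/out.txt")
printf '%s\n' "$result"
rm -rf "$workdir"
```

Putting the redirection before or after the command is equivalent in POSIX-style shells; the leading form simply mirrors the command above.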

The modified robust-externalize.sty would look like:

❯ git diff
diff --git a/robust-externalize.sty b/robust-externalize.sty
index ab9a05b..47f5e2d 100644
--- a/robust-externalize.sty
+++ b/robust-externalize.sty
@@ -2223,10 +2223,10 @@ if __name__ == '__main__':
   },
   compile in parallel with gnu parallel/.default={200\%},
   compile in parallel with xargs/.style={
-    compile in parallel command={xargs -t -a '\jobname-\robExtAddPrefixName{compile-missing-figures.sh}' -I "{}" -P #1 \robExtParallelShell\space '{}'},
+    compile in parallel command={<'\jobname-\robExtAddPrefixName{compile-missing-figures.sh}' xargs -t -I '{}' -P #1 \robExtParallelShell\space '{}'},
   },

IMHO, the default number of processes should not be fixed (i.e. the value 0 could be chosen instead of 16). The current value is arbitrary. For my current machine, the limit is higher than the number of available cores. If someone needs to limit it, that person should choose a value suitable for their machine/environment.

tobiasBora commented 8 months ago

Thanks for the report, I was not expecting Apple to be less compatible than windows (+Gow) on this point, and I was a bit afraid to use < as it is shell-dependent, but it seems like Windows also supports this syntax so it should be good to use it. So I fixed it in master (cf 3993798cc86481a6deb0f735aa56d8af7322c223).

But I’m not so sure that it is a good idea to use -P 0, as it will start all the commands at the same time. I ran a test with 200 pictures, and all of them were compiled at the same time:

$ ps aux | rg pdflatex | wc -l
197

(The same command with the current setting gives me something around 16.) While one might think that it is faster this way, since latex is a CPU-intensive operation, this has several drawbacks:

  1. first, the whole system gets really laggy as it gets 200 CPU-intensive tasks to run at once, leaving little time for other processes to run,
  2. even worse, the final compilation time is significantly higher: in my benchmark it took 2 min 02 s instead of 1 min 15 s with the default -P 16. The reason (I think) is that a lot of time is lost switching between the 200 processes.

The 16 I chose might seem arbitrary, but I chose it since nowadays it is quite common to have between 4 and 16 CPU threads (mine has 8, for instance), and running more processes than the actual number of CPU threads is not a problem (it might actually be faster up to a point, or slower if you use way too many… but here it should not be that bad in either case). I could have tried to compute the number of threads at runtime, but it is quite hard to do in an OS-independent way without installing new stuff, so I decided to keep 16 by default. And anyway, it's super simple to change with compile in parallel with xargs=N, and GNU parallel already adapts to the number of CPU if needed.
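To illustrate why runtime detection is awkward, here is a rough sketch for POSIX-like shells only. `detect_jobs` is a hypothetical helper, not part of robust-externalize; it tries a few commonly available tools and falls back to 16. It does nothing for Windows cmd, which is precisely the OS-independence problem mentioned above.

```shell
# Sketch: guess the number of CPU threads, falling back to a fixed default.
# `detect_jobs` is a hypothetical helper, not part of robust-externalize.
detect_jobs() {
  # Try each tool in turn; a failing or missing tool moves on to the next.
  n=$(getconf _NPROCESSORS_ONLN 2>/dev/null) \
    || n=$(nproc 2>/dev/null) \
    || n=$(sysctl -n hw.ncpu 2>/dev/null) \
    || n=16
  # Reject empty or non-numeric answers and use the default instead.
  case "$n" in
    ''|*[!0-9]*) n=16 ;;
  esac
  echo "$n"
}

jobs_count=$(detect_jobs)
echo "would run: xargs -P $jobs_count ..."
```

The detected value could then be passed by hand, for example through compile in parallel with xargs=N.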

I hope this makes sense. I will close this issue for now, but could you please try the latest version to check whether it solves your problem, and reopen if not?

kiryph commented 8 months ago

Thanks for your detailed answer

I was not expecting Apple to be less compatible than windows (+Gow) on this point

I have already encountered several cases where the BSD programs that macOS ships lack features of the equivalent GNU programs.

A general macOS issue is that (newer) GNU programs can have incompatible licenses. An example: macOS comes with the following shells preinstalled:

❯ ls /bin/*sh
/bin/bash  /bin/csh  /bin/dash  /bin/ksh  /bin/sh  /bin/tcsh  /bin/zsh

The preinstalled bash is actually GNU Bash, but it is the outdated version 3.2 from 2006. Apple will not ship newer versions of GNU Bash because of a license change in GNU Bash 4.0. https://apple.stackexchange.com/questions/193411/update-bash-to-version-4-0-on-osx

In contrast, the preinstalled zsh is the most recent release (version 5.9, from 2022).

I was a bit afraid to use < as it is shell-dependent

GNU Bash 3.2 also understands it (/bin/sh is /bin/bash under macOS):

sh-3.2$ <TODO.md xargs -I '{}' echo '{}'
# prints the content of the file TODO.md (possibly in random order)

So do dash, csh, ksh, and tcsh. So I would assume the syntax is widely accepted by shells. (Just out of curiosity, if you happen to know one where it is not valid, let me know.)


Regarding the number of processes

first, the whole system gets really laggy as it gets 200 CPU-intensive tasks to run at once, leaving little time for other processes to run,

True, but on my system even 16 processes will take all cores and I will hear the fan kicking in. One can use nice to lower the priority of the compilation so that other user processes get higher priority and the system does not get laggy.
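As a minimal sketch of that idea, nice can be placed between xargs and the shell. The file `cmds.txt` and the niceness value 10 are assumptions for illustration only, and the package itself does not necessarily do this.

```shell
# Sketch: run the parallel jobs at a lower scheduling priority with `nice`.
# `cmds.txt` is a hypothetical command list; 10 is an arbitrary niceness.
workdir=$(mktemp -d)
printf '%s\n' 'echo a' 'echo b' > "$workdir/cmds.txt"

# xargs starts `nice -n 10 sh -c '<line>'` for every input line,
# so each job (and its children) runs with increased niceness.
<"$workdir/cmds.txt" xargs -I '{}' -P 2 nice -n 10 sh -c '{}' > "$workdir/out.txt"

# Sort only to get a stable order for display.
nice_result=$(sort "$workdir/out.txt")
printf '%s\n' "$nice_result"
rm -rf "$workdir"
```

Raising the niceness does not reduce the total CPU used; it only lets interactive processes win when they compete for a core.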

I hope this makes sense.

Yes, it makes sense to me. I agree that 16 might be a good value for current personal computers.

And anyway, it's super simple to change with compile in parallel with xargs=N, and GNU parallel already adapts to the number of CPU if needed.

Yes, and I understand your motivation and I do not see a real problem with the default of 16.

If it bothers me that my 6 cores are taken for compilation of a document, I can set it to a lower value myself. People on 48+ core systems might pick a higher value. So it will always be a personal choice.

tobiasBora commented 8 months ago

For a shell that is not compatible with the < syntax, you have nushell for instance, which would use a pipe instead, like open foo.txt | yourprogram (see http://www.nushell.sh/book/loading_data.html). But if I use a pipe, then I need a different command for Windows, since cat is not available by default…

Oh I see, you want less than the number of cores… I do expect it to take all cores by default, at least it is what I would prefer to do by default. What I’d like to avoid is to crash the system. Is your whole OS laggy when it compiles? Anyway, since I guess it's very much user dependent, I guess it's better to let the user change the setting if they don't like it. I could set 8 instead of 16, but then people running > 8 threads might compile slower…

kiryph commented 8 months ago

For a shell that is not compatible with the < syntax, you have nushell for instance

Thanks for the pointer. However, I think nushell is not, and for the foreseeable future probably will not become, a default shell in any OS. Adding support for user-installed programs would be a never-ending story.

Oh I see, you want less than the number of cores… I do expect it to take all cores by default,

No, I actually would prefer to take all cores by default. But I could imagine that this could be a reason someone wants to change it to ensure that other tasks can get a full core (or several) not shared with a compilation process (no process switching in the cpu, ...).

What I’d like to avoid is to crash the system. Is your whole OS laggy when it compiles?

No, I did not encounter a laggy OS when compiling. Maybe I have to create a document with 200+ environments to see if this could make the system laggy.

However, right now it works very well (with the value of 16), so I see no reason to change it myself.

tobiasBora commented 8 months ago

Ok great. Yeah, nushell is unlikely to become a default shell in any OS… But do you know if pdflatex always picks sh/cmd, or if it picks the default shell of the user?

Ok, perfect then!