Closed mattpr closed 5 months ago
Thanks for the detailed reply @mceachen
I knew I should have taken more time to strip down the example, sorry.
Here is a stripped down repro.
Doesn't seem that node16 is the issue as 18 fails with same error...
Node 16 isn't supported.
Old project that is not actively developed. But that doesn't seem to be the issue here as 18 has the same issue.
16-bullseye-slim does have perl preinstalled, but if your CI image is something else (like the alpine variant), know that your image needs perl installed.
If it turns out this is a missing perl or whatever issue, that information isn't getting surfaced. Maybe that isn't it.
It'd be helpful to include a working example. Where is mkdirp coming from? You might want to either await mkdirp or use a sync mkdirp variant.
Sure. Repro above stripping out copy and mkdir stuff to show that isn't related to the problem (it was synchronous mkdirp under the hood FWIW)
BatchCluster does register for all childProcess exit events. See the node docs. Listening for process.on("uncaughtException") would be way overstepping, given there's no way for me to only capture exceptions batch-cluster is "responsible" for.
Fair enough. In a parallelized build there are a lot of places the node process can crash. I just wanted to make sure we weren't ending up with zombie processes due to build crashes where .end() was not getting called. But maybe it is related to your next point and batchcluster does a sane job of terminating child processes during a node crash.
To be clear, I wasn't implying you would handle other codes' exceptions or prevent node from crashing...just that you would detect node is crashing and do some cleanup to avoid zombie processes if batchcluster isn't doing that itself already.
You aren't awaiting exiftool.end(). I suspect this is the main issue causing your zombies.
Thanks. Could be. I'll open a separate issue if I continue to notice zombie processes after build crashes.
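The advice about awaiting exiftool.end() can be sketched as a small wrapper. This is a minimal sketch, not code from the repro: withExiftool is a hypothetical helper name, and it assumes only that the instance passed in exposes an end() method that returns a promise, as the ExifTool class in exiftool-vendored does.

```javascript
// Hypothetical helper: run `work` with an ExifTool-like instance and always
// wait for shutdown, even if `work` throws.
async function withExiftool(exiftool, work) {
  try {
    return await work(exiftool);
  } finally {
    // Awaiting end() gives the child processes time to shut down before the
    // caller (for example a gulp task) reports completion.
    await exiftool.end();
  }
}
```

In a gulp task this might be used as `return withExiftool(new ExifTool({ taskTimeoutMillis: 5000 }), (et) => et.write(file, tags))`, where file and tags are placeholders for whatever the build actually writes.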
I pulled your repo and tried to reproduce locally:
node:18-bullseye works fine
node:18-bullseye-slim fails with the error you described.
If you change your exiftool instance to include a logger:
exiftool = new ExifTool({ taskTimeoutMillis: 5000, logger: () => require("batch-cluster").ConsoleLogger });
You'll see the following output:
$ gulp build
[04:20:13] Using gulpfile /code/gulpfile.js
[04:20:13] Starting 'build'...
BatchProcess(104).onStderr(): Can't locate mro.pm in @INC (you may need to install the mro module) (@INC contains: /code/node_modules/exiftool-vendored.pl/bin/lib /etc/perl /usr/local/lib/x86_64-linux-gnu/perl/5.32.1 /usr/local/share/perl/5.32.1 /usr/lib/x86_64-linux-gnu/perl5/5.32 /usr/share/perl5 /usr/lib/x86_64-linux-gnu/perl-base /usr/lib/x86_64-linux-gnu/perl/5.32 /usr/share/perl/5.32 /usr/local/lib/site_perl) at /usr/lib/x86_64-linux-gnu/perl-base/overload.pm line 123.
Turns out that Node's -slim image doesn't include some perl modules that ExifTool relies on, including mro.
It's even easier to validate by hand:
$ docker run -v.:/code -it node:18-bullseye-slim bash
root@5c5130e1a33d:/# cd /code
root@5c5130e1a33d:/code# node_modules/exiftool-vendored.pl/bin/exiftool -Creator= /code/src/files/dummy.pdf
Can't locate mro.pm in @INC (you may need to install the mro module) (@INC contains: node_modules/exiftool-vendored.pl/bin/lib /etc/perl /usr/local/lib/x86_64-linux-gnu/perl/5.32.1 /usr/local/share/perl/5.32.1 /usr/lib/x86_64-linux-gnu/perl5/5.32 /usr/share/perl5 /usr/lib/x86_64-linux-gnu/perl-base /usr/lib/x86_64-linux-gnu/perl/5.32 /usr/share/perl/5.32 /usr/local/lib/site_perl) at /usr/lib/x86_64-linux-gnu/perl-base/overload.pm line 123.
I'll update the docs and error handling to detect this and throw this as an error in the future.
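A build script could also check for this kind of missing module itself before starting ExifTool. The following is a rough sketch under stated assumptions, not part of exiftool-vendored or batch-cluster: checkPerlModule is a hypothetical helper, and it assumes perl is reachable on the PATH under the name perl. It relies on perl's -M switch, which loads the named module before running the program given with -e.

```javascript
const { spawnSync } = require("child_process");

// Hypothetical preflight helper: ask perl to load a module and report whether
// that succeeded.
function checkPerlModule(moduleName, perlBin = "perl") {
  // `perl -M<module> -e 1` loads the module, then runs a trivial program.
  const result = spawnSync(perlBin, [`-M${moduleName}`, "-e", "1"], { encoding: "utf8" });
  if (result.error) {
    // perl itself could not be started (for example, it is not installed).
    return { ok: false, detail: result.error.message };
  }
  return { ok: result.status === 0, detail: (result.stderr || "").trim() };
}

const check = checkPerlModule("mro");
if (!check.ok) {
  console.error(`perl cannot load mro: ${check.detail}`);
}
```

On an image that is missing the module, detail should carry perl's own "Can't locate ..." message, which is the information that was not being surfaced in the original build failure.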
Thanks for taking another look at this. No problem to add missing deps to the slim image during the build. Very helpful to have the underlying issue surfaced of course. I also appreciate the debugging tip about adding the logger.
Thanks for the fix!
Getting an error on gitlab CI using docker image node:16-bullseye-slim which I can't reproduce locally (on macos).
exiftool-vendored version 24.3.0.
some.pdf in the error below is generally the FIRST pdf to be processed in the build job...so I suspect the error message is just a red herring and there is rather some environmental problem (missing dep, perl, etc that isn't installed with npm ci).
So I guess this is more question than bug report... any recommendations on how to troubleshoot underlying issues with the batch-cluster so I have a more useful error message?
Build job
gitlab CI error
aside: I also noticed locally (macos) that after a build crash there is a zombie perl process running exiftool that is never cleaned up. I've added my own SIG/process cleanup to try and prevent this from happening in the future, but would be nice if maybe exiftool-vendored handled this internally using a process.on('uncaughtException',... or process.on('exit',... to try and kill off any batch-cluster stuff.
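For the user-side signal cleanup mentioned in the aside, one possible shape is sketched below. It is an illustration only, not what exiftool-vendored or batch-cluster do internally: registerShutdown is a hypothetical helper, and target and exit are injectable stand-ins for process and process.exit so the logic can be exercised without ending the process. It uses signal handlers rather than process.on('exit', ...) because Node's exit listeners can only perform synchronous work, so an awaited end() would not finish there. It does not cover uncaught exceptions.

```javascript
// Hypothetical helper: on SIGINT/SIGTERM, await resource.end() and then exit.
function registerShutdown(resource, target = process, exit = (code) => process.exit(code)) {
  let shuttingDown = false;
  const handler = async (signal) => {
    if (shuttingDown) return; // ignore repeated signals while cleanup runs
    shuttingDown = true;
    try {
      await resource.end();
    } finally {
      // Conventional exit codes for termination by signal: 128 + signal number.
      exit(signal === "SIGINT" ? 130 : 143);
    }
  };
  for (const signal of ["SIGINT", "SIGTERM"]) {
    target.on(signal, handler);
  }
  return handler;
}
```

Calling `registerShutdown(exiftool)` once, right after the instance is created, would be enough for this sketch.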