Closed gajop closed 3 years ago
What is the output of running npx envinfo --binaries --system
?
Please can you provide a minimal repo (code, image and config) that allows someone else to reproduce this problem, including dependency versions in a package.json
.
I think I've been able to reproduce this on Linux using the prebuilt binaries provided by the latest versions of sharp and electron.
It looks like there's problem where strings can become null and it's occurring somewhere between the N-API layer and electron.
I think this is due to nodejs and sharp depending on libstdc++ (which is the right thing to do) but electron does not.
$ ldd $(which node) | grep "c++"
libstdc++.so.6 => /lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007fefa4f3f000)
$ ldd node_modules/sharp/build/Release/sharp.node | grep "c++"
libstdc++.so.6 => /lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007efecbdfa000)
$ ldd node_modules/electron/dist/electron | grep "c++"
... crickets ...
From what I can tell, the prebuilt binaries provided by electron statically-link the ABI-incompatible libc++ instead but expose API that relies on it.
There appear to be quite a few open issues against electron relating to libc++ vs libstdc++ conflicts: https://github.com/electron/electron/search?q=%22libc%2B%2B%22&type=issues
I'm unsure what we can do in sharp; supporting electron and its ABI changes has been an ongoing headache for years.
What is the output of running npx envinfo --binaries --system?
Unsure if you still want the answer, but:
npx envinfo --binaries --system?
npx envinfo --binaries --system
npx: installed 1 in 1.194s
System:
OS: Linux 5.4 Ubuntu 20.04.1 LTS (Focal Fossa)
CPU: (4) x64 Intel(R) Core(TM) i5-4690K CPU @ 3.50GHz
Memory: 3.33 GB / 15.58 GB
Container: Yes
Shell: 5.8 - /bin/zsh
Binaries:
Node: 12.8.1 - ~/.nvm/versions/node/v12.8.1/bin/node
Yarn: 1.22.10 - ~/.nvm/versions/node/v12.8.1/bin/yarn
npm: 6.13.7 - ~/.nvm/versions/node/v12.8.1/bin/npm
And here's a way to reproduce it: https://github.com/Jazcash/spring-map-parser/files/5815154/test_gajop.zip
You can see the full discussion here if interested: https://github.com/Jazcash/spring-map-parser/issues/1
Electron docs suggest that developers should rebuild native modules with Electron: https://www.electronjs.org/docs/tutorial/using-native-node-modules . What is the native module in question in this situation?
I also don't get a clear error like the one mentioned in their docs (i.e. Error: The module '/path/to/native/module.node' was compiled against a different Node.js version using ...
) - do you have any idea why?
Thank you for the extra info, I'm fairly certain this is a problem with the prebuilt Linux binaries provided by electron.
The native module docs provided by electron are out-of-date with respect to its use of the API-stable and ABI-stable N-API bridge between JS and C++ used by sharp and many other modules, which do not require rebuilding.
I'll attempt to prepare a minimal example that uses only N-API and electron then submit it to the electron repo. This minimal example will hopefully not use sharp as I think the symptoms reported in this issue are effects rather than cause.
A possible workaround is to compile your own electron from source, ensuring it correctly links against the system libstdc++. I don't (yet) know enough about electron's build process to be able to help with this.
I've done a bit more digging and this is an ABI compatibility problem relating to strings, not with libstdc++/libc++ as I first guessed, but with glib's G_TYPE_STRING
.
The prebuilt binaries for Linux provided by electron currently depend on the shared system glib (as well as many, many other shared system libraries):
$ ldd node_modules/electron/dist/electron | grep glib
libglib-2.0.so.0 => /lib/x86_64-linux-gnu/libglib-2.0.so.0 (0x00007feb8a40a000)
$ ldd node_modules/electron/dist/electron | wc -l
98
The prebuilt binaries provided by sharp contain their own statically-linked glib to help prevent such conflicts but these are currently split into 2 files, libvips-cpp.so and libvips.so. It looks like libvips-cpp.so can erroneously use the shared system glib already loaded by electron when it reaches this point:
This means a situation can arise where a G_TYPE_STRING
is created with an older system glib then accessed later with a newer statically-linked glib.
I'm going to investigate if we can merge libvips-cpp.so and libvips.so into a single file.
As I've said before, it's equally incumbent on electron to help here too. If they were able to create prebuilt binaries that statically linked more of its dependencies then this whole class of error would go away.
A possible workaround for now is to set the LD_PRELOAD
environment variable when running electron:
$ LD_PRELOAD=node_modules/sharp/vendor/8.10.5/lib/libvips.so.42 electron script.js
@lovell thanks for this! This explains why building libvips using shared libraries and using old system versions for every library linked to by Electron seems to work (that's my current workaround).
Having had a quick scan through the glib changelog, I think https://gitlab.gnome.org/GNOME/glib/-/merge_requests/1497 in glib v2.65.0 might be a possible ABI breakage causing this.
@inukshuk I'm now reviewing whether your suggestion in https://github.com/lovell/sharp-libvips/pull/76 might be worthwhile after all and perhaps we could go further and rewind to glib v2.64.0.
Rolling glib all the way back to v2.62.3 in the statically-linked libvips.so
binary, to match the version used by electron's build environment, doesn't fix this so it's unlikely to be caused by a glib ABI change.
When debugging, all the values are correct and as expected within libvips-cpp.so
up until this point:
https://github.com/libvips/libvips/blob/1871567516c8321349acbe66abf4f60228730b13/cplusplus/VImage.cpp#L536
...when it hands off to the vips_cache_operation_buildp()
call implemented in libvips.so
. TheVipsOperation
passed in fails the vips_object_build()
call.
The next step is probably to attempt to build a single libvips-cpp.so
that statically links libvips.so
and everything else that file is already statically linking.
I also tried to debug this and ended up with this GDB backtrace:
$ gdb node
(gdb) set detach-on-fork off
(gdb) set non-stop on
(gdb) set pagination off
(gdb) handle SIGSYS noprint pass
(gdb) catch throw
(gdb) run /usr/bin/electron .
(gdb) info threads
(gdb) thread 30.14
(gdb) bt
#0 0x00007fffdf56d492 in __cxxabiv1::__cxa_throw(void*, std::type_info*, void (*)(void*)) (obj=0x10bc5e923fb0, tinfo=0x7fffdf69d1c8 <typeinfo for std::logic_error>, dest=0x7fffdf582f30 <std::logic_error::~logic_error()>) at ../../../../libstdc++-v3/libsupc++/eh_throw.cc:78
#1 0x00007fffdf564298 in std::__throw_logic_error(char const*) (__s=0x7fffe12b54a0 "basic_string::_S_construct null not valid") at ../../../../../libstdc++-v3/src/c++11/functexcept.cc:66
#2 0x00007fffe128e2eb in sharp::DetermineImageType(char const*) () at /home/kleisauke/test/node_modules/sharp/build/Release/sharp.node
#3 0x00007fffe128eb89 in sharp::OpenInput(sharp::InputDescriptor*) () at /home/kleisauke/test/node_modules/sharp/build/Release/sharp.node
#4 0x00007fffe12a8435 in PipelineWorker::Execute() () at /home/kleisauke/test/node_modules/sharp/build/Release/sharp.node
#5 0x00007fffe12935f5 in Napi::AsyncWorker::OnAsyncWorkExecute(napi_env__*, void*) () at /home/kleisauke/test/node_modules/sharp/build/Release/sharp.node
#6 0x0000555556e44694 in ()
#7 0x00007fffdf4c2640 in ()
#8 0x0000000000000000 in ()
Without debugging symbols, it's hard to say where it might go wrong, but I think it's this function: https://github.com/lovell/sharp/blob/4c57ac2bbe8cbc8afab70a67d92f4cbf4748f153/src/common.cc#L243
Curiously, this is not one of the linked code above. I'll try to build the prebuilt binaries with debugging symbols.
It looks like Electron depends on GConf
which has a dependency on libdbus-glib-1
that seems to conflict with the prebuilt binaries provided by sharp.
$ strace -f -e openat electron . 2>&1 | grep gconf
[pid 541054] openat(AT_FDCWD, "/usr/lib64/gio/modules/libgsettingsgconfbackend.so", O_RDONLY|O_CLOEXEC) = 57
[pid 541054] openat(AT_FDCWD, "/usr/lib/node_modules/electron/dist/libgconf-2.so.4", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
[pid 541054] openat(AT_FDCWD, "/lib64/libgconf-2.so.4", O_RDONLY|O_CLOEXEC) = 57
$ ldd /usr/lib64/gio/modules/libgsettingsgconfbackend.so | grep dbus
libdbus-glib-1.so.2 => /lib64/libdbus-glib-1.so.2 (0x00007fe909dca000)
libdbus-1.so.3 => /lib64/libdbus-1.so.3 (0x00007fe909d79000)
$ strace -f -e openat electron . 2>&1 | grep libdbus-glib
[pid 540708] openat(AT_FDCWD, "/usr/lib/node_modules/electron/dist/libdbus-glib-1.so.2", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
[pid 540708] openat(AT_FDCWD, "/lib64/libdbus-glib-1.so.2", O_RDONLY|O_CLOEXEC) = 57
$ LD_PRELOAD=/lib64/libdbus-glib-1.so.2 node -e "const sharp = require('sharp'); sharp('test.png').toFile('test.jpg')"
node:internal/process/promises:227
triggerUncaughtException(err, true /* fromPromise */);
^
[Error: basic_string::_S_construct null not valid]
Great sleuthing Kleis, what version of electron is this? I don't see node_modules/electron/dist/libdbus-glib-1.so.2
locally with the latest v11.2.0.
Your comment led me to discover https://github.com/electron/electron/pull/19498 that suggests gconf might have been removed, although there are many linked issues with many comments.
(A common problem I hear about electron is memory consumption. I wonder if switching to static linking of its dependencies plus some well-placed LTO might help with this. It will certainly help us. Perhaps we can get Skype or Slack to sponsor this? :wink: )
Well spotted!
There is no libdbus-glib-1.so.2
above either (ENOENT) -- I assume that's just where Electron looks for it first? I guess gconf
is pulled in by one of the delegate libraries, because Electron does not seem to link to it directly. I think it might be due to libgio
dynamically loading modules?
It looks like libgobject-2.0.so
might be the (modern) cause:
$ readelf -d node_modules/electron/dist/electron | grep gobject
0x0000000000000001 (NEEDED) Shared library: [libgobject-2.0.so.0]
When emulating the loading of this shared library before Node.js, as electron will do, the same error occurs:
$ LD_PRELOAD=/lib/x86_64-linux-gnu/libgobject-2.0.so node -e "require('sharp')('in.png').toFile('out.png')"
(node:24095) UnhandledPromiseRejectionWarning: Error: basic_string::_S_construct null not valid
This is likely to prevent Electron on Linux from working with any native Node.js module that also uses gobject.
Indeed, I've noticed that too. Here are the conflicting symbols:
$ LD_DEBUG=bindings LD_PRELOAD=/lib64/libgobject-2.0.so.0 node -e "require('sharp')('in.png').toFile('out.png')" 2>&1 | grep "to /lib64/libgobject-2.0.so.0"
1368935: binding file /lib64/libgobject-2.0.so.0 [0] to /lib64/libgobject-2.0.so.0 [0]: normal symbol `g_param_spec_types'
1368935: binding file /home/kleisauke/test/node_modules/sharp/build/Release/../../vendor/8.10.5/lib/libvips.so.42 [0] to /lib64/libgobject-2.0.so.0 [0]: normal symbol `g_param_spec_types'
1368935: binding file /home/kleisauke/test/node_modules/sharp/build/Release/../../vendor/8.10.5/lib/libvips-cpp.so.42 [0] to /lib64/libgobject-2.0.so.0 [0]: normal symbol `g_param_spec_types'
1368935: binding file /home/kleisauke/test/node_modules/sharp/build/Release/../../vendor/8.10.5/lib/libvips-cpp.so.42 [0] to /lib64/libgobject-2.0.so.0 [0]: normal symbol `g_object_ref'
1368935: binding file /home/kleisauke/test/node_modules/sharp/build/Release/../../vendor/8.10.5/lib/libvips-cpp.so.42 [0] to /lib64/libgobject-2.0.so.0 [0]: normal symbol `g_object_unref'
1368935: binding file /home/kleisauke/test/node_modules/sharp/build/Release/sharp.node [0] to /lib64/libgobject-2.0.so.0 [0]: normal symbol `g_type_check_instance_is_a'
1368935: binding file /home/kleisauke/test/node_modules/sharp/build/Release/sharp.node [0] to /lib64/libgobject-2.0.so.0 [0]: normal symbol `g_object_ref'
1368935: binding file /home/kleisauke/test/node_modules/sharp/build/Release/sharp.node [0] to /lib64/libgobject-2.0.so.0 [0]: normal symbol `g_object_unref'
1368935: binding file /home/kleisauke/test/node_modules/sharp/build/Release/sharp.node [0] to /lib64/libgobject-2.0.so.0 [0]: normal symbol `g_type_from_name'
https://stackoverflow.com/a/6540059/10952119 is relevant here, but I haven't had any luck resolving this yet.
This is likely to prevent Electron on Linux from working with any native Node.js module that also uses gobject.
That is, native Node.js modules which include a static copy of gobject, right? Because, linking to gobject as a shared library, i.e., using the same library as pre-loaded by Electron seems to work just fine. Or do you think this might cause other related issues?
native Node.js modules which include a static copy of gobject, right?
@inukshuk Yes, I'm unsure how many that might affect other than sharp. I can't really see a way around this other than building a single libvips-cpp.so
without the corresponding libvips.so
on Linux.
I had an attempt to use a version script, see the sharp-issue-2535
branch and commit https://github.com/kleisauke/libvips-packaging/commit/33470b7f2bc246ba11cecefaef61bba4b06fb2ce. With that I see:
$ LD_DEBUG=bindings LD_PRELOAD=/lib64/libgobject-2.0.so.0 node -e "require('sharp')('in.png').toFile('out.png')" 2>&1 | grep "to /lib64/libgobject-2.0.so.0"
330182: binding file /lib64/libgobject-2.0.so.0 [0] to /lib64/libgobject-2.0.so.0 [0]: normal symbol `g_param_spec_types'
330182: binding file /home/kleisauke/test/node_modules/sharp/build/Release/sharp.node [0] to /lib64/libgobject-2.0.so.0 [0]: normal symbol `g_type_check_instance_is_a'
330182: binding file /home/kleisauke/test/node_modules/sharp/build/Release/sharp.node [0] to /lib64/libgobject-2.0.so.0 [0]: normal symbol `g_object_ref'
330182: binding file /home/kleisauke/test/node_modules/sharp/build/Release/sharp.node [0] to /lib64/libgobject-2.0.so.0 [0]: normal symbol `g_object_unref'
330182: binding file /home/kleisauke/test/node_modules/sharp/build/Release/sharp.node [0] to /lib64/libgobject-2.0.so.0 [0]: normal symbol `g_type_from_name'
$ LD_DEBUG=bindings LD_PRELOAD=/lib64/libgobject-2.0.so.0 node -e "require('sharp')('in.png').toFile('out.png')" 2>&1 | grep "sharp.node .* to /lib64/libglib-2.0.so.0"
330213: binding file /home/kleisauke/test/node_modules/sharp/build/Release/sharp.node [0] to /lib64/libglib-2.0.so.0 [0]: normal symbol `g_once_impl'
330213: binding file /home/kleisauke/test/node_modules/sharp/build/Release/sharp.node [0] to /lib64/libglib-2.0.so.0 [0]: normal symbol `g_setenv'
330213: binding file /home/kleisauke/test/node_modules/sharp/build/Release/sharp.node [0] to /lib64/libglib-2.0.so.0 [0]: normal symbol `g_log_set_handler'
330213: binding file /home/kleisauke/test/node_modules/sharp/build/Release/sharp.node [0] to /lib64/libglib-2.0.so.0 [0]: normal symbol `g_snprintf'
So, libvips.so.42
and libvips-cpp.so.42
no longer exports the GLib symbols and thus avoids possible conflicts. Unfortunately, this does not resolve this issue as sharp.node
still depends on GLib symbols:
$ nm -D node_modules/sharp/build/Release/sharp.node | grep "U g_"
U g_assertion_message_expr
U g_free
U g_log_set_handler
U g_malloc
U g_object_ref
U g_object_unref
U g_once_impl
U g_setenv
U g_snprintf
U g_type_check_instance_is_a
U g_type_from_name
I also tried to export the above symbols and have a version assigned to it (e.g. @@SHARP_0.27.0
) but somehow the runtime linker still prefers the unversioned symbols. I can think of two ways to resolve this:
.a
files instead of .so
.g_type_from_name
). We can move some helpers to libvips' C++ binding if needed.My preference is for option 2, what do you think?
Oh, it seems option 2 does not work as desired. On the issue-2535
branch (commit https://github.com/kleisauke/sharp/commit/fd03dd25fd44e0b6fb9ddd96e2efc570bfa89493), I see:
$ LD_PRELOAD=/lib64/libgobject-2.0.so.0 node -e "require('sharp')('in.png').toFile('out.png')"
(process:340011): GLib-CRITICAL **: 18:49:02.315: g_datalist_id_set_data_full: assertion 'key_id > 0' failed
(process:340011): GLib-GObject-CRITICAL **: 18:49:02.315: g_param_spec_pool_lookup: assertion 'pool != NULL' failed
(process:340011): GLib-GObject-WARNING **: 18:49:02.315: g_object_set_is_valid_property: object class '(null)' has no property named 'access'
(process:340011): GLib-CRITICAL **: 18:49:02.315: g_datalist_id_set_data_full: assertion 'key_id > 0' failed
(process:340011): GLib-GObject-CRITICAL **: 18:49:02.315: g_param_spec_pool_lookup: assertion 'pool != NULL' failed
(process:340011): GLib-GObject-WARNING **: 18:49:02.315: g_object_set_is_valid_property: object class '(null)' has no property named 'fail'
(process:340011): GLib-CRITICAL **: 18:49:02.315: g_datalist_id_set_data_full: assertion 'key_id > 0' failed
(process:340011): GLib-GObject-CRITICAL **: 18:49:02.315: g_param_spec_pool_lookup: assertion 'pool != NULL' failed
(process:340011): GLib-GObject-WARNING **: 18:49:02.315: g_object_set_is_valid_property: object class '(null)' has no property named 'filename'
Segmentation fault (core dumped)
I'm not sure whether 2 GLib instances can run in the same process.
Another option would be to build GLib shared, this essentially restores the behavior of v0.25.0 and ensures that it will pick the system library in case that is loaded first. I'm not sure what happens if a dependency of libvips requires newer GLib symbols that aren't available in the (preloaded-)system libraries (e.g. Pango v1.48.1 requires GLib v2.62, which is sometimes not available in the major distros).
See commit https://github.com/kleisauke/sharp/commit/842c71f87315d997c5561f and https://github.com/kleisauke/libvips-packaging/commit/aaf8158ef4833eafba0dd6fe3e330bf8c27cecff for a way of doing this. I tested it against the above use cases without any problems.
Thanks for the all the great research Kleis. I think reverting to a shared glib could re-open too many (Electron and non-Electron) wounds/issues that moving to static-linking has fixed.
I'd be happy to remove the use of glib from sharp's own source if that helps (although the original problem was coming from libvips-cpp.so
).
A complete static linking of all the .a
files into a single sharp.node
would "leak" LGPL terms onto it, but we distribute the source so I think that's OK, although it could prevent someone else from selling a modified sharp. We'd also need to provide prebuilt sharp binaries for ARMv6 and ARMv7 (at the moment only libvips is provided for these).
I still think we might be able to create a single libvips-cpp.so
shared library that is statically-linked internally.
I can confirm that statically linking libvips.so.42
into libvips-cpp.so.42
(see commit https://github.com/kleisauke/sharp/commit/9eebe5157bf84a553bb16d00cb99f835001c4f40 and https://github.com/kleisauke/libvips-packaging/commit/3ebbf7f3427e7bf8fc4bfc50fb87ea180a3590b5) also resolves the above mentioned problems. Commit https://github.com/kleisauke/sharp/commit/fd03dd25fd44e0b6fb9ddd96e2efc570bfa89493 is still needed to resolve the basic_string::_S_construct null not valid
error.
Note that it requires to unexport the g_param_spec_types
symbol (see commit https://github.com/kleisauke/libvips-packaging/commit/3f4289b7298ee60502dfb67366ca011f40684227) to make these warnings disappear:
(this can also be done by exporting only the symbols that sharp requires, but that is less maintainable)
@kleisauke Amazing work, let's do this. We can apply https://github.com/kleisauke/sharp/commit/fd03dd25fd44e0b6fb9ddd96e2efc570bfa89493 to sharp right now as I plan a patch release within the next few days. Then we can make the packaging changes; PR welcome, as always. Could there be any impact on macOS with (or for that matter without) this switch?
Great! I'll prepare a PR for the packaging changes. I don't think this will affect macOS, will do a test build just to be sure.
The unit tests run without any problems on macOS. I just opened https://github.com/lovell/sharp-libvips/pull/81 and https://github.com/lovell/sharp/pull/2551 to address this issue.
Testing can be done with:
rm ~/.npm/_libvips/libvips-8.10.5-linux-x64.tar.br
SHARP_DIST_BASE_URL="https://github.com/kleisauke/libvips-packaging/releases/download/v8.10.5-single-shared/" \
npm install kleisauke/sharp#single-shared
v0.28.0 now available. Thank you @kleisauke for all your help with this.
Sharp: v0.27.0 Electron: v11.1.1
My code is as simple as:
await sharp('test.png').toFile('test.jpg');
I get a
[basic_string::_S_construct null not valid]
error when loading a PNG file as described in https://github.com/lovell/sharp/issues/2356#issuecomment-744371757. I believe the linked comment was never fixed - only the GLib-GObject:ERROR (0.26.0) one was, so I decided to reopen this issue as a new and separate issue.To summarize: