spk121 / guile-gi

Bindings for GObject Introspection and libgirepository for Guile
GNU General Public License v3.0
58 stars 7 forks source link

Guix, grafts, gee… #96

Open LordYuuma opened 3 years ago

LordYuuma commented 3 years ago

Part of our design goals seems to be building against one version of GLib/GObject/GI, while allowing users to load any version. Today I am here to tell you, that this is extremely broken.

Why Guix?

Guix is just the messenger, but there is a stronger reason behind why GI appears rather broken in any Guix package, and that is grafts.

What are grafts?

Grafts are Guix' way of not rebuilding the world when an important security is rolled out. Basically, they allow you to build and link against old versions of a library while running the program against a new one. Traditional distros do that all the time and you don't even notice, but on Guix you actually have two versions of that library still lying around. The ungrafted one and the grafted one.

Why is this an issue?

Because it is possible to get those two mixed up, e.g. in guix environment. I am not sure, which use cases are affected, but that one surely is. To see the difference, run

./configure --without-gir-hacks
make clean check

once inside a guix environment with grafts, and once in one without them. If you want to use Guix environments to prototype your applications, that means you'll have to use --no-grafts to work around these types of issues for now.

What to do from now on?

It is pretty clear to me, that the main culprit here is a different version of GLib being linked to Guile-GI than the one that should be loaded through Guile-GI. To fix that, we'll probably have to overhaul our entire bootstrapping procedure starting at GTypes. And we'll likely have to preload some version of GObject before defining them. Much fun.

Workaround

If you are working on Guile-GI code inside guix environment and do not wish to be haunted by this issue and how to perhaps resolve it, for the time being add --with-gir-hacks to your invocation of ./configure. If you are experiencing similar issues in your own GI-based projects, consider patching your GIRs in a manner similar to what we do.

ZelphirKaltstahl commented 3 years ago

I've only followed this repo on the side and I don't really understand much about it. I'm only interested in building nice GTK apps using GNU Guile at some point. That said, since you already started explaining, let me ask some uninformed, possibly dumb questions : )

LordYuuma commented 3 years ago
daym commented 3 years ago

Wait, shouldn't Guix grafting also graft guile-gi (it should actually edit the shared objects in the derivation, too)? Then everything would be fine.

Does this problem actually happen when using an installed/dependent guix guile-gi package?

Right now I'm always using a manual git checkout of guile-gi, so that can't be grafted of course (because guix doesn't know that that checkout exists). Also, I recompile it very rarely, even as I update guix (read: it's stale).

I don't want to be the one responsible for a lot of unnecessary rework. Please make sure it's actually required and I've not been making the problem seem worse than it is.

LordYuuma commented 3 years ago

You are not the only one responsible for this. I use Guix myself as a basis for developing this library and am very weirded out by having to resort to such hacks.

There is so far no precedence for Guile-GI packages in Guix, which might also have to do with the fact, that the guile-gi recipe on Guix was rather broken for a long time (is it fixed now? I don't remember). I would assume some weird workaround would be required to get them to run as with all the PyGI and GJS packages.

daym commented 3 years ago
$ cat run
#!/bin/sh
exec ${HOME}/src/guile-gi/guile-gi-dannym/guile-gi/tools/uninstalled-env guix repl -L . "$@"
#exec guix environment -l ${HOME}/src/guile-gi/guile-gi-dannym/guile-gi/guix.scm --ad-hoc guile gdk-pixbuf adwaita-icon-theme shared-mime-info -- "$@" ${HOME}/src/guile-gi/guile-gi-dannym/guile-gi/tools/uninstalled-env guix repl a.scm
 dannym@dayas ~/src/guix-gui$ strace -f ./run main.scm 2>&1 |grep open  |grep glib |grep 'libglib.*\.so' |grep -v -- '-1'
[pid  5665] openat(AT_FDCWD, "/gnu/store/dp5l10lbgh66ap4idqvmkfms1qgjsj4r-profile/lib/libglib-2.0.so.0", O_RDONLY|O_CLOEXEC) = 15
[pid  5665] openat(AT_FDCWD, "/gnu/store/xa1vfhfc42x655hi7vxqmbyvwldnz7r0-glib-2.62.6/lib/libglib-2.0.so.0", O_RDONLY|O_CLOEXEC) = 16

In order to debug this, I'd LD_PRELOAD something that overwrites open and prevents one of them from opening. That way, hopefully the requestor will fail and then we know who it is:

#define _GNU_SOURCE
#include <dlfcn.h>
#include <string.h>
#include <stdio.h>

typedef void *(*dlopen_t)(const char *filename, int flags);

void *dlopen(const char *filename, int flags) {
        void* result;
        dlopen_t dlopen = dlsym(RTLD_NEXT, "dlopen");
        if (filename && strstr(filename, "xa1vfhfc42x655hi7vxqmbyvwldnz7r0-glib-2.62.6/lib/libglib-2.0")) {
                fprintf(stderr, "dlopen %s\n", filename);
                return NULL;
        }
        result = dlopen(filename, flags);
        return result;
}
LordYuuma commented 3 years ago

I'm sorry to say that, but I don't get any meaningful results (or even results at all) from adding this to LD_PRELOAD. It doesn't even appear to execute at all.

daym commented 3 years ago

I do. Result: dlopen is sometimes called without full path (for example: libcairo-gobject.so.2)! if that is a string literal in some executable file, that is not good--because those references won't be found by the grafter.

New version (inside ~/src/guile-gi/guile-gi-dannym/guile-gi):

Create block-open.c:

#define _GNU_SOURCE
#include <dlfcn.h>
#include <stdlib.h>
#include <string.h>
#include <stdio.h>

typedef void *(*dlopen_t)(const char *filename, int flags);

void *dlopen(const char *filename, int flags) {
        void* result;
        dlopen_t dlopen = dlsym(RTLD_NEXT, "dlopen");
        fprintf(stderr, "dlopen %s\n", filename);
        if (filename && strstr(filename, "/") == NULL && strstr(filename, "libcairo-gobject.so")) {
                fprintf(stderr, "dlopen blocked %s\n", filename);
                result = dlopen(filename, flags);
                if (result) {
                        fprintf(stderr, "and would have been found! Aborting!\n");
                        int* x = 0;
                        *x = 5;
                        abort();
                }
                return NULL;
        }
        result = dlopen(filename, flags);
        return result;
}

Then compile it via gcc -fPIC -shared -o block-open.so block-open.c.

Then create run scipt:

#!/bin/sh -e

LD_PRELOAD="$PWD/block-open.so" guix environment --preserve=LD_PRELOAD -l guix.scm --ad-hoc guile gdk-pixbuf adwaita-icon-theme shared-mime-info -- make tools/uninstalled-env tools/run-guile "$@"

I also edited tools/run-guile bottom to say:

exec ${top_builddir}/tools/uninstalled-env ${top_builddir}/libtool --mode=execute \
     -dlopen ${top_builddir}/libguile-gi.la \
     gdb --args "/gnu/store/0l5a4vx5w8xv6xwq7a6s7hc4r1790lvl-profile/bin/guile" "$@"

Then LD_PRELOAD=$PWD/block-open.so ./run examples/button1.scm.

Then r.

I got this:

[...]
dlopen libcairo-gobject.so.2
dlopen blocked libcairo-gobject.so.2
and would have been found! Aborting!
(gdb) bt
#0  0x00007ffff7fc724c in dlopen () from /home/dannym/src/guile-gi/guile-gi-dannym/guile-gi/block-open.so
#1  0x00007ffff3a4d7e9 in g_module_open () from /gnu/store/xa1vfhfc42x655hi7vxqmbyvwldnz7r0-glib-2.62.6/lib/libgmodule-2.0.so.0
#2  0x00007ffff3beaac1 in g_typelib_symbol () from /gnu/store/dp5l10lbgh66ap4idqvmkfms1qgjsj4r-profile/lib/libgirepository-1.0.so.1
#3  0x00007ffff3be4475 in g_registered_type_info_get_g_type () from /gnu/store/dp5l10lbgh66ap4idqvmkfms1qgjsj4r-profile/lib/libgirepository-1.0.so.1
#4  0x00007ffff3c1f350 in gig_type_meta_init_from_type_info (meta=meta@entry=0x4ccdf0, type_info=type_info@entry=0x4c7540) at src/gig_data_type.c:231
#5  0x00007ffff3c1f72a in gig_type_meta_init_from_arg_info (meta=0x4ccdf0, ai=0x4c6cf0) at src/gig_data_type.c:33
#6  0x00007ffff3c23375 in arg_map_apply_function_info (func_info=0x4cb370, amap=<optimized out>) at src/gig_arg_map.c:108
#7  gig_amap_new (name=name@entry=0x4cc6c0 "container:propagate-draw", function_info=function_info@entry=0x4cb370) at src/gig_arg_map.c:69
#8  0x00007ffff3c26d83 in create_gsubr (specializers=0x7fffffffcb28, formals=0x7fffffffcb20, optional_input_count=0x7fffffffcb1c, required_input_count=0x7fffffffcb18, self_type=0x7ffff353d580, name=0x4cc6c0 "container:propagate-draw", function_info=0x4cb370) at src/gig_function.c:377
daym commented 3 years ago

Reading the source code of gobject-introspection, they do _g_typelib_do_dlopen in order to actually dlopen (that was inlined).

Aaaand that was patched by Guix.

          /* 'gobject-introspection' doesn't store the path of shared
             libraries into '.typelib' and '.gir' files.  Shared
             libraries are searched for in the dynamic linker search
             path.  In Guix we patch 'gobject-introspection' such that
             it stores the absolute path of shared libraries in
             '.typelib' and '.gir' files.  Here, in order to minimize
             side effects, we make sure that if the library is not
             found at the indicated path location, we try with just
             the basename and the system dynamic library
             infrastructure, as per default behaviour of the
             library. */
          module = load_one_shared_library (shlibs[i]);
          if (module == NULL && g_path_is_absolute (shlibs[i]))
            {
              module = load_one_shared_library (g_basename(shlibs[i]));
            }

I suspect it gets into the if block body.

A kingdom for a g_debug in there...

LordYuuma commented 3 years ago

I think I have something simpler:

#define _GNU_SOURCE
#include <dlfcn.h>
#include <stdlib.h>
#include <string.h>
#include <stdio.h>

typedef void *(*dlopen_t) (const char *filename, int flags);
#ifndef DLOPEN_BREAK_KEY
#define DLOPEN_BREAK_KEY "gobject"
#endif

void *dlopen(const char *filename, int flags) {
  fprintf (stderr, "dlopen %s", filename);
  dlopen_t dlopen = dlsym (RTLD_NEXT, "dlopen");
  if (strstr (filename, DLOPEN_BREAK_KEY)) asm volatile ("int $03");
  return dlopen (filename, args);
}

That allows you to GDB into the dlopens of any particular shared library simply by defining DLOPEN_BREAK_KEY. And as expected, it gets called via g_module_open in the first load-info of test/insanity.scm.

daym commented 3 years ago

Good idea!

I've read through gobject-introspection source code by now and it seems that gobject-introspection upstream take it upon themselves to provide gir files for a few other libraries (like cairo--see gir/cairo-1.0.gir.in in gobject-introspection.

But if I understand the patch that Guix did to gobject-introspection's build system (not pasted above) correctly, then they only pick up libraries in the output of the package currently built. Well, currently we are building gobject-introspection, not cairo. So it won't pick up cairo. That's why it's in there with a relative path (a fallback of the guix patch).

The reason why it then doesn't fail nicely at startup of guile-gi as it should is because I have cairo in my profile (it was probably propagated by something--I can't remove it).

Other gir files provided in gobject-introspection are: DBus DBusGLib fontconfig freetype2 gio GL libxml2 Vulkan win32 xfixes xft xlib xrandr--so those will cause trouble eventually if they refer to glib or any other gobjects (some of those definitely do. If such a package is additionally propagated into a profile, you are gonna have a hell of a time finding the problem--as we did).

It's possible to manually specify --fallback-library-path= (presumably on g-ir-scanner), so it could be a workaround to make gobject-introspection depend on an union of the respective packages (see above), and then specify --fallback-library-path manually.

In any case, I think this is a Guix bug (at least additionally).

LordYuuma commented 3 years ago

This is somewhat off-topic, but I think the issue becomes clear if you do the following:

$ ls -l $GUIX_ENVIRONMENT/lib/libgobject-2.0.so
$ grep libgobject $GUIX_ENVIRONMENT/share/gir-1.0/GObject-2.0.gir

Perhaps Guix fails to graft typelibs at all? In that case, we might as well file an upstream bug, but the fundamental issue remains, that we can't load any GObject other than the one we're linking against.

Interestingly guix build glib and guix build glib --no-grafts also return different results from the two paths listed above. There is probably a glib-minimal for building packages, so even if we were to fix the grafting issue, the general issue still remains.

daym commented 3 years ago

What does

readlink $GUIX_ENVIRONMENT/share/gir-1.0/GObject-2.0.gir

say?

FWIW, for me, there seem to be references to absolute paths of libgobject in the typelibs (and the girs):

~/x/gobject-introspection-1.62.0/compile/gir$ strings  GObject-2.0.typelib  |grep libgob
/gnu/store/xkfc1275h55ynpgfr3wwmzy9707nblwc-glib-2.62.6/lib/libgobject-2.0.so.0
$ strings `guix build gobject-introspection`/lib/girepository-1.0/GObject-2.0.typelib |grep libgobject
/gnu/store/xa1vfhfc42x655hi7vxqmbyvwldnz7r0-glib-2.62.6/lib/libgobject-2.0.so.0

I'm 100% sure I found how the weird reference got in. The problem is making Guix find the problem automatically.

It's because:

(1) I don't update my user profile often (2) guix environment "updates" a lot (3) gobject-introspection refers to cairo by relative name (to dlopen); reason: cairo is not a dependency to gobject-introspection; and even if it was, the Guix patch to gobject-introspection as it is now wouldn't pick it up anyway. (4) cairo uses gobject too nowadays (!!!!!), and thus refers to (another) glib (5) cairo was propagated into my user profile ages ago, and is stale (6) There's a cairo reference and thus gobject-introspection loads the cairo library (with the RELATIVE name, see above). It gets the one from (5). That refers to ANOTHER glib. See above.

Next I'm trying to fix Guix's gobject-introspection to FAIL instead of embedding relative references if built inside a guix build container.

LordYuuma commented 3 years ago

Hmm, it appears the gir inside $GUIX_ENVIROMNENT is already grafted and grafting does actually change stuff, but it's still a different gobject shlib.

Your venture into the depths of cairo is nice and all, but keep things simple and use test/insanity.scm.

daym commented 3 years ago

The point is that the "depth of cairo" loaded another version of glib. So that's where another version of glib comes from.

I'm trying test/insanity.scm now--I only now saw the new file. Thanks.

LordYuuma commented 3 years ago

The thing is, you don't need cairo to load a different version of GLib. It already happens with GObject alone.

daym commented 3 years ago

I agree.

But what guix's gobject-introspection does is a wrong thing in general, and libcairo-gobject is being loaded by guile-gi right now (note: I did not directly use cairo anywhere). That is not going to end well.

Fixing it might fix this problem and other potential problems, too.

(I tried making gobject-introspection depend on cairo by now. That causes a circular reference. Sigh.)

In any case, trying insanity.scm now.

daym commented 3 years ago

Ohhh, libguile-gi links to libgobject at compile time, too. Well, that's gonna be a problem if that library is different to the one dlopen'ed...

LordYuuma commented 3 years ago

You'll get cairo through GTK, Pango, and many other graphical stuff and the recipe seems to contain gtk+ even though it is afaik not strictly necessary.

I highly doubt you'll find a fix to this in Guix. It is likelier that whatever change you make breaks something in PyGI or GJS instead, so please be cautious. A patch to dynamically link libguile-gi against GObject etc. would be very welcome on the other hand.

daym commented 3 years ago

I highly doubt you'll find a fix to this in Guix.

Right now I'll settle for a procedure to flag this problem in Guix--nevermind fixing it.

A patch to dynamically link libguile-gi against GObject etc. would be very welcome on the other hand.

I don't know at all how something like that would look.

gobject-introspection seems to require glib (see below). Things it loads then also depend on glib, but not necessarily on the same version. That is not going to end well.

ldd /gnu/store/64xq4j8b181s6yz7gpg4w8ny3i6r6irk-gobject-introspection-1.62.0/lib/libgirepository-1.0.so |grep glib-
libglib-2.0.so.0 => /gnu/store/xa1vfhfc42x655hi7vxqmbyvwldnz7r0-glib-2.62.6/lib/libglib-2.0.so.0 (0x00007fb5b5c8f000)
libgobject-2.0.so.0 => /gnu/store/xa1vfhfc42x655hi7vxqmbyvwldnz7r0-glib-2.62.6/lib/libgobject-2.0.so.0 (0x00007fb5b5c30000)
libgmodule-2.0.so.0 => /gnu/store/xa1vfhfc42x655hi7vxqmbyvwldnz7r0-glib-2.62.6/lib/libgmodule-2.0.so.0 (0x00007fb5b5c29000)
libgio-2.0.so.0 => /gnu/store/xa1vfhfc42x655hi7vxqmbyvwldnz7r0-glib-2.62.6/lib/libgio-2.0.so.0 (0x00007fb5b5a5b000)

I don't think that this can be fixed in guile-gi. It can be fixed in gobject-introspection (by changing its architecture), I guess.

LordYuuma commented 3 years ago

ldd shows dynamic links, or is this a static link? In that case, having libguile-gi linked statically against GLib is a problem. Though either way dynamic linking would just be a crutch. The real issue is that we as libguile-gi mix our GLib into the GLib that the user actually wants to load and that's wrong.

daym commented 3 years ago

ldd shows what would be loaded when loading this so. It can only show things that are known without actually executing user code of that so. That means it shows things that are in the header of that so. But the loader ld.so will load those using dlopen when loading that so, in the course of running an executable. Guix uses ld's rpath option in order to make sure those headers always contain full paths.

The real issue is that we as libguile-gi mix our GLib into the GLib that the user actually wants to load and that's wrong.

So does gobject-introspection, and that's just as wrong.

daym commented 3 years ago

In the end this is the usual recursive definition problem. I don't get why people always do that--that has to lead to problems sooner or later.

For example having a compiler written in the language of that compiler, or xslt (which is a language for transforming xml) specs itself written in XML etc.

In this case, gobject-introspection is supposed to make glib usable in target languages other than C.

In a sane world that would mean that a binding generator doesn't use glib as a non-native input. Or if it absolutely has to (it really shouldn't [1]), then at least it wouldn't expose that glib or any of its contents to target language users (because philosophically, that's just wrong--even if it happens to work sometimes), i.e. it should be a native-input.

But no, gobject-introspection has glib as a regular input. (Trying to move gobject-introspection to native-inputs, I get build system meson does not support cross-compilation--see https://issues.guix.gnu.org/44244 )

If it has to do stuff like that, you'd think at least it would have two levels: a meta-level where it mangles definitions of glib, and that just happens to use glib for the mangling internally (but not ever expose that glib or any of its objects directly to the user), and another normal level where it actually provides glib to the target language. But no :(

Personally I'd write a replacement for libgirepository that just parses the girs or typelibs on its own for guile-gi (not using glib or girepository.so to do it). Then it's much simpler--both practically and philosophically.

Because what gir files do is describe glib at a C level. The interface of the glib is gobjects, but the implementation of glib is C (Pascal got this right in 1970, only to be ignored by almost everyone). So there's no reason for a binding generator to depend on glib at all--it would have to use the gobject interface if it did.

Of course there'll be some "syntactic sugar" provided for the target language depending on the library when it is loaded--but that's pretty much it. I remember pygtk doing this right ("override" files) a long long time ago.

An easy way to catch those problems early-on is to try to cross-compile your program. If it tries the self-reference outlined above, that would cause a failure (which is what you want) because the architectures of the parts don't match.

[1] because eventually it will become part of glib, and what then?

spk121 commented 3 years ago

I'm not sure if this adds any useful information to this discussion, but, on Linux, you can pull the results of dlopening in the following fashion. Compile this with -ldl

#define _GNU_SOURCE
#include <link.h>
#include <stdlib.h>
#include <stdio.h>

static int
callback(struct dl_phdr_info *info, size_t size, void *data)
{
    printf("Name: \"%s\" (%d segments)\n", info->dlpi_name,
           info->dlpi_phnum);
    return 0;
}

int
main(int argc, char *argv[])
{
    void *handle;
    handle = dlopen("libgtk-3.so", RTLD_LAZY);

    dl_iterate_phdr(callback, handle);

    exit(EXIT_SUCCESS);
}
spk121 commented 3 years ago

Personally I'd write a replacement for libgirepository that just parses the girs or typelibs on its own for guile-gi (not using glib or girepository.so to do it). Then it's much simpler--both practically and philosophically.

To guess the level of effort, I built libgirepository without linking to glib. I get 250 glib procedures that would need stubs. But guile-gi doesn't use the whole of libgirepository, so that's an overbound. But say we circumvented linking to glib, would the underlying libffi dependency cause similar problems?

My rough estimate was done this way, after removing glib from meson.build

ninja 2>&1 | awk -F '`' '{print $2}' | awk -F "'" '{print $1}' | sort | uniq | grep g_ | wc
LordYuuma commented 3 years ago

Not as far as I know, but I'm personally not convinced that this is the right move here. Philosophically speaking, it wouldn't be much of an introspection if one was doing it from the outside, would it?

I've had a short look at G-Golf and they seem to be doing stuff similar to us in that they partly export the base GLib that they get, but I assume this is fine for them, since they use dynamic links for GObject. That being said, I still don't have any experience with using G-Golf as a library, but there seem to be projects built on it in Guix, so it appears likely to work out that way.

TL;DR: Dynamic linking would probably be at least a short-term solution. For the long term we should think about what "introspecting your twin" really means.

@daym Do you count GIBaseInfo being a registered struct type as "exposing GLib to the user"? Because that's pretty much the only thing I can think of that fits your description here.

daym commented 3 years ago

Philosophically speaking, it wouldn't be much of an introspection if one was doing it from the outside, would it?

It's doing it from the outside anyway because C has no introspection (and neither does glib, generally--except for small islands). They can be faking it, but that doesn't change this fundamental fact.

But I know what you mean: gobject-introspection itself wants to be a gobject.

But if gobject-introspection itself wants to be a gobject, then it should be made impossible to load another glib using gobject-introspection (making the rest of glib similar to what compiler or shell builtins would be).

@daym Do you count GIBaseInfo being a registered struct type as "exposing GLib to the user"? Because that's pretty much the only thing I can think of that fits your description here.

Yeah.

In the end I don't see how gobject-introspection can work reliably on Guix like that--nevermind guile-gi for the time being.

The easiest way to find out in detail what is what would be to remove glib from the dependencies of gobject-introspection (and their header files) entirely and stub the things below like written below (also remove #include <glib.h> and #include <glib-object.h> from the gobject-introspection public interface). Ideally, it should still build and install. Does it?

If not, add glib back to the package dependencies but remove #include <glib.h> and #include <glib-object.h> from the gobject-introspection public interface. That should definitely work (but would still be pretty bad). If not, that's definitely very bad.

Object-like glib interface types that even gitypelib-internal.h uses:

typedef struct _GMappedFile GMappedFile;
typedef struct _GList GList;
typedef struct _GITypelib GITypelib;
typedef int GQuark; // this one is even returned as a NON-pointer
typedef struct _GError GError;
typedef struct _GIBaseInfo GIBaseInfo;

And the public interface of gobject-introspection has:

GType                  g_base_info_gtype_get_type   (void) G_GNUC_CONST; // sigh...

Primitive glib interface types which are used and fine to use since they have obvious definitions and shouldn't change (and could be just be defined manually):

#define G_BEGIN_DECLS
#define G_END_DECLS
#define GI_AVAILABLE_IN_ALL

typedef char gchar;
typedef unsigned char guchar;
typedef unsigned char guint8;
typedef unsigned short guint16;
typedef unsigned int guint32;
typedef unsigned int guint;
typedef int gint; // for gboolean
typedef signed char gint8;
typedef int gint32;
typedef unsigned long gsize;
typedef gint gboolean;

I'm all for dynamically loading stuff from glib in guile-gi, but I just want to make sure first that this actually fixes the entire problem. Otherwise the problem will be back, one library over there.

(Also, the "cairo" problem won't vanish: gobject-introspection totally can load yet another glib version when traversing through to cairo, even after all those fixes. And it does traverse there. That could also happen with other libraries like gnome highlevel libs etc. I don't want to single glib and cairo out--it's just an example)

All in all, I wonder if GNOME can be made to work reliably like Guix wants it to at all. After all, gobject.so has a central type registry and thus having two gobject.so loaded (even indirectly) in the same executable is going to make things weird...

spk121 commented 3 years ago

Would requiring guile-gi to statically link to a static version of libglib help?

daym commented 3 years ago

I think it wouldn't help.

If you did do it (and also did it in gobject-introspection), care would have to be taken not to pass any gobject-introspection-internal things (like GITypeInfo) as gobjects to the scheme user or to any of the dlopened libraries at runtime (it would always have to be wrapped and unwrapped so it's doesn't scare them :P).

But guile-gi then still shouldn't use any static glib type conversion functions to resolve dynamically-loaded objects (for example to convert to gobject interfaces)--so it wouldn't help much.

If anything it would just hide the problem and now make it impossible to debug even with a LD_PRELOAD.

I think that the goal should be to either

(a) remove the explicit glib depdendency from both gobject-introspection and guile-gi, or (b) make the glib dependency built-in to both gobject-introspection and guile-gi (i.e. don't dynamically-load another glib even if instructed to do so)

I think that we need help from Guix on how this is supposed to be fixed.

LordYuuma commented 3 years ago

Manual require and load depends on being able to export GIBaseInfos and dealing with them in Scheme code. And since typelib->module is implemented on top of them… you get the idea. Also statically linking more stuff is ethically wrong, as Guix would require us to statically link less. Not to mention, that static linking also has security implications on other distros.

But if gobject-introspection itself wants to be a gobject, then it should be made impossible to load another glib using gobject-introspection (making the rest of glib similar to what compiler or shell builtins would be).

Just because we aren't doing it right, doesn't mean that it's impossible. We just have to stop mixing our (not theirs) internal GObject into the stuff we return. If anything is missing in order to do that, it probably some macro, that hasn't been introspected.

In the end I don't see how gobject-introspection can work reliably on Guix like that--nevermind guile-gi for the time being.

You load GObject.Object et al. through require and load instead of referring to G_TYPE_OBJECT. The same applies to GIRepository itself, but we can just load that internally from C code without batting an eye.

All in all, I wonder if GNOME can be made to work reliably like Guix wants it to at all. After all, gobject.so has a central type registry and thus having two gobject.so loaded (even indirectly) in the same executable is going to make things weird...

Weird maybe, but definitely manageable. Especially in Guile, where you could use e.g. (@@ (gi Typelib-Version) <GValue>). Also GI does not allow instantiating different versions of the same typelib. So if you ever end up with glib-x.y and glib-a.b both loaded through GI (not once through GI and once through the wrapper as we have it here), then either someone maliciously renamed one of them forming "TotallyNotGLib-2.0.gir" or you've found a pretty major bug in GI.

(b) make the glib dependency built-in to both gobject-introspection and guile-gi (i.e. don't dynamically-load another glib even if instructed to do so)

That's a rather interesting twist on my suggestion to dynamically link them to ensure, that they are the same under most circumstances. (Dynamic linkage against 2.0 and requiring 2.0 through typelib should yield the same result unless your system is broken beyond repair, the problem would then still manifest in 2.0 vs 3.0).

spk121 commented 3 years ago

Weird maybe, but definitely manageable. Especially in Guile, where you could use e.g. (@@ (gi Typelib-Version) <GValue>). Also GI does not allow instantiating different versions of the same typelib. So if you ever end up with glib-x.y and glib-a.b both loaded through GI (not once through GI and once through the wrapper as we have it here), then either someone maliciously renamed one of them forming "TotallyNotGLib-2.0.gir" or you've found a pretty major bug in GI.

(b) make the glib dependency built-in to both gobject-introspection and guile-gi (i.e. don't dynamically-load another glib even if instructed to do so)

That's a rather interesting twist on my suggestion to dynamically link them to ensure, that they are the same under most circumstances. (Dynamic linkage against 2.0 and requiring 2.0 through typelib should yield the same result unless your system is broken beyond repair, the problem would then still manifest in 2.0 vs 3.0).

So here's another half-baked idea. ;-) If dynamic linking and requiring through typelib yields the same results, another brute force approach could be to actually replace any GLib/ GObject calls found in guile-gi itself with function pointers found (lazily) at runtime using dlopen and dlsym calls. And then allow a guile-gi client the option to pass in a path to the version of the GLib/GObject library that is desired as a run-time initialization parameter for dlopen. In essence each function call would become

GSList *(*_g_slist_append)(GSList *lst, void *data) = NULL;
GSList *g_slist_append(GSlist *lst, void *data)
{
  if (_g_slist_append == NULL)
    _g_slist_append = dlsym(libglib, "g_slist_append");
 return _g_slist_append(list, data);
}
LordYuuma commented 3 years ago

That is roughly what I had in mind, yes. There are a few caveats, however:

  1. This must not only be done for functions, but also macros, e.g. G_TYPE_VALUE.
  2. We should make it so that require already serves that purpose. Since require has no return value, it should be safe to call during bootstrapping.
  3. We would need to rearrange our modules to not export anything GObject-related before the first require call is made. This has some implications on (gi oop) and (gi types). In fact, rather than exposing types from there, I think the correct method should be loading them through require/load-by-name.

Rather than manually calling dlopen, it might also be possible to do that through GI, ensuring that there won't be any discrepancy. Dynamic linking would in my opinion just be a workaround until we have reworked our bootstrapping in such a manner.

daym commented 3 years ago

Yes, making it so our interface into gobject stuff is only gobject-introspection's user-level interface would make this work a lot more. However, not everything will work:

I've just found another way this bites us: Using (make <GtkButtonBox>) and putting a (button:new-with-mnemonic) into it . Because the former is done by libgobject directly (using g_object_new), but the latter is done by gtk directly, which then uses (another) libgobject to do it. Hence the objects are not compatible because they are in another gobject registry, respectively.

In order to find things like that, when loading a gi typelib, I would suggest to check the transitive closure of shared objects we want to load. If the transitive closure contains a library basename that is already loaded, for now, log an error. Later on, whatever deduplication/translation we could do would go there, I guess.

Weird maybe, but definitely manageable. Especially in Guile, where you could use e.g. (@@ (gi Typelib-Version) ).

I think the thing above can make even libgtk confused--nevermind guile-gi.

LordYuuma commented 3 years ago

This would be fixed by using libgobject indirectly. (We can get the shlib through g_irepository_get_shared_library, if it is NULL simply assume everything is fine?)

After #90 we don't really need transitive closures for anything. If you mess up your require/load order, that's an error on your part, and if you use typelib->module everything should be fine. And yes, Gtk+ gets confused if you load the wrong GObject, it won't even start then.

daym commented 3 years ago

And yes, Gtk+ gets confused if you load the wrong GObject, it won't even start then.

It totally can (and should) happen that if a library A has an input Q, and library B has the same input Q, and you use A and B in your program, and then you (guix-) update just B to use Q' (a newer version of Q) and recompile your program, then you have both Q and Q' in your process address space! Q is an internal dependency of A, and by coincidence it's an internal dependency of B, too. Just because the internal dependency of B changed shouldn't change the internal depenency of A--that's what "internal" means. Otherwise the modules aren't really modular (because they then don't isolate from each other).

The question is whether gtk people ever considered this case. I think they didn't (otherwise there wouldn't be the type registry as a global variable in glib--they clearly think that the class names are unique), and that that won't work.

I think Guix should be made to check whether this is the case and error out.

Unfortunately, the profile does not contain the gir files, only the typelibs. That would mean that a profile hook would have to load the typelibs using gobject-introspection in order to find out. Sigh...

While there are many profile hooks for glib stuff, there is none for "gi-repository" typelibs yet.

It is not reasonable to expect the end user to care about those things (programmer, fine), so automated functionality to support this case is in order (probably in guix; maybe in profile hooks, maybe on the host side--who knows).

I wonder how much dlopen can be made to do stuff that a regular linker does, though. Isn't most of the loading inside ld.so at runtime anyway? I take it that the duplicate global variable checking isn't, right?

LordYuuma commented 3 years ago

I think Guix should be made to check whether this is the case in a profile hook.

I don't think this can be meaningfully delegated to Guix. However, I do think Guix should provide meaningful typelib handling inside glib-or-gtk-build-system.

It totally can (and should) happen that if a library A has an input B, and library Q has the same input B and you use Q and A in your program, and then you update just Q to use B' (a newer version of B) and recompile your program, then you have both B and B' in your process address space! This is a kind of internal dependency.

Again, not with GI. In GI, (A, V) = (A, V) should hold over the duration of a program. Dependencies between typelibs are not internal by their very nature.

It is not reasonable to expect the end user to care about those things.

On the Guix side, the package maintainer would have to ensure, that the typelib paths are set up correctly, but there are different styles of doing that, some serving slightly different purposes. There are no clear answers to this either. My personal opinion is that everything in /bin should get hardcoded = typelib paths, but that of course only works for applications, which themselves do not need to expose typelib paths for others.

daym commented 3 years ago

Do you know whether libdl can be configured to check this? I mean the linker ld does duplicate global checking for regular linking--can't it do that for dlopen, too? Then the check would be automatic and also would work on non-guix.

Because libgobject has global variables, and if ld.so loads two different libgobjects into the same executable at runtime, why doesn't the dlopen fail because of duplicate globals?

LordYuuma commented 3 years ago

I don't think "globals" still have that meaning when it comes to dynamic linkage. Remember our dlopen experiments from earlier? Those wouldn't work if we were not allowed to overwrite dlopen in the first place.

daym commented 3 years ago

Ok, after more thinking and experimenting, I am convinced that this, for me, was me using guix environment wrong.

This https://gitlab.com/daym/guix-gui/-/commit/410e04f54da874cc89eea216ff0ae082bcf8c28b fixed everything for me!

There were many things conspiring to make this happen, among those:

If I use a normal guix.scm all this works JUST FINE!

I want to reiterate to please not unlaterally change guile-gi to dlsym all your glib functions. gobject-introspection doesn't do that either, so it won't help.

And it's unnecessary. When actually building or installing a package, then there (according to Leo Prikler on guix-devel) are never two different libgobjects in the dependency graph.

In my opinion what should be done, if possible, is to change gobject-introspection (probably upstream!) to make dlopen fail on duplicate globals. That would make the lives of people encountering this in the future much easier.

If you still experience problems, please tell us on guix-devel, though.

LordYuuma commented 3 years ago

This https://gitlab.com/daym/guix-gui/-/commit/410e04f54da874cc89eea216ff0ae082bcf8c28b fixed everything for me!

  • Some packages in guix have a user-facing package and a package actually used in package definitions in dependencies. In guix environment, it will pick the user-facing package even if the dependency graph already contains the other one. There will be NO warning emitted.

This is perhaps just a personal nitpick, but you should probably describe your development environment in scheme. Just create a guix-dev.scm, which adds a specific version of guile etc. to the package's inputs.

  • Not using --pure and having an old cairo in the user profile is a very bad idea.

I too am very confused whenever I see Gtk tests miraculously passing even if gtk+ is not an input to Guile-GI. Turns out the terminal has GI_TYPELIB_PATH set.

I want to reiterate to please not unlaterally change guile-gi to dlsym all your glib functions. gobject-introspection doesn't do that either, so it won't help.

I get where you're coming from, but this is a discussion to be had regardless of Guix. Still, waiting until there is an actually conflicting version of GObject to load through typelibs gives us a lot of time to discuss it.

And it's unnecessary. When actually building or installing a package, then there (according to Leo Prikler on guix-devel) are never two different libgobjects in the dependency graph.

Note, that they're "assuming correct grafting". This is not the case when building Guile-GI itself in a Guix environment (GLib and the GLib used by GI have different inputs and get grafted differently and I have no idea why), hence why you need --no-grafts to pass insanity.scm. For the record, this does work on your end with guile-gi as input to your package, does it?

In my opinion what should be done, if possible, is to change gobject-introspection (probably upstream!) to make dlopen fail on duplicate globals. That would make the lives of people encountering this in the future much easier.

As Leo Prikler pointed out, consumers of GI can check whether a typelib tries to add duplicate entries into the type registry. I tried consolidating those entries, but that doesn't turn out too well.

daym commented 3 years ago

This is perhaps just a personal nitpick, but you should probably describe your development environment in scheme. Just create a guix-dev.scm, which adds a specific version of guile etc. to the package's inputs.

Thanks! That makes sense.

I don't think a regular developer that just happens to use Guix would create even guix.scm though.

guix environment command line has some weird design decisions.

I too am very confused whenever I see Gtk tests miraculously passing even if gtk+ is not an input to Guile-GI. Turns out the terminal has GI_TYPELIB_PATH set.

Aha! That should not happen!

Filed bug 44414 in Guix about that.

For the record, this does work on your end with guile-gi as input to your package, does it?

It seems to work just fine. Note that I'm using ./run guix-gui.in, so I DO NOT even do guix package -i guix-gui or guix build guix-gui.

As Leo Prikler pointed out, consumers of GI can check whether a typelib tries to add duplicate entries into the type registry. I tried consolidating those entries, but that doesn't turn out too well.

If two libgobjects are loaded, the type registry is also duplicated. If loading libgobjects is unsupported, something should prevent it. Since it can also happen as an indirect dependency of a dlopen of something else, the right one to prevent it is the loader in the OS that dlopen uses.

LordYuuma commented 3 years ago

This is perhaps just a personal nitpick, but you should probably describe your development environment in scheme. Just create a guix-dev.scm, which adds a specific version of guile etc. to the package's inputs.

Yeah, seems to be much better.

I don't think a regular developer that just happens to use Guix would do that, though.

As an irregular developer, I discovered that option when I needed to use some rather unconventional Emacs packages for my development environments. ;)

guix environment command line has some weird design decisions.

No comment.

Aha! That should not happen!

Filed bug in Guix about that.

I don't think there's much Guix can do about that, since gnome-terminal is implemented using (I believe) python-gobject. That's just why --pure is even more important!

For the record, this does work on your end with guile-gi as input to your package, does it?

It seems to work just fine. Note that I'm using ./run guix-gui.in, so I DO NOT even do guix package -i guix-gui or guix build guix-gui.

That seems to severely limit this issue then. Please do inform us once you at least try guix environment --ad-hoc guix-gui.

As Leo Prikler pointed out, consumers of GI can check whether a typelib tries to add duplicate entries into the type registry. I tried consolidating those entries, but that doesn't turn out too well.

If two libgobjects are loaded, the type registry is also duplicated. If loading libgobjects is unsupported, something should prevent it. Since it can also happen as an indirect dependency of a dlopen of something else, the right one to prevent it is the loader that dlopen uses.

That's a cosmetic issue at best. If the expected GType according to your registry is different from what you get, you don't care what others out there say, do you?

daym commented 3 years ago

That seems to severely limit this issue then. Please do inform us once you at least try guix environment --ad-hoc guix-gui.

Sure, but I don't have a package guix-gui inside guix master repository. It will be a long time until I do.

The ./run uses something like guix environment --pure -l guix.scm, though.

That's a cosmetic issue at best. If the expected GType according to your registry is different from what you get, you don't care what others out there say, do you?

I have to think about it, but I think it's still a problem if two other libraries use a respective different glib. Then one registers in one and the other in the other and no matter which of the libgobjects your program uses to read a repo it won't be able to use both libraries it wanted to in the same process.

And to add to the previous topic:

guix could check and warn on same-name different-package inputs in the same profile on the host side declaratively. I've suggested that to Guix--let's see.

LordYuuma commented 3 years ago

That seems to severely limit this issue then. Please do inform us once you at least try guix environment --ad-hoc guix-gui.

Sure, but I don't have a package guix-gui inside guix master repository. It will be a long time until I do.

The ./run uses something like guix environment --pure -l guix.scm, though.

You can also do --pure --ad-hoc -l guix.scm, for the record. That should build guix-gui as Guix package and add it to your environment. Not sure if you have a proper launcher yet, but it's not a problem on the Guix side.

That's a cosmetic issue at best. If the expected GType according to your registry is different from what you get, you don't care what others out there say, do you?

I have to think about it, but I think it's still a problem if two other libraries use a respective different glib. Then one registers in one and the other in the other and no matter which of the libgobjects your program uses to read a repo it won't be able to use both libraries it wanted to in the same process.

That is correct, you can't mix the two. But for the purpose of Guile-GI, that does not matter. We would "simply" ignore anything in our local registry and use the one provided through GI via the methods GI provides.

And to add to the previous topic:

guix could check and warn on same-name different-package inputs in the same profile on the host side declaratively. I've suggested that to Guix--let's see.

Guix can actually already do wilder things. During union-builds – including profile generation – it checks for duplicate files and resolves them somehow. You just need to crank up its verbosity if you want to get the warnings, that arise from that.

LordYuuma commented 3 years ago

For the record, I tried testing using a custom Gir for tests and it passes most of the cases, but bind-property seems to have some type confusion going on. Not quite sure if that comes from us or GI directly.

LordYuuma commented 3 years ago

I think I just proved this to be our fault.

spk121 commented 2 years ago

Well, this has been sitting here for a while. As someone who rarely uses Guix, I have a simplistic understanding, but, I gather that a step toward improvement could be something like this:

LordYuuma commented 2 years ago

That would be a big step towards improvement, as far as I can see, yes. The open question (whether libgirepository allows us to load a GObject typelib other than the one it was linked against) does remain, but there's no way of answering that other than trying to implement such a thing.

spk121 commented 2 years ago

Well, I did do some hacking in a branch called type_init_noodle2

It doesn't solve the problem but it does take a couple of small steps in that direction

It does appear that to solve the problem, the right approach was as previously suggested: reimplementing GIRepository itself.

LordYuuma commented 2 years ago

As a stylistic choice, I think we ought to avoid "_pub.h" and instead put private implementation details into a _private.h, as is the normal GNOME convention. I haven't had too much of a look at the internals, but structurally speaking, this might be a improvement. Yet, as far as linking is concerned $(GOBJECT_INTROSPECTION_LIBS) $(GLIB_LIBS) $(GOBJECT_LIBS) means we're still at square 1.

spk121 commented 2 years ago

OK. From here, split the libguile-gi.so into two separate shared libraries: a preprocessor of typelib, and a runtime.

The preprocessor will link to libgirepository, parse typelibs, and output intermediate information for gig_type_define, gig_function_define, etc. This intermediate information is Scheme. I'll borrow the term IL for it. Later, this whole process could be replaced a GIR XML parser.

The runtime will only depend on libguile and libffi at link time, but use GLib and GObject via dlopen. It will load the IL, and use dlopen and libffi to construct all the necessary. This would mean

The number of GLib functions required for SCM/C arg conversion stands at 140 in my hacking branch, down from 179 in version 0.3.2.

nm --undefined-only .libs/libguile-gi.so | grep ' g_' | grep -v '_info_' | grep -v 'irepository'

This would require replacement of libgirepository functions