redchillipadi / ebuild-overlay

Ebuild for various packages not currently in the Gentoo tree

media-libs/osl-1.10.{8..10} failing #11

Closed waebbl closed 4 years ago

waebbl commented 4 years ago

I'm having a hard time getting one of the newer osl builds to compile. I tried several combinations of gcc (slots 8 and 10) and clang (slots 9 and 10, the latter by changing the 1.10.10 ebuild). I also tried downgrading boost to 1.72.0 and reinstalling jemalloc, with no success. The error I get is <jemalloc>: Error in dlsym(RTLD_NEXT, "pthread_create"), raised by oslc when building the emitter shader.

A quick search brought up this https://github.com/jemalloc/jemalloc/issues/907, but I'm not sure whether this is relevant.

I've now downgraded glibc to 2.30.0 in a clean chroot and rebuilt world with gcc-10, and there it compiles. So the issue seems to be related to how glibc and jemalloc handle threading. It's probably not even an issue with the ebuild. Maybe restricting the ebuilds to <=glibc-2.30 would help.

Do you have a similar experience? Which glibc version did you use during your tests?

redchillipadi commented 4 years ago

I can install osl-1.10.8 and 1.10.10 from my overlay with FEATURES="test" USE="doc partio qt5 test" without issue. I just made a PR for the same so am keen to fix this before it is committed.

I have tested with gcc 8.4.0 and 9.3.0 and both work. While I am testing ebuilds for addition to the tree I am running a stable system (apart from my web browser) with clang 8.0.1, 9.0.1 and 10.0.0, boost 1.72.0-r1, jemalloc-5.2.1 and glibc-2.30-r8.

I can confirm that clang 10 will not work, as I tested it during the ebuild development. Clang 10 support was added upstream in 1.11.5.

I have never seen this error, but I am updating my second system at the moment, and will try some other combinations to test it out. Do you think it might be something in your original system that is different from the chroot? If you upgrade glibc in the chroot does the failure occur again?

waebbl commented 4 years ago

Yesterday I tried the default USE flag settings (i.e. none enabled), USE="partio", USE="partio qt5" and USE="doc partio qt5", and all of them succeeded with gcc-10.1.0, {clang,llvm}-9.0.1 and glibc-2.30-r8. I haven't tried the test USE flag, due to RESTRICT="test" in the ebuild.

Unfortunately, I just noticed that there's no jemalloc installed in the chroot.

> Do you think it might be something in your original system that is different from the chroot? If you upgrade glibc in the chroot does the failure occur again?

It even succeeds with glibc-2.31-r3, with and without jemalloc installed. The issue seems to be that one of the libraries linked into the oslc binary is itself linked against jemalloc, maybe through an automagic dependency. According to equery depends, not many packages depend on jemalloc, and only one, openvdb, depends on it unconditionally.

Now I think it's related to openimageio, which on my desktop is built with USE=openvdb and thus pulls in jemalloc, while in the chroot I just use the default settings, which do not include openvdb. I will test by rebuilding openimageio with the openvdb USE flag enabled and see if this reproduces the issue.

redchillipadi commented 4 years ago

I upgraded to glibc-2.31-r3 but can still compile osl without errors, so I downgraded back to 2.30-r8 and regenerated the toolchain.

I then removed the color-management, jemalloc, openvdb and osl use flags from everything and unmerged openvdb, opencolorio, openimageio and jemalloc completely from the system.

After this, emerge osl reinstalls openimageio (without openvdb or jemalloc in the system) and still compiles osl without error. The resulting libOpenImageIO.so.2.1 is not linked against openvdb.

I then enabled openimageio[openvdb] and emerge -1v openimageio, which installed jemalloc, openvdb and openimageio. Now libOpenImageIO.so.2.1 links to openvdb, and libopenvdb.so.7.0 links to libjemalloc.so.2. After this emerge -1v osl still completed successfully.
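One way to check a linkage chain like the one described above is to read the NEEDED entries that readelf -d reports for each library. The following is only a sketch: the helper name needed_libs is made up for illustration, and the /usr/lib64 paths are assumptions that depend on the system's libdir.

```shell
#!/bin/sh
# Print the sonames a shared object lists as NEEDED, one per line.
# Expects `readelf -d` output on stdin. (Hypothetical helper name.)
needed_libs() {
    sed -n 's/.*(NEEDED).*\[\(.*\)\].*/\1/p'
}

# Walk the chain openimageio -> openvdb -> jemalloc.
# The paths below are assumptions; adjust them to the local libdir.
for lib in /usr/lib64/libOpenImageIO.so.2.1 /usr/lib64/libopenvdb.so.7.0; do
    [ -e "$lib" ] || continue
    echo "== $lib"
    readelf -d "$lib" | needed_libs | grep -E 'openvdb|jemalloc' \
        || echo "   (no openvdb or jemalloc entries)"
done
```

Unlike ldd, this lists only direct dependencies, so it shows which library actually pulls in libjemalloc.so.2 rather than everything reachable from it.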

So I am not able to reproduce your error on my main system. My second system is still compiling but I will see how it goes when it recompiles osl and create a chroot for further tests once it is done.

Could you please send me the full info so I can try to reproduce your system in my chroot? E.g. emerge --info osl, the build.log, etc. You can use b.g.o or email me if you like.

waebbl commented 4 years ago

I'm not able to reproduce it in the chroot either. It builds successfully there with the options you have chosen as well. Yet it's still failing in my desktop environment.

I will check my (desktop) libraries for linking against jemalloc and see if this gives me more insight. My setup is quite complex: I have many per-package USE flag settings and almost 3k packages installed overall. If I don't find any clues, I'm going to open an issue on b.g.o. and link it here to continue researching this.

Thanks for your testing so far. You don't need to waste your time on this. I think this might be some really weird issue, possibly related to the complexity of my setup. If I find something, I'll let you know.

waebbl commented 4 years ago

Uh, that went quickly... I have now emerged jemalloc with USE="lazy-lock" in my chroot and tried rebuilding osl, and this triggered the issue.

Disabling the flag solves the issue in my desktop environment too. The remaining problem is how to enforce that this USE flag is disabled when installing osl, as osl doesn't depend on jemalloc... I'm going to ask on IRC if someone has an idea.
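To check whether an installed jemalloc was built with the flag, one option is to read the USE file that Portage records for each installed package. This is a sketch only: the function name is hypothetical, and it assumes the installed-package database lives in the usual /var/db/pkg location.

```shell
#!/bin/sh
# Return 0 if any installed dev-libs/jemalloc was built with lazy-lock.
# $1 overrides the database root (assumed default: /var/db/pkg).
jemalloc_has_lazy_lock() {
    vdb_root="${1:-/var/db/pkg}"
    for use_file in "$vdb_root"/dev-libs/jemalloc-*/USE; do
        # Skip the unexpanded glob when jemalloc is not installed.
        [ -r "$use_file" ] || continue
        # USE holds a whitespace-separated list of enabled flags.
        for flag in $(cat "$use_file"); do
            [ "$flag" = "lazy-lock" ] && return 0
        done
    done
    return 1
}

if jemalloc_has_lazy_lock; then
    echo "jemalloc is built with lazy-lock"
else
    echo "no jemalloc with lazy-lock found"
fi
```

If the flag turns out to be enabled, a line such as dev-libs/jemalloc -lazy-lock in /etc/portage/package.use followed by a rebuild of jemalloc should turn it off locally, though that does not stop other users from hitting the same failure.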

waebbl commented 4 years ago

@asturm replied on IRC. One possibility would be for osl to block jemalloc when it is built with that USE flag. Though not an elegant solution, it would at least warn users not to enable the USE flag if they're going to install osl.

redchillipadi commented 4 years ago

I am thinking of adding the lazy-lock USE flag to openvdb, so that osl can depend on openvdb[-lazy-lock] to enforce the restriction.

waebbl commented 4 years ago

I'm not sure if that's a good idea. This will only pass the flag through to jemalloc, with openvdb having no direct use of the flag.

redchillipadi commented 4 years ago

I will just add jemalloc[-lazy-lock] to osl as you first suggested. If multiple packages have the same issue, then I will search for a more general solution later.

redchillipadi commented 4 years ago

I just tried compiling openimageio[doc,openvdb], and oiiotool fails with <jemalloc>: Error in dlsym(RTLD_NEXT, "pthread_create") when jemalloc has lazy-lock enabled. So I think that openimageio will need the same fix as osl.

waebbl commented 4 years ago

Might this be due to oiio pulling in openvdb and thus jemalloc? I can't remember having this issue with oiio, but maybe I hadn't enabled the openvdb USE flag for it.

As it's more than one package, instead of depending on a package which isn't needed, maybe the approach of using a blocker as suggested on IRC would be a cleaner solution?

redchillipadi commented 4 years ago

So far all the issues are occurring through openimageio[openvdb], so I have added openvdb? ( !dev-libs/jemalloc[lazy-lock] ) to the openimageio RDEPEND, which I think is what was suggested on IRC.
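For comparison, the two approaches discussed in this thread could look roughly like this inside an ebuild. This is a hypothetical fragment, not the committed ebuilds; OSL_RDEPEND is a made-up name used only so both variants fit in one snippet (in a real ebuild each would be part of RDEPEND).

```shell
# Variant considered first, for the osl ebuild: a hard dependency.
# It forces jemalloc to be installed with lazy-lock disabled, even
# though osl does not use jemalloc directly.
# (OSL_RDEPEND is a hypothetical name for illustration.)
OSL_RDEPEND="dev-libs/jemalloc[-lazy-lock]"

# Variant adopted, for the openimageio ebuild: a blocker.
# Only when USE=openvdb is set does it refuse to coexist with a
# jemalloc built with lazy-lock; it never pulls jemalloc in itself.
RDEPEND="openvdb? ( !dev-libs/jemalloc[lazy-lock] )"
```

The blocker keeps the dependency graph unchanged and only surfaces a conflict for users who combine openvdb support with a lazy-lock jemalloc.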

waebbl commented 4 years ago

Yes, I think this is what was meant.