Issue #5 describes missing "provides" information in the perl rpm that rpmcpan builds.
This Issue describes a heisenbug that I'm seeing in which the perl binary is built with an incorrect rpath and fails to find libperl.so.
Feel free to read through it if you'd like, but after wrestling with it in the background all weekend, I think that it's either an OS/tools problem or an OpenStack/Nebula problem. I'm raising it anyway in case it comes up again, but I think you might want to just close it. Executive summary at the end.
I've been working on an OpenStack-based cluster (from Nebula) with a CentOS 6 image at work, and I've been seeing variations of the problem described below all weekend.
I moved the Provides lines in etc/perl.spec up underneath the Requires(postun) line (issue #5) and was able to build an rpm that included that information, so I patted myself on the back and moved on.
While doing more builds (localizing other things in the spec) I suddenly ended up with a perl that was unable to run because it couldn't find libperl.so. Using LD_DEBUG I can see that the rpath is incorrect (this output is from a recent failed experiment, not the original failure, but it's representative):
[...]
- Collecting and installing perl RPMs
Preparing... ########################################### [100%]
1:perl518 ########################################### [100%]
Perl 5.18.4 built; restarting with that version
/opt/perl518/bin/perl: error while loading shared libraries: libperl.so: cannot open shared object file: No such file or directory
[centos@perl-builder rpmcpan.gne]$ LD_DEBUG=libs /opt/perl518/bin/perl
5924: find library=libperl.so [0]; searching
5924: search path=/opt/perl518/lib/5.18.4/x86_64-linux-thread-multi/CORE/tls/x86_64:/opt/perl518/lib/5.18.4/x86_64-linux-thread-multi/CORE/tls:/opt/perl518/lib/5.18.4/x86_64-linux-thread-multi/CORE/x86_64:/opt/perl518/lib/5.18.4/x86_64-linux-thread-multi/CORE (RPATH from file /opt/perl518/bin/perl)
5924: trying file=/opt/perl518/lib/5.18.4/x86_64-linux-thread-multi/CORE/tls/x86_64/libperl.so
5924: trying file=/opt/perl518/lib/5.18.4/x86_64-linux-thread-multi/CORE/tls/libperl.so
5924: trying file=/opt/perl518/lib/5.18.4/x86_64-linux-thread-multi/CORE/x86_64/libperl.so
5924: trying file=/opt/perl518/lib/5.18.4/x86_64-linux-thread-multi/CORE/libperl.so
5924: search cache=/etc/ld.so.cache
5924: search path=/lib64/tls/x86_64:/lib64/tls:/lib64/x86_64:/lib64:/usr/lib64/tls/x86_64:/usr/lib64/tls:/usr/lib64/x86_64:/usr/lib64 (system search path)
5924: trying file=/lib64/tls/x86_64/libperl.so
5924: trying file=/lib64/tls/libperl.so
5924: trying file=/lib64/x86_64/libperl.so
5924: trying file=/lib64/libperl.so
5924: trying file=/usr/lib64/tls/x86_64/libperl.so
5924: trying file=/usr/lib64/tls/libperl.so
5924: trying file=/usr/lib64/x86_64/libperl.so
5924: trying file=/usr/lib64/libperl.so
5924:
/opt/perl518/bin/perl: error while loading shared libraries: libperl.so: cannot open shared object file: No such file or directory
and if I look back through my terminal window's history (having run with -v -v -v) I can find that perl was indeed linked with that rpath.
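Besides LD_DEBUG, the rpath can be read straight out of the installed binary. The following is a minimal sketch, assuming `readelf` (from binutils) is available on the build host; the helper names `rpath_dirs` and `report_lib` are made up for illustration and are not part of rpmcpan.

```shell
#!/bin/sh
# Print each RPATH/RUNPATH directory found in `readelf -d` output
# (read from stdin), one directory per line.
rpath_dirs() {
    sed -n 's/.*(R\(UN\)\{0,1\}PATH).*\[\(.*\)][[:space:]]*$/\2/p' | tr ':' '\n'
}

# Read directories on stdin and report whether each one contains
# the file named by "$1" (for example libperl.so).
report_lib() {
    while IFS= read -r dir; do
        if [ -e "$dir/$1" ]; then
            echo "found   $dir/$1"
        else
            echo "missing $dir/$1"
        fi
    done
}
```

A possible invocation would be `readelf -d /opt/perl518/bin/perl | rpath_dirs | report_lib libperl.so`. It only checks the base rpath directories, not the `tls/` and `x86_64/` variants that the loader also tries in the LD_DEBUG output.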
What's crazy is that in this situation, the only file in the BUILD dir that contains the string "/opt/perl518/lib/5.18.4/x86_64-linux-thread-multi/CORE'" is "myconfig", which I believe is not actually used for anything in the build. Everything else seems to have used the path that's edited in the %build step, which edits config.sh in place before calling make.
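The search of the BUILD dir described above can be sketched like this. It assumes a grep that supports `-r` (GNU and BSD grep both do); `files_with_string` is a hypothetical helper name, and the location of the BUILD dir depends on the local rpmbuild setup.

```shell
#!/bin/sh
# List every file under directory "$1" that contains the literal
# string "$2" (-F: no regex interpretation), sorted by path.
files_with_string() {
    grep -rlF -- "$2" "$1" | sort
}
```

For example, `files_with_string "$BUILD_DIR" '/opt/perl518/lib/5.18.4/x86_64-linux-thread-multi/CORE'`, where `BUILD_DIR` is a placeholder for wherever rpmbuild unpacked the perl sources. Running it right after a failed build, before any cleanup, would show whether the bad path really lives only in myconfig.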
I've thrashed around for a while testing various hypotheses and trying to isolate/reproduce what I'm seeing. My general setup looks like this (see Issue #3 for details):
In order to get rpmcpan to work, the system perl seems to need a handful of things, which I've provided by populating and using a local library like this:
then I build like this:
which (this heisenbug notwithstanding) will build an rpm and populate /opt/perl518 and then crash when restarting with the newly built perl because there are things in local.system that are /usr/bin/perl specific. I've solved that by building a local.opt for the newly built perl and running rpmcpan again using the local.opt content (see Issue #3 for details).
While chasing the heisenbug I clean up between builds by running this little script:
and also
I started off thinking that maybe the Provides lines needed to be somewhere else in the file, which led to heisenbug behavior. Then I thought maybe one of them was causing a problem (some weird interaction with some C macro somewhere or ...) and tried commenting out various combos, still more heisenbug behavior. At one point I commented out the actual execution of the tests to speed things up, but still heisenbug behavior.
I'm trying to work at this systematically. Here's an odd thing that I've noticed.
If I have a build that has generated a perl with an incorrect rpath and I comment out all three Provides lines:
The next build fails, so I clean up and ...
The build after that fails, so I clean up and ...
The build after that fails, so I clean up and ...
The build after that succeeds.
even though I've made no changes (other than the "cleanup" described above). Before I started taking careful notes I believe that I saw the recovery on the 4th step, one iteration shorter.
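Counting iterations by hand is error-prone, so the fail/clean/rebuild cycle above could be driven by a small loop that logs each attempt. This is only a sketch: `attempts_until_success` is a hypothetical helper, and the command it runs has to be supplied by the caller, since the exact cleanup and build commands are whatever is already in use.

```shell
#!/bin/sh
# Run the command given after the first argument up to "$1" times.
# Print one line per attempt; return 0 at the first success,
# or 1 if every attempt failed.
attempts_until_success() {
    max=$1
    shift
    i=1
    while [ "$i" -le "$max" ]; do
        if "$@"; then
            echo "attempt $i: ok"
            return 0
        fi
        echo "attempt $i: FAILED"
        i=$((i + 1))
    done
    return 1
}
```

One way to use it would be to wrap a single clean-and-rebuild cycle in a function (call it `one_cycle`, also hypothetical) whose last step is `/opt/perl518/bin/perl -e 1`, and then run `attempts_until_success 6 one_cycle`. The attempt number on the last line is the iteration at which the build recovered.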
And now that I've been taking notes, I've become convinced that this is some sort of platform bug. Two things persuade me of that:
I can't reproduce it in a Centos vagrant/virtualbox vm running on my mac
I've now seen failures/recoveries after 2 iterations, 3 iterations and 4 iterations.
For now, I'm going to go with rpms built in a vagrant vm and maintain a bit of skepticism when using the Nebula system.