openhpc / ohpc

OpenHPC Integration, Packaging, and Test Repo
http://openhpc.community
Apache License 2.0

Problem and solution to issue when using EasyBuild with Xvfb eb files #1919

Closed geoffreyweal closed 8 months ago

geoffreyweal commented 10 months ago

Hello

I have recently had a problem when I was trying to install Xvfb-21.1.6-GCCcore-12.2.0 using EasyBuild 4.7.2. When I was trying to install this using Xvfb-21.1.6-GCCcore-12.2.0.eb (/opt/ohpc/pub/libs/easybuild/4.7.2/easybuild/easyconfigs/x/Xvfb/Xvfb-21.1.6-GCCcore-12.2.0.eb), I got an error where the checksum for the patch for xvfb-run was not correct:

[rocky@rockygpu Gaussian_test]$ eb Xvfb-21.1.6-GCCcore-12.2.0.eb --robot --parallel=4
== Temporary log file in case of crash /tmp/eb-vqz5vkh0/easybuild-hcskrz89.log
== found valid index for /opt/ohpc/pub/libs/easybuild/4.7.2/easybuild/easyconfigs, so using it...
== found valid index for /opt/ohpc/pub/libs/easybuild/4.7.2/easybuild/easyconfigs, so using it...
== resolving dependencies ...
== processing EasyBuild easyconfig /opt/ohpc/pub/libs/easybuild/4.7.2/easybuild/easyconfigs/x/Xvfb/Xvfb-21.1.6-GCCcore-12.2.0.eb
== building and installing Xvfb/21.1.6-GCCcore-12.2.0...
== fetching files...
== creating build dir, resetting environment...
== unpacking...
== FAILED: Installation ended unsuccessfully (build directory: /home/rocky/.local/easybuild/build/Xvfb/21.1.6/GCCcore-12.2.0): build failed (first 300 chars): Checksum verification for /opt/ohpc/pub/libs/easybuild/4.7.2/easybuild/easyconfigs/x/Xvfb/xvfb-run using fd6d13182b77871d4f65fccdaebb8a72387a726426066d3f8e6aa26b010ea0e8 failed. (took 0 secs)

I looked for this checksum in the Xvfb-21.1.6-GCCcore-12.2.0.eb file and found that it is associated with a patch file called xvfb-run. I did some investigating and found that when I ran eb --search Xvfb, there was an entry for xvfb-run:

[rocky@rockygpu ~]$ eb --search Xvfb
== found valid index for /opt/ohpc/pub/libs/easybuild/4.7.2/easybuild/easyconfigs, so using it...
 * /opt/ohpc/pub/libs/easybuild/4.7.2/easybuild/easyconfigs/m/matlab-proxy/matlab-proxy-0.5.4_fix_xvfb_startup.patch
 * /opt/ohpc/pub/libs/easybuild/4.7.2/easybuild/easyconfigs/x/Xvfb/Xvfb-1.20.8-GCCcore-8.2.0.eb
 * /opt/ohpc/pub/libs/easybuild/4.7.2/easybuild/easyconfigs/x/Xvfb/Xvfb-1.20.8-GCCcore-8.3.0.eb
 * /opt/ohpc/pub/libs/easybuild/4.7.2/easybuild/easyconfigs/x/Xvfb/Xvfb-1.20.9-GCCcore-9.3.0.eb
 * /opt/ohpc/pub/libs/easybuild/4.7.2/easybuild/easyconfigs/x/Xvfb/Xvfb-1.20.9-GCCcore-10.2.0.eb
 * /opt/ohpc/pub/libs/easybuild/4.7.2/easybuild/easyconfigs/x/Xvfb/Xvfb-1.20.11-GCCcore-10.3.0.eb
 * /opt/ohpc/pub/libs/easybuild/4.7.2/easybuild/easyconfigs/x/Xvfb/Xvfb-1.20.13-GCCcore-11.2.0.eb
 * /opt/ohpc/pub/libs/easybuild/4.7.2/easybuild/easyconfigs/x/Xvfb/Xvfb-21.1.3-GCCcore-11.3.0.eb
 * /opt/ohpc/pub/libs/easybuild/4.7.2/easybuild/easyconfigs/x/Xvfb/Xvfb-21.1.6-GCCcore-12.2.0.eb
 * /opt/ohpc/pub/libs/easybuild/4.7.2/easybuild/easyconfigs/x/Xvfb/xvfb-run

I computed the SHA-256 checksum of that xvfb-run file:

[rocky@rockygpu ~]$ sha256sum  /opt/ohpc/pub/libs/easybuild/4.7.2/easybuild/easyconfigs/x/Xvfb/xvfb-run
a70efee6934d53d7c228a4a8b2a7966b6eb4b47203d66818f122c66b1ab4e7c0  /opt/ohpc/pub/libs/easybuild/4.7.2/easybuild/easyconfigs/x/Xvfb/xvfb-run

and replaced the checksum currently in Xvfb-21.1.6-GCCcore-12.2.0.eb with this one:

    (name, version, {
        'source_urls': ['https://www.x.org/releases/individual/xserver/'],
        'sources': ['xorg-server-%(version)s.tar.gz'],
        'patches': [('xvfb-run', '.')],
        'checksums': [
            '6f9c73ccc50e2731adac17671c8e33687738c8cd556b49ecb9f410ce7217be11',  # xorg-server-21.1.3.tar.gz
            # Checksum reported by "sha256sum /opt/ohpc/pub/libs/easybuild/4.7.2/easybuild/easyconfigs/x/Xvfb/xvfb-run".
            # Original sha256 checksum in the easyconfig: 'fd6d13182b77871d4f65fccdaebb8a72387a726426066d3f8e6aa26b010ea0e8'
            'a70efee6934d53d7c228a4a8b2a7966b6eb4b47203d66818f122c66b1ab4e7c0',  # xvfb-run
        ],
        'start_dir': 'xorg-server-%(version)s',
        'configopts': local_xvfb_configopts,
        'buildopts': local_xvfb_buildopts,
        'installopts': local_xvfb_buildopts,
    }),

I then re-ran eb Xvfb-21.1.6-GCCcore-12.2.0.eb --robot, and this seemed to do the trick.

I assume that /opt/ohpc/pub/libs/easybuild/4.7.2/easybuild/easyconfigs/x/Xvfb/xvfb-run has been updated recently, but the checksums for this file in the Xvfb eb files have not been updated.
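
The check that fails here can be reproduced by hand. The following is a minimal sketch, assuming GNU coreutils `sha256sum`; the helper name `verify_sha256` and the stand-in file are hypothetical and only illustrate comparing a file against a recorded digest, not how EasyBuild implements its verification.

```shell
# verify_sha256 FILE EXPECTED (hypothetical helper)
# Succeeds only when FILE's SHA-256 digest equals EXPECTED.
verify_sha256() {
    printf '%s  %s\n' "$2" "$1" | sha256sum --check --status -
}

# Hypothetical stand-in file; a real check would target the xvfb-run script.
workdir=$(mktemp -d)
printf 'stand-in contents\n' > "$workdir/xvfb-run"

# Digest the file actually has on disk.
actual=$(sha256sum "$workdir/xvfb-run" | cut -d' ' -f1)

# Digest of an empty input, used here only as a value that cannot match.
wrong=$(printf '' | sha256sum | cut -d' ' -f1)

if verify_sha256 "$workdir/xvfb-run" "$actual"; then
    match_result=ok
else
    match_result=failed
fi

if verify_sha256 "$workdir/xvfb-run" "$wrong"; then
    mismatch_result=ok
else
    mismatch_result=failed
fi

echo "recorded digest: $match_result"
echo "other digest:    $mismatch_result"
rm -rf "$workdir"
```

A mismatch only says that the file and the recorded digest disagree; it does not say which of the two has changed.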

I also tried installing Xvfb-21.1.3-GCCcore-11.3.0.eb (eb Xvfb-21.1.3-GCCcore-11.3.0.eb --robot --parallel=4), got the same problem, and the same solution fixed this. I assume this will be a problem for all the Xvfb eb files.

I didn't know where to update this code in this GitHub repository, so I thought I would add it here as an issue. Hopefully this helps sort the problem out :)

I have attached my updated Xvfb-21.1.6-GCCcore-12.2.0.eb and Xvfb-21.1.3-GCCcore-11.3.0.eb files here.

Xvfb-21.1.3-GCCcore-11.3.0.eb.txt Xvfb-21.1.6-GCCcore-12.2.0.eb.txt

Thanks for all the work. Setting up an HPC and OpenHPC is great!

Kindest regards,

Geoffrey


EasyBuild Details:

Name         : EasyBuild-ohpc
Version      : 4.7.2
Release      : 300.ohpc.3.2
Architecture : x86_64
Size         : 60 M
Source       : EasyBuild-ohpc-4.7.2-300.ohpc.3.2.src.rpm
Repository   : @System
From repo    : OpenHPC
Summary      : Software build and installation framework
URL          : https://easybuilders.github.io/easybuild
License      : GPLv2
Description  : EasyBuild is a software build and installation framework that allows
             : you to manage (scientific) software on High Performance Computing (HPC)
             : systems in an efficient way.

adrianreber commented 10 months ago

Thanks for the report.

@boegel, is this an OpenHPC-specific problem or an EasyBuild problem? Any ideas?

geoffreyweal commented 10 months ago

Oh I see. Should I report this problem on the EasyBuild github?

adrianreber commented 10 months ago

@boegel is one of the EasyBuild developers and maintains the package in OpenHPC. Let's wait and see what he thinks about it.

geoffreyweal commented 10 months ago

Excellent thanks for that!

boegel commented 10 months ago

The correct checksum for the xvfb-run script that EasyBuild ships is fd6d13182b77871d4f65fccdaebb8a72387a726426066d3f8e6aa26b010ea0e8; it has not been changed since April 2020 (see https://github.com/easybuilders/easybuild-easyconfigs/blob/develop/easybuild/easyconfigs/x/Xvfb/xvfb-run).

So it seems like OpenHPC is somehow changing xvfb-run, and then indeed the checksum should be changed accordingly in the Xvfb easyconfig files...

boegel commented 10 months ago

To work around this, you can use eb --ignore-checksums (but this is not recommended in general, of course).

adrianreber commented 10 months ago

@boegel Thanks.

@geoffreyweal I just checked and I see the correct checksum:

openhpc-lenovo-jenkins-sms:/opt/ohpc # sha256sum  /opt/ohpc/pub/libs/easybuild/4.7.2/easybuild/easyconfigs/x/Xvfb/xvfb-run
fd6d13182b77871d4f65fccdaebb8a72387a726426066d3f8e6aa26b010ea0e8  /opt/ohpc/pub/libs/easybuild/4.7.2/easybuild/easyconfigs/x/Xvfb/xvfb-run
openhpc-lenovo-jenkins-sms:/opt/ohpc # rpm --verify EasyBuild-ohpc-4.7.2-300.ohpc.3.1.x86_64
openhpc-lenovo-jenkins-sms:/opt/ohpc # 

Can you run rpm --verify EasyBuild-ohpc to see if RPM detects a changed checksum?

geoffreyweal commented 10 months ago

Interesting. To give a bit more information about this problem: I have just set up Rocky 9.2 and OpenHPC on a test computer so I can do some testing before deploying this on our server. I installed OpenHPC per the instruction manual a week ago, so this is quite a current setup.

Unfortunately, I have encountered an unrelated problem when running your command:

rpm: symbol lookup error: /lib64/librpmio.so.9: undefined symbol: lzma_stream_encoder_mt, version XZ_5.2

Because this problem is unrelated to this issue (and unless @adrianreber or @boegel have an obvious solution for it), I might sort this out first and then get back to you. I don't want this unrelated problem to clog up this issue. I mention it only because it may take me a bit of time to run your rpm --verify command; I'm not MIA if I don't respond for a while, just sorting this problem out.

adrianreber commented 10 months ago

Sure, take your time. A broken RPM sounds serious. I see the symbols you are missing on my system:

$ nm -gD /usr/lib64/liblzma.so.5 | grep lzma_stream_encoder_mt
0000000000010150 T lzma_stream_encoder_mt@@XZ_5.2
00000000000101d0 T lzma_stream_encoder_mt_memusage@@XZ_5.2

geoffreyweal commented 10 months ago

Thanks for that. The good thing about a test machine is that you can completely break it and reset it with no problems to others and can learn from your mistakes!

I have just reset my test box, but I get the same issue. However, if I run module purge, rpm works again. One of the modules I installed must be causing a conflict. I'll figure out that issue another time.

Back to the current problem: when I run rpm --verify EasyBuild-ohpc, I get the following:

rocky@rockyserver:~$ module purge
rocky@rockyserver:~$ rpm --verify EasyBuild-ohpc
S.5....T.    /opt/ohpc/pub/libs/easybuild/4.7.2/easybuild/easyconfigs/.eb-path-index
S.5....T.    /opt/ohpc/pub/libs/easybuild/4.7.2/lib/python3.9/site-packages/easybuild/__pycache__/__init__.cpython-39.opt-1.pyc
S.5....T.    /opt/ohpc/pub/libs/easybuild/4.7.2/lib/python3.9/site-packages/easybuild/__pycache__/__init__.cpython-39.pyc

geoffreyweal commented 10 months ago

Also, to reconfirm (after resetting my test server), I get the same checksum for xvfb-run:

rocky@rockyserver:~$ sha256sum  /opt/ohpc/pub/libs/easybuild/4.7.2/easybuild/easyconfigs/x/Xvfb/xvfb-run
a70efee6934d53d7c228a4a8b2a7966b6eb4b47203d66818f122c66b1ab4e7c0  /opt/ohpc/pub/libs/easybuild/4.7.2/easybuild/easyconfigs/x/Xvfb/xvfb-run

I have also attached the xvfb-run file here (converted to a txt file because GitHub doesn't like executables being uploaded here):

xvfb-run.txt

I diff'ed this version and the one @boegel posted (https://github.com/easybuilders/easybuild-easyconfigs/blob/develop/easybuild/easyconfigs/x/Xvfb/xvfb-run), and it turns out the difference is minor:

rocky@rockyserver:~$ diff /home/rocky/xvfb-run.txt /home/rocky/Downloads/xvfb-run.txt
1c1
< #!/usr/bin/sh # In my version
---
> #!/bin/sh # In the version on https://github.com/easybuilders/easybuild-easyconfigs/blob/develop/easybuild/easyconfigs/x/Xvfb/xvfb-run 

On my new test server, I removed the /usr from the first line to make it the same as the version in the EasyBuild GitHub repository, and the install then ran without problems, as you would expect. The checksum for xvfb-run after the change is (as expected):

rocky@rockyserver:~$ sha256sum  /opt/ohpc/pub/libs/easybuild/4.7.2/easybuild/easyconfigs/x/Xvfb/xvfb-run
fd6d13182b77871d4f65fccdaebb8a72387a726426066d3f8e6aa26b010ea0e8  /opt/ohpc/pub/libs/easybuild/4.7.2/easybuild/easyconfigs/x/Xvfb/xvfb-run

I installed EasyBuild as instructed in the install guide (https://github.com/openhpc/ohpc/releases/download/v3.0.GA/Install_guide-Rocky9-Warewulf-SLURM-3.0-x86_64.pdf):

sudo yum -y install EasyBuild-ohpc

and the version of EasyBuild I have is EasyBuild-ohpc-4.7.2-300.ohpc.3.2.x86_64.

Thanks!

adrianreber commented 10 months ago

I have now been able to reproduce your problem. This happens during the build:

[  108s] mangling shebang in /opt/ohpc/pub/libs/easybuild/4.7.2/easybuild/easyconfigs/x/Xvfb/xvfb-run from /bin/sh to #!/usr/bin/sh

RPM applies certain rules to all files in the package, and one of those rules seems to be to adapt shebangs. I need to think about how to handle this. Thanks for your report.
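
To see why a rewritten shebang is enough to break the checksum, here is a minimal sketch that imitates the effect with `sed` on a throwaway file. The file contents are a hypothetical stand-in, and the `sed` call only approximates what the packaging step does to the first line; it is not the actual mangling script.

```shell
# Hypothetical stand-in for the packaged script, starting with #!/bin/sh.
workdir=$(mktemp -d)
printf '#!/bin/sh\necho "stand-in for xvfb-run"\n' > "$workdir/original"

# Imitate the shebang rewrite on the first line only.
sed '1s|^#!/bin/sh$|#!/usr/bin/sh|' "$workdir/original" > "$workdir/mangled"

before=$(sha256sum "$workdir/original" | cut -d' ' -f1)
after=$(sha256sum "$workdir/mangled" | cut -d' ' -f1)
first_line=$(head -n 1 "$workdir/mangled")

echo "first line after rewrite: $first_line"
if [ "$before" != "$after" ]; then
    echo "checksum changed"
else
    echo "checksum unchanged"
fi
rm -rf "$workdir"
```

The script behaves the same either way, but any digest recorded for the unmodified file no longer matches the installed copy.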

geoffreyweal commented 10 months ago

Sweet. Happy to help where I can.

geoffreyweal commented 10 months ago

Thanks for your help on figuring out what is going on here.

adrianreber commented 10 months ago

This is the script doing it: https://gitlab.com/redhat/centos-stream/rpms/redhat-rpm-config/-/blob/c9s/brp-mangle-shebangs. It looks like it should be easy to exclude certain paths from it.

adrianreber commented 10 months ago

A quick test shows that setting __brp_mangle_shebangs_exclude_from to *easyconfigs* should be enough to make it work. Let's keep this open until we fix it in 3.1.
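
The idea behind a path-based exclusion can be sketched as follows. This assumes the exclusion value is treated as an extended regular expression matched against each file path; the pattern `easyconfigs`, the helper `should_mangle`, and the second path are hypothetical, and the real script's matching logic may differ.

```shell
# Hypothetical exclusion pattern, treated as an extended regex on the path.
exclude_from='easyconfigs'

# should_mangle PATH (hypothetical helper)
# Succeeds when PATH is NOT covered by the exclusion pattern.
should_mangle() {
    ! printf '%s\n' "$1" | grep -q -E "$exclude_from"
}

for path in \
    /opt/ohpc/pub/libs/easybuild/4.7.2/easybuild/easyconfigs/x/Xvfb/xvfb-run \
    /opt/example/bin/tool
do
    if should_mangle "$path"; then
        echo "mangle: $path"
    else
        echo "skip:   $path"
    fi
done
```

With an exclusion like this, files under the easyconfigs tree would keep their upstream shebangs and therefore their upstream checksums.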

geoffreyweal commented 10 months ago

Sounds good.