vscentrum / vsc-software-stack

Central repository of easyconfigs used in the software installations on VSC clusters.
2 stars 6 forks source link

NextPolish #398

Closed laraPPr closed 1 month ago

laraPPr commented 3 months ago
pavelToman commented 2 months ago

PR:

laraPPr commented 2 months ago

installing now

laraPPr commented 2 months ago

Failed on shinx will check it out

laraPPr commented 2 months ago
Sanity check failed: No '(RPATH)' found in 'readelf -d' output for /apps/gent/RHEL9/zen4-ib/software/NextPolish/1.4.1-GCCcore-12.3.0/lib/calgs.so
boegel commented 2 months ago

The issue is that NextPolish comes with a bunch of pre-compiled libraries (which don't use RPATH linking, which makes sense because that would make them difficult to use across different systems):

$ ls -lrt lib/*.so
-rwxr-xr-x@ 1 kehoste  staff  579992 Jul  7  2022 lib/nextpolish1.so
-rwxr-xr-x@ 1 kehoste  staff   10280 Jul  7  2022 lib/calgs.so
-rwxr-xr-x@ 1 kehoste  staff  678328 Jul  7  2022 lib/nextpolish2.so
$ ldd lib/*.so
lib/calgs.so:
        linux-vdso.so.1 (0x00007ffe9b99c000)
        libz.so.1 => /lib64/libz.so.1 (0x00001459853fc000)
        libc.so.6 => /lib64/libc.so.6 (0x0000145985037000)
        /lib64/ld-linux-x86-64.so.2 (0x0000145985817000)
lib/nextpolish1.so:
        linux-vdso.so.1 (0x00007ffda4b8e000)
        libpthread.so.0 => /lib64/libpthread.so.0 (0x00001510409c6000)
        libm.so.6 => /lib64/libm.so.6 (0x0000151040644000)
        libbz2.so.1 => /lib64/libbz2.so.1 (0x0000151040433000)
        liblzma.so.5 => /lib64/liblzma.so.5 (0x000015104020c000)
        libz.so.1 => /lib64/libz.so.1 (0x000015103fff4000)
        libc.so.6 => /lib64/libc.so.6 (0x000015103fc2f000)
        /lib64/ld-linux-x86-64.so.2 (0x0000151040e74000)
lib/nextpolish2.so:
        linux-vdso.so.1 (0x00007ffe3d119000)
        libpthread.so.0 => /lib64/libpthread.so.0 (0x000014621a7a3000)
        libm.so.6 => /lib64/libm.so.6 (0x000014621a421000)
        libbz2.so.1 => /lib64/libbz2.so.1 (0x000014621a210000)
        liblzma.so.5 => /lib64/liblzma.so.5 (0x0000146219fe9000)
        libz.so.1 => /lib64/libz.so.1 (0x0000146219dd1000)
        libc.so.6 => /lib64/libc.so.6 (0x0000146219a0c000)
        /lib64/ld-linux-x86-64.so.2 (0x000014621ac69000)

The best way to fix this is probably via patchelf, by using that to inject an RPATH section that ensures that the correct libz.so is picked up for example. This would need to be done conditionally, only when EasyBuild is configured to use RPATH, so that probably means using an easyblock (unless we enhance framework so an easyconfig file can know whether or not RPATH is enabled). See also (for example) the use of patchelf in MaSuRCA-4.1.0-GCC-11.3.0.eb.

pavelToman commented 2 months ago

So the solution is to add to postinstallcmds this: "patchelf --force-rpath --set-rpath '$ORIGIN' %(installdir)s/lib/calgs.so" ? Or the $ORIGIN should be changed?

Slack conversation about calgs.so : https://easybuild.slack.com/archives/GP6UN4MKP/p1724764742743239

boegel commented 2 months ago

$ORIGIN will need to be changed to also include $EBROOTZLIB/lib, likewise for other dependencies. And usually both $ORIGIN and $ORIGIN/../lib are included (though that doesn't seem relevant here since those libraries aren't linking to each other), so something like:

patchelf --force-rpath --set-rpath "\$ORIGIN:\$ORIGIN/..lib:$EBROOTZLIB/lib:..." %(installdir)s/lib/calgs.so

Escaping the $ in $ORIGIN is important, since $ORIGIN should not be resolved, while $EBROOTZLIB should be.

The patchelf should be done on the lib/*.so files, mainly.

That said, I fail to see where any of these libraries are actually used... What's their purpose exactly?

All the provided binaries are actually of dependencies (which ideally don't get built from source when building NextPolish, since we have modules for them already).

@laraPPr Maybe we should remove the modules we've installed already, and take a closer look at this first before informing the people who requested this installation?

laraPPr commented 2 months ago

I'm ok with removing them. Have not informed them yet.

laraPPr commented 2 months ago

Removed the modules

laraPPr commented 2 months ago

This is the extra info that was given. It is still not clear to me what they need NextPolish for so maybe I should even ask for more clarification?

I want to use NextPolish to fix base errors in my assembled genome. So after assembly,
we would run NextPolish to 'polish' the genome.
NextPolish uses less time and has a higher correction accuracy than alternative
programs, especially for large genomes. We are working with Fusarium genomes, which
are known to be quite big.

Underneath is an example script, we will do something similar.
reads="/location of reads/example.fastq.gz"
genome="/path to long read polish/racon-2.fasta"
rounds=2
cpus=XX
memory=XXg

for ((i=1; i<=${rounds};i++))
do
#step 1
#index the genome and do alignment
bwa index $genome
bwa mem -t $cpus -p $genome $reads \
| samtools view -@ $cpus -F 0x4 -b - \
| samtools fixmate -m -@ $cpus - - \
| samtools sort -m $memory -@ $cpus - \
| samtools markdup -@ $cpus -r - sorted.bam
#index bam and genome
samtools index -@ $cpus sorted.bam
samtools faidx $genome
#polish genome
    python nextpolish1.py -g $genome -t 1 -p $cpus -s sorted.bam >tmp~${i}.fasta
#step 2
    #index the genome and do alignment
    bwa index tmp~${i}.fasta
    bwa mem -t $cpus -p tmp~${i}.fasta ${readFile} \
        | samtools view -@ $cpus -F 0x4 -b - \
        | samtools fixmate -m -@ $cpus  - - \
        | samtools sort -m $memory -@ $cpus - \
        | samtools markdup -@ $cpus -r - sorted.bam
    #index bam and genome
    samtools index -@ $cpus sorted.bam
    samtools faidx tmp~${i}.fasta
    #polish genome
    python nextpolish1.py -g tmp~${i}.fasta -t 2 -p $cpus -s sorted.bam > nextPolish~${i}.fasta

     genome=nextPolish~${i}.fasta
done
boegel commented 2 months ago

Since that only shows bwa and samtools commands, can you ask them how NextPolish is different from just using BWA and SAMtools?

boegel commented 2 months ago

I'll ask the user for more input, like where nextpolish1.py comes from, and what's in it...

boegel commented 2 months ago

nextpolish1.py is part of the installation, see $EBROOTNEXTPOLISH/lib/nextpolish1.py, so we can try ourselves whether using with BWA & SAMtools as dependencies and not letting NextPolish installs its own copies works, using the tutorial available at https://nextpolish.readthedocs.io/en/latest/TUTORIAL.html

So I don't think we need more output from the user on this...

pavelToman commented 2 months ago

minimap2, samtools and bwa is no more from NextPolish but from EB: https://github.com/vscentrum/vsc-software-stack/blob/wip/398_NextPolish/nextpolish.eb

For a problem with RPATH there is another WIP: https://github.com/vscentrum/vsc-software-stack/blob/wip/398_NextPolish/nextpolish_v2.eb It can be build but still the RPATH is not ok. You can test it by: $ nextPolish test_data/run.cfg For now it returns: OSError: /kyukon/scratch/gent/vo/001/gvo00117/easybuild/RHEL8/cascadelake-ampere-ib/software/NextPolish/1.4.1-GCC-12.3.0/lib/calgs.so: ELF load command address/offset not properly aligned

boegel commented 1 month ago

@pavelToman There's a typo in your patchelf command, you should use $ORIGIN/../lib instead of $ORIGIN/..lib (missing /). And you can drop the $EBROOTGCC entry

pavelToman commented 1 month ago

I still missing something: My command to fix RPATH: patchelf --force-rpath --set-rpath "\$ORIGIN:\$ORIGIN/../lib:$EBROOTZLIB/lib" %(installdir)s/lib/calgs.so After, the patchelf --print-rpath %(installdir)s/lib/calgs.so cmd returns: $ORIGIN:$ORIGIN/../lib:/apps/gent/RHEL8/cascadelake-ib/software/zlib/1.2.13-GCCcore-12.3.0/lib, but as I run nextPolish test_data/run.cfg it still returns: ELF load command address/offset not properly aligned

boegel commented 1 month ago

@pavelToman What do you get when you run ldd on the library after using patchelf --set-rpath?

https://github.com/NixOS/patchelf/issues/492 suggests you may be hitting a bug in patchelf v0.18.0. There's an unmerged fix for it in https://github.com/NixOS/patchelf/pull/566, maybe you can try patching and rebuilding patchelf 0.18.0 with that fix, and see if that fixes the problem?

pavelToman commented 1 month ago

ldd returns not a dynamic executable after patchelf --set-rpath, but file calgs.so returns calgs.so: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, BuildID[sha1]=e746a58f13d563314f1d0504f97bc12057875684, stripped and readelf -h calgs.so returns DYN (Shared object file)

boegel commented 1 month ago

@pavelToman I think what ldd is reporting is another sign of the library being broken by patchelf

pavelToman commented 1 month ago

Ok, there is a another news - it start working without patchelf. I delete all the NextPolish modules to start fresh and let it be build without patchelf cmd. Now when I run ldd calgs.so it returns:

linux-vdso.so.1 (0x00001486bc29f000)
libz.so.1 => /apps/gent/RHEL8/cascadelake-ib/software/zlib/1.2.13-GCCcore-12.3.0/lib/libz.so.1 (0x00001486bc27e000)
libc.so.6 => /lib64/libc.so.6 (0x00001486bbaab000)
/lib64/ld-linux-x86-64.so.2 (0x00001486bc073000)

so it seems the libz is ok. But patchelf --print-rpath returns nothing and readelf -d calgs.so returns:

  Tag        Type                         Name/Value
 0x0000000000000001 (NEEDED)             Shared library: [libz.so.1]
 0x0000000000000001 (NEEDED)             Shared library: [libc.so.6]
 0x000000000000000c (INIT)               0x898
 0x000000000000000d (FINI)               0x1538
 0x0000000000000019 (INIT_ARRAY)         0x201de0
 0x000000000000001b (INIT_ARRAYSZ)       8 (bytes)
 0x000000000000001a (FINI_ARRAY)         0x201de8
 0x000000000000001c (FINI_ARRAYSZ)       8 (bytes)
 0x000000006ffffef5 (GNU_HASH)           0x1f0
 0x0000000000000005 (STRTAB)             0x4b8
 0x0000000000000006 (SYMTAB)             0x230
 0x000000000000000a (STRSZ)              295 (bytes)
 0x000000000000000b (SYMENT)             24 (bytes)
 0x0000000000000003 (PLTGOT)             0x202000
 0x0000000000000002 (PLTRELSZ)           360 (bytes)
 0x0000000000000014 (PLTREL)             RELA
 0x0000000000000017 (JMPREL)             0x730
 0x0000000000000007 (RELA)               0x658
 0x0000000000000008 (RELASZ)             216 (bytes)
 0x0000000000000009 (RELAENT)            24 (bytes)
 0x000000006ffffffe (VERNEED)            0x618
 0x000000006fffffff (VERNEEDNUM)         1
 0x000000006ffffff0 (VERSYM)             0x5e0
 0x000000006ffffff9 (RELACOUNT)          3
 0x0000000000000000 (NULL)               0x0

Either nextPolish test_data/run.cfg runs ok

pavelToman commented 1 month ago

So I patched the patchelf v0.18 and it looks good: I used patchelf --force-rpath --set-rpath '\$ORIGIN:\$ORIGIN/../lib:$EBROOTZLIB/lib' %(installdir)s/lib/calgs.so patchelf --print-rpath calgs.so returns: \$ORIGIN:\$ORIGIN/../lib:$EBROOTZLIB/lib ldd calgs.so returns:

linux-vdso.so.1 (0x00007ffe745e4000)
libz.so.1 => /apps/gent/RHEL8/cascadelake-ib/software/zlib/1.2.13-GCCcore-12.3.0/lib/libz.so.1 (0x0000154f7b07
5000)
libc.so.6 => /lib64/libc.so.6 (0x0000154f7a69e000)
/lib64/ld-linux-x86-64.so.2 (0x0000154f7ae64000)

and readelf --dynamic calgs.so returns:

Dynamic section at offset 0x200440 contains 26 entries:
  Tag        Type                         Name/Value
 0x000000000000000f (RPATH)              Library rpath: [\$ORIGIN:\$ORIGIN/../lib:$EBROOTZLIB/lib]
 0x0000000000000001 (NEEDED)             Shared library: [libz.so.1]
 0x0000000000000001 (NEEDED)             Shared library: [libc.so.6]
 0x000000000000000c (INIT)               0x898
 0x000000000000000d (FINI)               0x1538
 0x0000000000000019 (INIT_ARRAY)         0x201de0
 0x000000000000001b (INIT_ARRAYSZ)       8 (bytes)
 0x000000000000001a (FINI_ARRAY)         0x201de8
 0x000000000000001c (FINI_ARRAYSZ)       8 (bytes)
 0x000000006ffffef5 (GNU_HASH)           0x400028
 0x0000000000000005 (STRTAB)             0x4002f0
 0x0000000000000006 (SYMTAB)             0x400068
 0x000000000000000a (STRSZ)              336 (bytes)
 0x000000000000000b (SYMENT)             24 (bytes)
 0x0000000000000003 (PLTGOT)             0x202000
 0x0000000000000002 (PLTRELSZ)           360 (bytes)
 0x0000000000000014 (PLTREL)             RELA
 0x0000000000000017 (JMPREL)             0x730
 0x0000000000000007 (RELA)               0x658
 0x0000000000000008 (RELASZ)             216 (bytes)
 0x0000000000000009 (RELAENT)            24 (bytes)
 0x000000006ffffffe (VERNEED)            0x618
 0x000000006fffffff (VERNEEDNUM)         1
 0x000000006ffffff0 (VERSYM)             0x5e0
 0x000000006ffffff9 (RELACOUNT)          3
 0x0000000000000000 (NULL)               0x0

Is it ok like that? Should it return \$ORIGIN not $ORIGIN?

pavelToman commented 1 month ago

new PR: https://github.com/easybuilders/easybuild-easyconfigs/pull/21588

boegel commented 1 month ago

@pavelToman Make sure you test with eb --rpath to enable RPATH linking (which will be the default in EasyBuild 5.0, and we're already using that on our RHEL9 cluster)

boegel commented 1 month ago

@pavelToman ready for cleanup