scylladb / scylla-cqlsh

A fork of the cqlsh code
Apache License 2.0
16 stars 32 forks source link

remove cassandra from the shiv package #96

Closed fruch closed 5 months ago

fruch commented 5 months ago

so we can use the scylla-python3 driver version the only problem now, we depend on new dbuild version for using newer version of scylla-driver with cqlsh.

it's really slow down the ability to update new feature that depends on new driver changes (not too many of those)

Ref: https://github.com/scylladb/scylla-python3/pull/40 Fixes: #95

avikivity commented 5 months ago

so we can use the scylla-python3 driver version the only problem now, we depend on new dbuild version for using newer version of scylla-driver with cqlsh.

it's really slow down the ability to update new feature that depends on new driver changes (not too many of those)

Ref: scylladb/scylla-python3#40 Fixes: #95

It's a good idea to have fewer versions of the driver.

fruch commented 5 months ago

so we can use the scylla-python3 driver version the only problem now, we depend on new dbuild version for using newer version of scylla-driver with cqlsh. it's really slow down the ability to update new feature that depends on new driver changes (not too many of those) Ref: scylladb/scylla-python3#40 Fixes: #95

It's a good idea to have fewer versions of the driver.

so we can use the scylla-python3 driver version the only problem now, we depend on new dbuild version for using newer version of scylla-driver with cqlsh. it's really slow down the ability to update new feature that depends on new driver changes (not too many of those) Ref: scylladb/scylla-python3#40 Fixes: #95

It's a good idea to have fewer versions of the driver.

I don't like the idea of every change or fix we'll want to the cqlsh, it won't go via new dbuild image...

avikivity commented 5 months ago

so we can use the scylla-python3 driver version the only problem now, we depend on new dbuild version for using newer version of scylla-driver with cqlsh. it's really slow down the ability to update new feature that depends on new driver changes (not too many of those) Ref: scylladb/scylla-python3#40 Fixes: #95

It's a good idea to have fewer versions of the driver.

so we can use the scylla-python3 driver version the only problem now, we depend on new dbuild version for using newer version of scylla-driver with cqlsh. it's really slow down the ability to update new feature that depends on new driver changes (not too many of those) Ref: scylladb/scylla-python3#40 Fixes: #95

It's a good idea to have fewer versions of the driver.

I don't like the idea of every change or fix we'll want to the cqlsh, it won't go via new dbuild image...

The python driver is also used for the tests, so it's a good idea to keep it updated.

avikivity commented 5 months ago

so we can use the scylla-python3 driver version the only problem now, we depend on new dbuild version for using newer version of scylla-driver with cqlsh. it's really slow down the ability to update new feature that depends on new driver changes (not too many of those) Ref: scylladb/scylla-python3#40 Fixes: #95

It's a good idea to have fewer versions of the driver.

so we can use the scylla-python3 driver version the only problem now, we depend on new dbuild version for using newer version of scylla-driver with cqlsh. it's really slow down the ability to update new feature that depends on new driver changes (not too many of those) Ref: scylladb/scylla-python3#40 Fixes: #95

It's a good idea to have fewer versions of the driver.

I don't like the idea of every change or fix we'll want to the cqlsh, it won't go via new dbuild image...

The python driver is also used for the tests, so it's a good idea to keep it updated.

The alternative is to include the python driver as a submodule and build it as part of the build process.

fruch commented 5 months ago

The alternative is to include the python driver as a submodule and build it as part of the build process.

an alternative is todo exactly what scylla-python3 is doing for the downloaded wheel (without submoduling anything), i.e. patchelf the files, the trick would be to make sure the shiv package is exactract to a known location (now it's defaults to ~/.shiv) with is relative to scylla-python3, and add to the rpath

I think it would work exactly as the driver from scylla-python3 works.

it just gonna take time to build it and try it out, and confirm it's working. And it would take time for me to get to doing it.

I'm still trying to completely build this one in: https://github.com/scylladb/scylladb/pull/19558

fruch commented 5 months ago

https://github.com/scylladb/scylladb/pull/19558

Passed complication and gating tests

@syuu1228, can you review this change ?

Annamikhlin commented 5 months ago

scylladb/scylladb#19558

Passed complication and gating tests

@syuu1228, can you review this change ?

@syuu1228 - ping.. could you please review

avikivity commented 5 months ago

The alternative is to include the python driver as a submodule and build it as part of the build process.

an alternative is todo exactly what scylla-python3 is doing for the downloaded wheel (without submoduling anything),

The submodule is a solution to having to regenerate the frozen toolchain on each driver change.

Since the driver changes often, and is part of our source base, it's reasonable to build it as part of the build process.

syuu1228 commented 5 months ago

I found that there are more shared library on shiv, it may cause similar error since these are not relocatable binaries:

$ find .shiv|grep -e "\.so"
.shiv/cqlsh_e2c7a692a44f163176c6c3c5fc35bbc486fdfecea430ed0a24707d6a759e7dfe/site-packages/scylla_driver.libs/libev-a3026d2b.so.4.0.0
.shiv/cqlsh_e2c7a692a44f163176c6c3c5fc35bbc486fdfecea430ed0a24707d6a759e7dfe/site-packages/yaml/_yaml.cpython-312-x86_64-linux-gnu.so
$ ldd .shiv/cqlsh_e2c7a692a44f163176c6c3c5fc35bbc486fdfecea430ed0a24707d6a759e7dfe/site-packages/scylla_driver.libs/libev-a3026d2b.so.4.0.0
        linux-vdso.so.1 =>  (0x00007ffcbd1fe000)
        libm.so.6 => /lib64/libm.so.6 (0x00007f6240e94000)
        libc.so.6 => /lib64/libc.so.6 (0x00007f6240ac6000)
        /lib64/ld-linux-x86-64.so.2 (0x00007f62413a5000)
$ ldd .shiv/cqlsh_e2c7a692a44f163176c6c3c5fc35bbc486fdfecea430ed0a24707d6a759e7dfe/site-packages/yaml/_yaml.cpython-312-x86_64-linux-gnu.so
        linux-vdso.so.1 =>  (0x00007ffff410b000)
        libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f4aedc57000)
        libc.so.6 => /lib64/libc.so.6 (0x00007f4aed889000)
        /lib64/ld-linux-x86-64.so.2 (0x00007f4aede73000)

I saw previous discussion on the thread we should use scylla-driver on scylla-python3 or install specific version of scylla-driver, here are example code on both versions.

  1. If we want to use scylla-driver on scylla-python3, we also need to drop pyyaml and scylla-driver modules. This way the code will be simple, but we need to update frozen toolchain to update scylla-driver.

    diff --git a/scripts/create-relocatable-package.py b/scripts/create-relocatable-package.py
    index 2345289..d001503 100755
    --- a/scripts/create-relocatable-package.py
    +++ b/scripts/create-relocatable-package.py
    @@ -94,7 +94,7 @@ ar.reloc_add('build/debian/debian', arcname='debian')
    # and pointing the correct lib folder)
    cqlsh_bin = pathlib.Path('bin/cqlsh').resolve()
    subprocess.check_call(["mv", cqlsh_bin, f'{cqlsh_bin}.zip'])
    -subprocess.run(["zip", "--delete", cqlsh_bin, "site-packages/cassandra/*"])
    +subprocess.run(["zip", "--delete", cqlsh_bin, "site-packages/cassandra/*", "site-packages/yaml/*", "site-packages/_yaml/*", "site-packages/scylla_driver-*.dist-info/*", "site-packages/scylla_driver.libs/*"])
    subprocess.check_call(["mv", f'{cqlsh_bin}.zip', cqlsh_bin])
    
    ar.reloc_add('bin/cqlsh')
    \ No newline at end of file
  2. Or, if we want to install specifc version of scylla-driver, we need to implement bit complecated code something like this. There are reasons we cannot simply set rpath on create-relocatable-package.py. Because,

    • we have to use absolute path on rpath since ~/.shiv is out side of /opt/scylladb
    • and even we cannot hard code rpath to /opt/scylladb/python3/lib64, because we have nonroot mode
    • to configure rpath correctly on nonroot mode, we need to run patchelf on install.sh to do that, we need to bring relocatable patchelf on scylla-cqlsh.tar.gz
diff --git a/install.sh b/install.sh
index c9cb65c..1392dfa 100755
--- a/install.sh
+++ b/install.sh
@@ -129,6 +129,12 @@ fi
 install -d -m755 "$rprefix"/share/cassandra/pylib
 cp -rp pylib/cqlshlib "$rprefix"/share/cassandra/pylib

+shared_libs=(site-packages/cassandra/io/*.so site-packages/scylla_driver.libs/*.so.* site-packages/yaml/*.so site-packages/cassandra/*.so)
+unzip -o bin/cqlsh ${shared_libs[@]}
+for lib in ${shared_libs[@]}; do
+    patchelf/bin/patchelf --add-rpath "$prefix"/python3/lib64 "$lib"
+done
+zip bin/cqlsh ${shared_libs[@]}
 for i in bin/{cqlsh,cqlsh.py} ; do
     bn=$(basename $i)
     relocate_python3 "$rprefix"/share/cassandra/bin "$i"
diff --git a/scripts/create-relocatable-package.py b/scripts/create-relocatable-package.py
index 2345289..3627405 100755
--- a/scripts/create-relocatable-package.py
+++ b/scripts/create-relocatable-package.py
@@ -22,9 +22,14 @@
 #

 import argparse
+import io
+import os
 import subprocess
 import tarfile
+import shutil
+from tempfile import mkstemp
 import pathlib
+import time

 def erase_uid(tarinfo):
@@ -39,25 +44,90 @@ def reloc_add(self, name, arcname=None):
     else:
         return self.add(name, arcname="{}/{}".format(RELOC_PREFIX, name), filter=erase_uid)

-tarfile.TarFile.reloc_add = reloc_add
-
+def reloc_addfile(self, tarinfo, fileobj=None):
+    tarinfo.name = "{}/{}".format(RELOC_PREFIX, tarinfo.name)
+    return self.addfile(tarinfo, fileobj)

-def fix_binary(path, libpath):
+tarfile.TarFile.reloc_add = reloc_add
+tarfile.TarFile.reloc_addfile = reloc_addfile
+
+def ldd(executable):
+    '''Given an executable file, return a dictionary with the keys
+    containing its shared library dependencies and the values pointing
+    at the files they resolve to. A fake key ld.so points at the
+    dynamic loader.'''
+    libraries = {}
+    for ldd_line in subprocess.check_output(
+            ['ldd', executable],
+            universal_newlines=True).splitlines():
+        elements = ldd_line.split()
+        if ldd_line.endswith('not found'):
+            raise Exception('ldd {}: could not resolve {}'.format(executable, elements[0]))
+        if elements[1] != '=>':
+            if elements[0].startswith('linux-vdso.so'):
+                # provided by kernel
+                continue
+            libraries['ld.so'] = os.path.realpath(elements[0])
+        elif '//' in elements[0]:
+            # We know that the only DSO with a // in the path is the
+            # dynamic linker used by scylla, which is the same ld.so
+            # above.
+            pass
+        else:
+            libraries[elements[0]] = os.path.realpath(elements[2])
+    return libraries
+
+def fix_binary(ar, path, libpath):
     '''Makes one binary or shared library relocatable. To do that, we need to set RUNPATH to $ORIGIN/../lib64 so we get libraries
     from the relocatable directory and not from the system during runtime. We also want to copy the interpreter used so
     we can launch with it later.
     '''
     # it's a pity patchelf have to patch an actual binary.
+    patched_elf = mkstemp()[1]
+    shutil.copy2(path, patched_elf)

     subprocess.check_call(['patchelf',
-                           '--set-rpath',
+                           '--add-rpath',
                            libpath,
-                           path])
-
-
-def fix_sharedlib(binpath):
-    fix_binary(binpath, '$ORIGIN/lib64')
+                           patched_elf])
+    return patched_elf

+def fix_executable_binary(ar, binpath, prefix=''):
+    '''Makes the executable relocatable. To do that, we need to set RUNPATH to $ORIGIN/../lib64 so we get libraries
+    from the relocatable directory and not from the system during runtime. We also want to copy the interpreter used so
+    we can launch with it later.
+    '''
+    pyname = os.path.basename(binpath)
+    patched_binary = fix_binary(ar, binpath, '$ORIGIN/../lib64/')
+    interpreter = subprocess.check_output(['patchelf',
+                                           '--print-interpreter',
+                                           patched_binary], universal_newlines=True).splitlines()[0]
+    ar.reloc_add(os.path.realpath(interpreter), arcname=os.path.join(prefix, "libexec", "ld.so"))
+    ar.reloc_add(patched_binary, arcname=os.path.join(prefix, "libexec", pyname + ".bin"))
+    os.remove(patched_binary)
+
+def fix_sharedlib(ar, binpath, targetpath, prefix=''):
+    relpath =  os.path.join(os.path.relpath("lib64", targetpath), "lib64")
+    patched_binary = fix_binary(ar, binpath, f'$ORIGIN/{relpath}')
+    ar.reloc_add(patched_binary, arcname=os.path.join(prefix, targetpath))
+    os.remove(patched_binary)
+
+def gen_executable_thunk(ar, exebin, prefix=''):
+    base_thunk='''\
+#!/bin/bash
+x="$(readlink -f "$0")"
+b="$(basename "$x")"
+d="$(dirname "$x")/.."
+ldso="$d/libexec/ld.so"
+realexe="$d/libexec/$b.bin"
+exec -a "$0" "$ldso" "$realexe" "$@"
+'''
+    thunk = base_thunk.encode()
+    ti = tarfile.TarInfo(name=os.path.join(prefix, "bin", exebin))
+    ti.size = len(thunk)
+    ti.mode = 0o755
+    ti.mtime = time.time()
+    ar.reloc_addfile(ti, fileobj=io.BytesIO(thunk))

 ap = argparse.ArgumentParser(description='Create a relocatable scylla package.')
 ap.add_argument('--version', required=True,
@@ -76,6 +146,22 @@ with open('build/.relocatable_package_version', 'w') as f:
     f.write('2\n')
 ar.add('build/.relocatable_package_version', arcname='.relocatable_package_version', filter=erase_uid)

+gen_executable_thunk(ar, 'patchelf', 'patchelf')
+fix_executable_binary(ar, '/usr/bin/patchelf', 'patchelf')
+libs = {}
+libs.update(ldd('/usr/bin/patchelf'))
+for lib, f in libs.items():
+    libfile = f
+    if libfile.startswith("/usr/"):
+        libfile = libfile.replace("/usr/", "/", 1)
+    if libfile.startswith("/lib/"):
+        libfile = libfile.replace("/lib/", "lib64/", 1)
+    elif libfile.startswith("/lib64/"):
+        libfile = libfile.replace("/lib64/", "lib64/", 1)
+    else:
+        raise RuntimeError("unexpected path: don't know what to do with {}".format(f))
+    fix_sharedlib(ar, f, libfile, 'patchelf')
+
 pathlib.Path('build/SCYLLA-RELOCATABLE-FILE').touch()
 ar.reloc_add('build/SCYLLA-RELOCATABLE-FILE', arcname='SCYLLA-RELOCATABLE-FILE')
 ar.reloc_add('build/SCYLLA-RELEASE-FILE', arcname='SCYLLA-RELEASE-FILE')
@@ -87,14 +173,4 @@ ar.reloc_add('bin/cqlsh.py')
 ar.reloc_add('pylib')
 ar.reloc_add('install.sh')
 ar.reloc_add('build/debian/debian', arcname='debian')
-
-
-# clear scylla-driver out of the package
-# we assume that scylla-python3 already have it (and all it's .so are relocatable,
-# and pointing the correct lib folder)
-cqlsh_bin = pathlib.Path('bin/cqlsh').resolve()
-subprocess.check_call(["mv", cqlsh_bin, f'{cqlsh_bin}.zip'])
-subprocess.run(["zip", "--delete", cqlsh_bin, "site-packages/cassandra/*"])
-subprocess.check_call(["mv", f'{cqlsh_bin}.zip', cqlsh_bin])
-
 ar.reloc_add('bin/cqlsh')
\ No newline at end of file
-- 
2.45.2
fruch commented 5 months ago

@syuu1228 we can put the .shiv directory into /opt/scylladb then we can do the rpath relative and it should work

fruch commented 5 months ago

Meanwhile we'll deliver this change to unblock master, as it was proven to be working in: https://github.com/scylladb/scylladb/pull/19558

I'll try out the change of patching the .so in the shiv zip, without depending on the driver in scylla-python3 (something similar to what @syuu1228 suggested here, bit with setting the shiv to expend on a known path, and rpath would be know in build time, and not need patchelf during install.sh)