ibmcb / cbtool

Cloud Rapid Experimentation and Analysis Toolkit
Apache License 2.0

Can't install workloads requiring Python 2 (e.g. ycsb) on Ubuntu 22.04 #451

Open rayx opened 4 months ago

rayx commented 4 months ago

While CBTOOL's own code has been modified to use Python 3, some workloads still use Python 2. One example is cassandra_ycsb. The latest upstream version of YCSB is 0.17.0, which uses Python 3. However, CBTOOL uses YCSB version 0.5.0, which (I believe) uses Python 2. That in turn requires a Python 2 version of the Cassandra driver, which leads to the following errors if the VM doesn't have Python 2 installed.

$ sudo wget -N -v -P /home/ubuntu http://launchpadlibrarian.net/109052632/python-support_1.0.15_all.deb
$ sudo dpkg -i /home/ubuntu/python-support*.deb
$ sudo apt --fix-broken -y install
$ sudo wget -N -v -P /home/ubuntu https://mirrors.ibiblio.org/apache/cassandra/2.1.22/debian/cassandra_2.1.22_all.deb
$ sudo dpkg -i /home/ubuntu/cassandra*.deb
$ sudo apt --fix-broken -y install

Errors:  

 python-support depends on python (>= 2.5); however:
  Package python is not installed.

 cassandra depends on python (>= 2.7); however:
  Package python is not installed.

Since the above commands use dpkg instead of apt to install the packages, dependencies are not resolved automatically.

There are two ways to fix the issue:

1) Add Python 2 dependencies to PUBLIC_dependencies.txt

2) Upgrade YCSB to the latest version

I don't like approach 1, because the Cassandra Python 2 binding may require other Python packages (I haven't tried it). Since we would be resolving the dependencies manually, it would be a tedious trial-and-error process. I hope CBTOOL can upgrade YCSB and all other workloads that depend on Python 2 to their latest versions to get rid of the Python 2 dependencies. The only issue I can think of is how to support old Ubuntu releases, which do have Python 2 installed by default. One possible solution: each workload that depends on Python has two versions in CBTOOL. Taking cassandra_ycsb as an example, they would be cassandra_ycsb_python2 and cassandra_ycsb_python3. Users can choose the right version based on the Ubuntu release the VM runs. With this approach, the workloads in CBTOOL can be upgraded over time.

EDIT: if we use this approach, cassandra_ycsb_python2 and cassandra_ycsb_python3 would have different dependencies. From what I can tell, the current code already supports including the OS release number in the dependency key, so I suppose that would be feasible.
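
To make the two-variant idea concrete, here is a rough sketch of how a variant could be picked from the Ubuntu release. It is only an illustration, not existing CBTOOL code: the function names and the 22.04 cutoff are assumptions, and only the cassandra_ycsb_python2 / cassandra_ycsb_python3 names come from the proposal above.

```python
def parse_release(release: str) -> tuple[int, int]:
    """Turn an Ubuntu release string such as '22.04' into (22, 4)."""
    major, minor = release.strip().split(".")[:2]
    return int(major), int(minor)


def pick_workload_variant(workload: str, release: str,
                          first_py3_only: str = "22.04") -> str:
    """Return the hypothetical workload variant name for an Ubuntu release.

    first_py3_only is an assumed cutoff: releases at or after it get the
    Python 3 variant, older releases get the Python 2 variant.
    """
    if parse_release(release) >= parse_release(first_py3_only):
        suffix = "python3"
    else:
        suffix = "python2"
    return f"{workload}_{suffix}"


print(pick_workload_variant("cassandra_ycsb", "22.04"))  # cassandra_ycsb_python3
print(pick_workload_variant("cassandra_ycsb", "18.04"))  # cassandra_ycsb_python2
```

The cutoff is a parameter because the right boundary probably differs per workload; it would have to be confirmed against what each release actually ships.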

rayx commented 4 months ago

It turns out that installing Python 2 doesn't help on Ubuntu 22.04 because of a package name change. Take the cassandra_2.1.22_all.deb installation as an example:

$ dpkg -I cassandra_2.1.22_all.deb | grep Depends
 Depends: openjdk-7-jre-headless | java7-runtime, adduser, python (>= 2.7)

The deb file depends on a package named python, so the dpkg command fails to install it even if I have installed Python 2 in the VM. It appears the only option is to install the latest versions of Cassandra and YCSB.
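
For anyone who wants to check other workload packages for the same problem, below is a minimal sketch that parses a Depends line like the one above and reports which dependency groups have no installed alternative. The helper names (parse_depends, missing_groups) and the sample set of installed packages are made up for illustration. The check compares package names only, so it ignores version constraints and Provides relationships, which a real resolver would take into account.

```python
import re

# One dependency term: a package name plus an optional "(op version)" constraint.
_TERM = re.compile(
    r"^\s*(?P<name>[a-z0-9][a-z0-9+.\-]*)\s*"
    r"(?:\(\s*(?P<op><<|<=|=|>=|>>)\s*(?P<version>[^)\s]+)\s*\))?\s*$"
)


def parse_depends(field):
    """Parse a Depends field into groups of alternatives.

    Each group is a list of (name, op, version) tuples; op and version are
    None when the term has no version constraint.
    """
    text = field.strip()
    if text.startswith("Depends:"):
        text = text.split(":", 1)[1]
    groups = []
    for clause in text.split(","):          # "," separates required groups
        alternatives = []
        for term in clause.split("|"):      # "|" separates alternatives
            match = _TERM.match(term)
            if match is None:
                raise ValueError(f"cannot parse dependency term: {term!r}")
            alternatives.append((match["name"], match["op"], match["version"]))
        groups.append(alternatives)
    return groups


def missing_groups(groups, installed):
    """Return the names in each group that has no installed alternative."""
    return [
        [name for name, _, _ in group]
        for group in groups
        if not any(name in installed for name, _, _ in group)
    ]


line = " Depends: openjdk-7-jre-headless | java7-runtime, adduser, python (>= 2.7)"
installed = {"adduser", "python2"}  # hypothetical VM state, for illustration only
print(missing_groups(parse_depends(line), installed))
# [['openjdk-7-jre-headless', 'java7-runtime'], ['python']]
```

With python2 installed but no package named python, the python group is still reported as missing, which matches the dpkg failure described above.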