Closed jizhuoyu closed 8 months ago
As discussed, we observe e2e test errors caused by insufficient disk space:
verticadb-operator verticadb-operator-manager-6c4bfdcfb4-br6hp manager 2024-02-22T14:03:01.583Z INFO controllers.VerticaDB.InstallPackages.HTTPSInstallPackagesOp JSON response {"verticadb": "kuttl-test-legal-maggot/v-upgrade-vertica", "reconcile-uuid": "d849aed9-549e-4068-b526-9c232a14abc3", "host": "10.244.0.36", "responseObj": {"packages":[{"package_name":"ComplexTypes","install_status":"Exception: Could not write to [/data/Upgrade/v_upgrade_node0001_data/Sort_Temp_part_0_45035996273704965_3_0_0_0.dat]: Volume [/data/Upgrade/v_upgrade_node0001_data] has insufficient space."},{"package_name":"DelimitedExport","install_status":"Exception: Could not write to [/data/Upgrade/v_upgrade_node0001_data/Sort_Temp_part_0_45035996273704966_3_0_0_0.dat]: Volume [/data/Upgrade/v_upgrade_node0001_data] has insufficient space."},{"package_name":"JsonExport","install_status":"Exception: Could not write to [/data/Upgrade/v_upgrade_node0001_data/Sort_Temp_part_0_45035996273704967_3_0_0_0.dat]: Volume [/data/Upgrade/v_upgrade_node0001_data] has insufficient space."},{"package_name":"MachineLearning","install_status":"Exception: Could not write to [/data/Upgrade/v_upgrade_node0001_data/Sort_Temp_part_0_45035996273704968_3_0_0_0.dat]: Volume [/data/Upgrade/v_upgrade_node0001_data] has insufficient space."},{"package_name":"OrcExport","install_status":"Exception: Could not write to [/data/Upgrade/v_upgrade_node0001_data/Sort_Temp_part_0_45035996273704969_3_0_0_0.dat]: Volume [/data/Upgrade/v_upgrade_node0001_data] has insufficient space."},{"package_name":"ParquetExport","install_status":"Exception: Could not write to [/data/Upgrade/v_upgrade_node0001_data/Sort_Temp_part_0_45035996273704970_3_0_0_0.dat]: Volume [/data/Upgrade/v_upgrade_node0001_data] has insufficient space."},{"package_name":"VFunctions","install_status":"Exception: Could not write to [/data/Upgrade/v_upgrade_node0001_data/Sort_Temp_part_0_45035996273704971_3_0_0_0.dat]: Volume [/data/Upgrade/v_upgrade_node0001_data] 
has insufficient space."},{"package_name":"approximate","install_status":"Exception: Could not write to [/data/Upgrade/v_upgrade_node0001_data/Sort_Temp_part_0_45035996273704972_3_0_0_0.dat]: Volume [/data/Upgrade/v_upgrade_node0001_data] has insufficient space."},{"package_name":"flextable","install_status":"Exception: Could not write to [/data/Upgrade/v_upgrade_node0001_data/Sort_Temp_part_0_45035996273704973_3_0_0_0.dat]: Volume [/data/Upgrade/v_upgrade_node0001_data] has insufficient space."},{"package_name":"kafka","install_status":"Exception: Could not write to [/data/Upgrade/v_upgrade_node0001_data/Sort_Temp_part_0_45035996273704974_3_0_0_0.dat]: Volume [/data/Upgrade/v_upgrade_node0001_data] has insufficient space."},{"package_name":"logsearch","install_status":"Exception: Could not write to [/data/Upgrade/v_upgrade_node0001_data/Sort_Temp_part_0_45035996273704975_3_0_0_0.dat]: Volume [/data/Upgrade/v_upgrade_node0001_data] has insufficient space."},{"package_name":"place","install_status":"Exception: Could not write to [/data/Upgrade/v_upgrade_node0001_data/Sort_Temp_part_0_45035996273704976_3_0_0_0.dat]: Volume [/data/Upgrade/v_upgrade_node0001_data] has insufficient space."},{"package_name":"txtindex","install_status":"Exception: Could not write to [/data/Upgrade/v_upgrade_node0001_data/Sort_Temp_part_0_45035996273704977_3_0_0_0.dat]: Volume [/data/Upgrade/v_upgrade_node0001_data] has insufficient space."},{"package_name":"voltagesecure","install_status":"Exception: Could not write to [/data/Upgrade/v_upgrade_node0001_data/Sort_Temp_part_0_45035996273704978_3_0_0_0.dat]: Volume [/data/Upgrade/v_upgrade_node0001_data] has insufficient space."}]}}
verticadb-operator verticadb-operator-manager-6c4bfdcfb4-br6hp manager 2024-02-22T14:04:33.994Z INFO controllers.VerticaDB ExecInPod stream {"verticadb": "kuttl-test-aware-wallaby/v-upgrade-vertica", "reconcile-uuid": "3e304ea9-c2a7-4a02-aceb-2ad82318751c", "pod": "kuttl-test-aware-wallaby/v-upgrade-vertica-pri-0", "err": "command terminated with exit code 1", "stdout": "Checking whether package approximate is already installed...\nInstalling package approximate...\n...Success!\nChecking whether package logsearch is already installed...\nInstalling package logsearch...\n...Success!\nChecking whether package DelimitedExport is already installed...\nInstalling package DelimitedExport...\nFailed to install package DelimitedExport\nChecking whether package VFunctions is already installed...\nInstalling package VFunctions...\nFailed to install package VFunctions\nChecking whether package flextable is already installed...\nInstalling package flextable...\n...Success!\nChecking whether package kafka is already installed...\nInstalling package kafka...\n...Success!\nChecking whether package voltagesecure is already installed...\nInstalling package voltagesecure...\n...Success!\nChecking whether package ParquetExport is already installed...\nInstalling package ParquetExport...\n...Success!\nChecking whether package OrcExport is already installed...\nInstalling package OrcExport...\n...Success!\nChecking whether package MachineLearning is already installed...\nInstalling package MachineLearning...\n...Success!\nChecking whether package ComplexTypes is already installed...\nInstalling package ComplexTypes...\n...Success!\nChecking whether package place is already installed...\nInstalling package place...\nFailed to install package place\nChecking whether package txtindex is already installed...\nInstalling package txtindex...\nFailed to install package txtindex\nChecking whether package JsonExport is already installed...\nInstalling package JsonExport...\nFailed to install 
package JsonExport\n", "stderr": ""}
To address this, we are removing the test steps that download the 23.4 image and upgrade from the v12 image to 23.4. Note that this means the e2e tests will no longer cover the admintools (AT) implementation of install packages.
I saw that the new e2e test still fails because there isn't enough disk space. So, I started to look at the disk space usage of a database.
1. New database created without package install: communal 8K, each node 11MB.
2. New database created with packages installed: communal 162MB, each node 335MB.
3. After upgrading the database from 2.: communal 323MB, each node 394MB, although I did see the per-node usage spike to 576MB before it went down.
Here is my suggestion for an attempt at fixing it:
- Change the e2e tests to use the initPolicy of CreateSkipPackageInstall and remove the package verification steps. This will keep the disk requirements low.
- Keep running the tests serially.
- Add 2 new tests to the same leg that verify package install through upgrade: one for online and one for offline upgrade. The only difference is that these tests will use a single node. We probably only need to do 1 upgrade here to verify things are working.
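For illustration, the initPolicy change suggested above could look roughly like the following excerpt of a VerticaDB manifest. This is a sketch, not the actual test file: the apiVersion and the subcluster size are assumptions, the name is taken from the operator log earlier in this thread, and required fields such as the image and communal storage settings are left out.

```yaml
# Hypothetical excerpt of a VerticaDB manifest used by the upgrade tests.
# Only the initPolicy value comes from this discussion.
apiVersion: vertica.com/v1   # assumption: use the API version your operator serves
kind: VerticaDB
metadata:
  name: v-upgrade-vertica    # name as it appears in the operator log above
spec:
  # Skip package install at database creation to keep disk usage low
  # (roughly 11MB per node instead of 335MB, per the measurements above).
  initPolicy: CreateSkipPackageInstall
  subclusters:
    - name: pri
      size: 1                # assumption: node count is illustrative only
```

With this policy the package verification steps in the test have to go as well, since no packages are installed at create time.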
I tested in one commit where the install packages steps are still kept in the 2 original upgrade tests. The results of all 4 tests are as follows:
With the latest commits we essentially have 2 tests that only upgrade (3 times) and 2 tests that upgrade and install packages (once, from 23.4 to 24.1). We passed the former 2 tests and failed the latter 2.
I guess we might pass if the latter 2 tests upgraded and installed from 12.0.4 to 23.4 rather than from 23.4 to 24.1; however, that would mean we are testing install for admintools only.
Thanks for trying these experiments. It doesn't look like we'll be able to automate your tests on account of the disk space constraint. Manual verification will have to do for now. Can you remove those two new tests you added? We can add back the parallelism to leg 8 as well. Can we also get the other tests in e2e leg 8 back to what they were before? I think you removed one of the upgrade versions.
As discussed, leg 8 now has 2 tests remaining (each upgrading 3 times, from 12.0.4 to 23.4 to 24.1 to latest), one for online and one for offline upgrade, both with CreateSkipPackageInstall as the initPolicy. I added several steps that wait for condition=UpgradeInProgress=False, as I think it's good to confirm that we actually pass the last step of an upgrade. Comments were also added in setup-vdb.yaml to clearly state why we are skipping package install. @spilchen
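As a rough sketch of what such a wait step might look like in a kuttl test, assuming the VerticaDB is named v-upgrade-vertica as in the log above, that the `vdb` short name resolves to the VerticaDB resource, and with a timeout value chosen purely for illustration:

```yaml
# Hypothetical kuttl step; the file name, resource name and timeout are
# assumptions, not copied from the actual test.
apiVersion: kuttl.dev/v1beta1
kind: TestStep
commands:
  # Block until the operator reports that the upgrade has finished,
  # i.e. the UpgradeInProgress condition is False.
  - command: kubectl wait --for=condition=UpgradeInProgress=False vdb/v-upgrade-vertica --timeout=600s
    namespaced: true
```

Placing a step like this after each upgrade makes the test fail explicitly if the last step of an upgrade never completes, rather than silently moving on.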
In Vertica, upgrading the server version requires reinstalling packages because they are tied to a specific server version. This process was handled automatically during admintools deployments, but was not implemented for vclusterOps deployments. To address this issue, we have modified the upgrade process to include a package reinstallation step after restarting Vertica with the new version.