Closed ligurio closed 2 years ago
There are known concurrency problems in the Jepsen library:
jepsen.control is plagued by what I think are race conditions in Jsch, which I've never had the chance to dig in and fix. source
and I believe these problems will be gone after upgrading to the latest version of Jepsen, see #30.
I found another problem, but its cause seems to be the cause of this one as well. So I'll introduce the terms:
apt-get <...> update
invoked by Terraform fails. A caveat: I know almost nothing about Terraform, Packer and cloud-init, so mistakes are possible. I'm only trying to understand what is going on from what I see, and my wording may be incorrect.
I found this 'problem B' while testing PR #93. It is about Ubuntu repositories as well, but the symptoms are different:
openstack_compute_instance_v2.instance[0] (remote-exec): E: Type 'to' is not known on line 50 in source list /etc/apt/sources.list
openstack_compute_instance_v2.instance[0] (remote-exec): E: The list of sources could not be read.
It appears on apt-get <...> update before Jepsen starts.
Full logs and other details are below.
I injected the following code to debug problem B (the patch is applied to the tarantool repository):
diff --git a/extra/tf/main.tf b/extra/tf/main.tf
index abe8e606d..578968ac2 100644
--- a/extra/tf/main.tf
+++ b/extra/tf/main.tf
@@ -29,6 +29,7 @@ resource "openstack_compute_instance_v2" "instance" {
inline = [
"set -o errexit",
"sudo hostnamectl set-hostname n${count.index + 1}",
+ "cat -n /etc/apt/sources.list",
"sudo apt-get -o Debug::Acquire::http=true -o Debug::pkgAcquire::Worker=1 update"
]
}
Then I ran the tests several times. During one of those runs I hit problem A and found the following difference in /etc/apt/sources.list
between the successful and the failed run:
$ diff -u <(sed -e 's/^ \?[0-9]\+\t\?//' success-sources-list.txt) <(sed -e 's/^ \?[0-9]\+\t\?//' failure-sources-list.txt)
--- /dev/fd/63 2021-10-31 02:34:07.585041600 +0300
+++ /dev/fd/62 2021-10-31 02:34:07.585041600 +0300
@@ -1,46 +1,38 @@
-## Note, this file is written by cloud-init on first boot of an instance
-## modifications made here will not survive a re-bundle.
-
-## a.) add 'apt_preserve_sources_list: true' to /etc/cloud/cloud.cfg
-## or do the same in user-data
-## b.) add sources in /etc/apt/sources.list.d
-## c.) make changes to template file /etc/cloud/templates/sources.list.tmpl
-
# See http://help.ubuntu.com/community/UpgradeNotes for how to upgrade to
# newer versions of the distribution.
-deb http://DP1.clouds.archive.ubuntu.com/ubuntu/ bionic main restricted
-# deb-src http://DP1.clouds.archive.ubuntu.com/ubuntu/ bionic main restricted
+deb http://archive.ubuntu.com/ubuntu/ bionic main restricted
+# deb-src http://archive.ubuntu.com/ubuntu/ bionic main restricted
## Major bug fix updates produced after the final release of the
## distribution.
-deb http://DP1.clouds.archive.ubuntu.com/ubuntu/ bionic-updates main restricted
-# deb-src http://DP1.clouds.archive.ubuntu.com/ubuntu/ bionic-updates main restricted
+deb http://archive.ubuntu.com/ubuntu/ bionic-updates main restricted
+# deb-src http://archive.ubuntu.com/ubuntu/ bionic-updates main restricted
## N.B. software from this repository is ENTIRELY UNSUPPORTED by the Ubuntu
## team. Also, please note that software in universe WILL NOT receive any
## review or updates from the Ubuntu security team.
-deb http://DP1.clouds.archive.ubuntu.com/ubuntu/ bionic universe
-# deb-src http://DP1.clouds.archive.ubuntu.com/ubuntu/ bionic universe
-deb http://DP1.clouds.archive.ubuntu.com/ubuntu/ bionic-updates universe
-# deb-src http://DP1.clouds.archive.ubuntu.com/ubuntu/ bionic-updates universe
+deb http://archive.ubuntu.com/ubuntu/ bionic universe
+# deb-src http://archive.ubuntu.com/ubuntu/ bionic universe
+deb http://archive.ubuntu.com/ubuntu/ bionic-updates universe
+# deb-src http://archive.ubuntu.com/ubuntu/ bionic-updates universe
## N.B. software from this repository is ENTIRELY UNSUPPORTED by the Ubuntu
## team, and may not be under a free licence. Please satisfy yourself as to
## your rights to use the software. Also, please note that software in
## multiverse WILL NOT receive any review or updates from the Ubuntu
## security team.
-deb http://DP1.clouds.archive.ubuntu.com/ubuntu/ bionic multiverse
-# deb-src http://DP1.clouds.archive.ubuntu.com/ubuntu/ bionic multiverse
-deb http://DP1.clouds.archive.ubuntu.com/ubuntu/ bionic-updates multiverse
-# deb-src http://DP1.clouds.archive.ubuntu.com/ubuntu/ bionic-updates multiverse
+deb http://archive.ubuntu.com/ubuntu/ bionic multiverse
+# deb-src http://archive.ubuntu.com/ubuntu/ bionic multiverse
+deb http://archive.ubuntu.com/ubuntu/ bionic-updates multiverse
+# deb-src http://archive.ubuntu.com/ubuntu/ bionic-updates multiverse
## N.B. software from this repository may not have been tested as
## extensively as that contained in the main release, although it includes
## newer versions of some applications which may provide useful features.
## Also, please note that software in backports WILL NOT receive any review
## or updates from the Ubuntu security team.
-deb http://DP1.clouds.archive.ubuntu.com/ubuntu/ bionic-backports main restricted universe multiverse
-# deb-src http://DP1.clouds.archive.ubuntu.com/ubuntu/ bionic-backports main restricted universe multiverse
+deb http://archive.ubuntu.com/ubuntu/ bionic-backports main restricted universe multiverse
+# deb-src http://archive.ubuntu.com/ubuntu/ bionic-backports main restricted universe multiverse
## Uncomment the following two lines to add software from Canonical's
## 'partner' repository.
@@ -49,9 +41,9 @@
# deb http://archive.canonical.com/ubuntu bionic partner
# deb-src http://archive.canonical.com/ubuntu bionic partner
-deb http://security.ubuntu.com/ubuntu bionic-security main restricted
-# deb-src http://security.ubuntu.com/ubuntu bionic-security main restricted
-deb http://security.ubuntu.com/ubuntu bionic-security universe
-# deb-src http://security.ubuntu.com/ubuntu bionic-security universe
-deb http://security.ubuntu.com/ubuntu bionic-security multiverse
-# deb-src http://security.ubuntu.com/ubuntu bionic-security multiverse
+deb http://archive.ubuntu.com/ubuntu/ bionic-security main restricted
+# deb-src http://archive.ubuntu.com/ubuntu/ bionic-security main restricted
+deb http://archive.ubuntu.com/ubuntu/ bionic-security universe
+# deb-src http://archive.ubuntu.com/ubuntu/ bionic-security universe
+deb http://archive.ubuntu.com/ubuntu/ bionic-security multiverse
+# deb-src http://archive.ubuntu.com/ubuntu/ bionic-security multiverse
I guess that problem B appears due to some transient state of /etc/apt/sources.list, so we'll fix both at once.
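In the diff above, the successful file starts with the header "## Note, this file is written by cloud-init on first boot of an instance", and the failed one lacks it. A quick way to tell the two states apart while debugging could be a check for that marker. This is only a sketch: the function name written_by_cloud_init is made up here, and it assumes the header text is the same in the image we use.

```shell
#!/bin/sh
# Sketch: report whether a sources.list carries the cloud-init header
# seen in the successful run.
# Assumption: the header wording is stable for the image in use.
written_by_cloud_init() {
    # $1: file to inspect (defaults to the system sources.list)
    grep -q 'this file is written by cloud-init' "${1:-/etc/apt/sources.list}"
}
```

For example, `written_by_cloud_init || echo "sources.list is not rewritten by cloud-init yet" >&2` placed before the apt-get call would flag the suspicious state in the logs.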
The raw output of the cat -n /etc/apt/sources.list
command in both cases:
Full logs in both cases:
As we can see from the content of /etc/apt/sources.list, in the successful case cloud-init writes the file before apt-get <...> update runs (and so before Jepsen installs dependencies). There is a nice description of the successful and failed cases here, so I won't repeat it. The solution is simple: wait until cloud-init has finished its work.
There are several ways to do so; they're spread across the following threads:
As I see from this comment, there are ways to detect that cloud-init is initialized (it occurs only once) and that it is started. I guess that, since we deploy the instance from scratch each time and don't save any state between runs, any way should be okay for us.
I like this solution: just call cloud-init status --wait. It looks direct and simple. I don't know whether the cloud-init executable could be missing, but hopefully it is preinstalled in the image we use (rather than installed by Terraform).
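If the executable did turn out to be missing in some image, a guarded variant could look like the sketch below. The function name wait_for_cloud_init is hypothetical, and skipping the wait when cloud-init is absent is my assumption about the desired behavior, not something verified against our image.

```shell
#!/bin/sh
# Sketch: wait for cloud-init to finish before touching apt.
# Assumption: if the cloud-init executable is absent, there is
# nothing to wait for, so we only print a note and go on.
wait_for_cloud_init() {
    if command -v cloud-init >/dev/null 2>&1; then
        # Blocks until cloud-init reports that it has finished.
        sudo cloud-init status --wait
    else
        echo "cloud-init not found, skipping the wait" >&2
    fi
}
```

The function returns the exit status of cloud-init itself, so with set -o errexit a failed cloud-init run would still stop the provisioning.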
I would try the following:
diff --git a/extra/tf/main.tf b/extra/tf/main.tf
index abe8e606d..1230efefc 100644
--- a/extra/tf/main.tf
+++ b/extra/tf/main.tf
@@ -28,6 +28,7 @@ resource "openstack_compute_instance_v2" "instance" {
provisioner "remote-exec" {
inline = [
"set -o errexit",
+ "sudo cloud-init status --wait",
"sudo hostnamectl set-hostname n${count.index + 1}",
"sudo apt-get -o Debug::Acquire::http=true -o Debug::pkgAcquire::Worker=1 update"
]
I'll run it several times in CI, and if I see neither problem A nor problem B anymore, I'll propose it in a pull request. Otherwise I'll write the new results here.
I made 200 runs (100 runs of the 'jepsen-single-instance' workflow and 100 runs of the 'jepsen-single-instance-txm' workflow). The statistics are as follows.
5 failures (2.5% of all runs), none with the symptoms above. Looks like a success!
See details below.
2 crashes with InterruptedException on the bank-lua test. At first glance, it looks like a problem in the tarantool-java connector.
2 failures on bank-lua due to #94.
1 hang during dependency installation (there is no information about which command hung: curl or apt-get).