livepeer / test-harness

Unable to deploy test configuration #90

Open darkdarkdragon opened 4 years ago

darkdarkdragon commented 4 years ago

Hitting errors like this:

stderr: Error creating machine: Error running provisioning: Error running "DEBIAN_FRONTEND=noninteractive sudo -E apt-get install -y  curl": ssh command error:
command : DEBIAN_FRONTEND=noninteractive sudo -E apt-get install -y  curl
err     : exit status 100
output  : Reading package lists...
Building dependency tree...
Reading state information...
curl is already the newest version (7.58.0-2ubuntu3.7).
The following package was automatically installed and is no longer required:
  grub-pc-bin
Use 'sudo apt autoremove' to remove it.
E: Could not open file /var/lib/apt/lists/archive.ubuntu.com_ubuntu_dists_bionic-backports_universe_i18n_Translation-en - open (2: No such file or directory)
E: Could not open file /var/lib/apt/lists/archive.ubuntu.com_ubuntu_dists_bionic-backports_universe_binary-amd64_Packages - open (2: No such file or directory)
E: Could not open file /var/lib/apt/lists/archive.ubuntu.com_ubuntu_dists_bionic-backports_main_i18n_Translation-en - open (2: No such file or directory)
E: Could not open file /var/lib/apt/lists/archive.ubuntu.com_ubuntu_dists_bionic-backports_main_binary-amd64_Packages - open (2: No such file or directory)
E: Could not open file /var/lib/apt/lists/archive.ubuntu.com_ubuntu_dists_bionic-updates_multiverse_i18n_Translation-en - open (2: No such file or directory)
E: Could not open file /var/lib/apt/lists/archive.ubuntu.com_ubuntu_dists_bionic-updates_multiverse_binary-amd64_Packages - open (2: No such file or directory)
E: Could not open file /var/lib/apt/lists/archive.ubuntu.com_ubuntu_dists_bionic-updates_universe_i18n_Translation-en - open (2: No such file or directory)
E: Could not open file /var/lib/apt/lists/archive.ubuntu.com_ubuntu_dists_bionic-updates_universe_binary-amd64_Packages - open (2: No such file or directory)
E: Could not open file /var/lib/apt/lists/archive.ubuntu.com_ubuntu_dists_bionic-updates_restricted_i18n_Translation-en - open (2: No such file or directory)
E: Could not open file /var/lib/apt/lists/archive.ubuntu.com_ubuntu_dists_bionic-updates_restricted_binary-amd64_Packages - open (2: No such file or directory)
E: Could not open file /var/lib/apt/lists/archive.ubuntu.com_ubuntu_dists_bionic-updates_main_i18n_Translation-en - open (2: No such file or directory)
E: Could not open file /var/lib/apt/lists/archive.ubuntu.com_ubuntu_dists_bionic-updates_main_binary-amd64_Packages - open (2: No such file or directory)
E: Could not open file /var/lib/apt/lists/archive.ubuntu.com_ubuntu_dists_bionic_multiverse_i18n_Translation-en - open (2: No such file or directory)
E: Could not open file /var/lib/apt/lists/archive.ubuntu.com_ubuntu_dists_bionic_multiverse_binary-amd64_Packages - open (2: No such file or directory)
E: Could not open file /var/lib/apt/lists/archive.ubuntu.com_ubuntu_dists_bionic_universe_i18n_Translation-en - open (2: No such file or directory)
E: Could not open file /var/lib/apt/lists/archive.ubuntu.com_ubuntu_dists_bionic_universe_binary-amd64_Packages - open (2: No such file or directory)
E: Could not open file /var/lib/apt/lists/archive.ubuntu.com_ubuntu_dists_bionic_restricted_i18n_Translation-en - open (2: No such file or directory)
E: Could not open file /var/lib/apt/lists/archive.ubuntu.com_ubuntu_dists_bionic_restricted_binary-amd64_Packages - open (2: No such file or directory)
E: Could not open file /var/lib/apt/lists/archive.ubuntu.com_ubuntu_dists_bionic_main_i18n_Translation-en - open (2: No such file or directory)
E: Could not open file /var/lib/apt/lists/archive.ubuntu.com_ubuntu_dists_bionic_main_binary-amd64_Packages - open (2: No such file or directory)

[createMachine] child process exited with code 1
Not waiting 3000ms because of error=[createMachine err] child process exited with code 1

Tried different VM images:

  https://www.googleapis.com/compute/v1/projects/ubuntu-os-cloud/global/images/family/ubuntu-minimal-1604-lts
  https://www.googleapis.com/compute/v1/projects/ubuntu-os-cloud/global/images/family/ubuntu-minimal-1804-lts
  https://www.googleapis.com/compute/v1/projects/ubuntu-os-cloud/global/images/family/ubuntu-minimal-1904

darkdarkdragon commented 4 years ago

Another error log:

stderr: Error creating machine: Error running provisioning: Error running "sudo apt-get update": ssh command error:
command : sudo apt-get update
err     : exit status 100
output  : Get:1 http://archive.canonical.com/ubuntu disco InRelease [10.9 kB]
Hit:2 http://archive.ubuntu.com/ubuntu disco InRelease
Get:3 http://security.ubuntu.com/ubuntu disco-security InRelease [97.5 kB]
Get:4 http://archive.canonical.com/ubuntu disco/partner amd64 Packages [1616 B]
Get:5 http://archive.canonical.com/ubuntu disco/partner Translation-en [712 B]
Get:6 http://archive.ubuntu.com/ubuntu disco-updates InRelease [97.5 kB]
Get:7 http://archive.ubuntu.com/ubuntu disco-backports InRelease [88.8 kB]
Get:8 http://security.ubuntu.com/ubuntu disco-security/main amd64 Packages [187 kB]
Get:9 http://archive.ubuntu.com/ubuntu disco/universe amd64 Packages [9065 kB]
Get:10 http://security.ubuntu.com/ubuntu disco-security/main Translation-en [68.0 kB]
Get:11 http://security.ubuntu.com/ubuntu disco-security/universe amd64 Packages [244 kB]
Get:12 http://security.ubuntu.com/ubuntu disco-security/universe Translation-en [68.5 kB]
Get:13 http://security.ubuntu.com/ubuntu disco-security/universe amd64 c-n-f Metadata [1260 B]
Get:14 http://security.ubuntu.com/ubuntu disco-security/multiverse amd64 Packages [1172 B]
Get:15 http://security.ubuntu.com/ubuntu disco-security/multiverse Translation-en [632 B]
Get:16 http://security.ubuntu.com/ubuntu disco-security/multiverse amd64 c-n-f Metadata [116 B]
Get:17 http://archive.ubuntu.com/ubuntu disco/universe Translation-en [5251 kB]
Get:18 http://archive.ubuntu.com/ubuntu disco/universe amd64 c-n-f Metadata [277 kB]
Get:19 http://archive.ubuntu.com/ubuntu disco/multiverse amd64 Packages [157 kB]
Get:20 http://archive.ubuntu.com/ubuntu disco/multiverse Translation-en [112 kB]
Get:21 http://archive.ubuntu.com/ubuntu disco/multiverse amd64 c-n-f Metadata [9348 B]
Get:22 http://archive.ubuntu.com/ubuntu disco-updates/main amd64 Packages [248 kB]
Get:23 http://archive.ubuntu.com/ubuntu disco-updates/main Translation-en [95.2 kB]
Get:24 http://archive.ubuntu.com/ubuntu disco-updates/universe amd64 Packages [292 kB]
Get:25 http://archive.ubuntu.com/ubuntu disco-updates/universe Translation-en [96.3 kB]
Get:26 http://archive.ubuntu.com/ubuntu disco-updates/universe amd64 c-n-f Metadata [1780 B]
Get:27 http://archive.ubuntu.com/ubuntu disco-updates/multiverse amd64 Packages [1172 B]
Get:28 http://archive.ubuntu.com/ubuntu disco-updates/multiverse Translation-en [632 B]
Get:29 http://archive.ubuntu.com/ubuntu disco-updates/multiverse amd64 c-n-f Metadata [116 B]
Get:30 http://archive.ubuntu.com/ubuntu disco-backports/main amd64 Packages [1220 B]
Get:31 http://archive.ubuntu.com/ubuntu disco-backports/main Translation-en [684 B]
Get:32 http://archive.ubuntu.com/ubuntu disco-backports/main amd64 c-n-f Metadata [528 B]
Get:33 http://archive.ubuntu.com/ubuntu disco-backports/restricted amd64 c-n-f Metadata [116 B]
Get:34 http://archive.ubuntu.com/ubuntu disco-backports/universe amd64 Packages [3420 B]
Get:35 http://archive.ubuntu.com/ubuntu disco-backports/universe Translation-en [1532 B]
Get:36 http://archive.ubuntu.com/ubuntu disco-backports/universe amd64 c-n-f Metadata [188 B]
Get:37 http://archive.ubuntu.com/ubuntu disco-backports/multiverse amd64 c-n-f Metadata [116 B]
Traceback (most recent call last):
  File "/usr/lib/cnf-update-db", line 26, in <module>
    col.create(db)
  File "/usr/lib/python3/dist-packages/CommandNotFound/db/creator.py", line 93, in create
    self._fill_commands(con)
  File "/usr/lib/python3/dist-packages/CommandNotFound/db/creator.py", line 126, in _fill_commands
    with open(f) as fp:
FileNotFoundError: [Errno 2] No such file or directory: '/var/lib/apt/lists/archive.ubuntu.com_ubuntu_dists_disco-updates_restricted_cnf_Commands-amd64'
Fetched 16.5 MB in 4s (3876 kB/s)
Reading package lists...
E: Problem executing scripts APT::Update::Post-Invoke-Success 'if /usr/bin/test -w /var/lib/command-not-found/ -a -e /usr/lib/cnf-update-db; then /usr/lib/cnf-update-db > /dev/null; fi'
E: Sub-process returned an error code

[createMachine] child process exited with code 1
Not waiting 3000ms because of error=[createMachine err] child process exited with code 1

darkdarkdragon commented 4 years ago

I'm able to create a deployment with 4 instances, but deployments of 50+ instances fail with these errors.

ya7ya commented 4 years ago

@darkdarkdragon Which GCP zone are you trying to deploy 50+ instances in?

darkdarkdragon commented 4 years ago

@ya7ya Right now us-central1 is the only one that can handle this. europe-north1 also has enough CPUs, but it doesn't have enough IP addresses. I tried to work around that with the --google-use-internal-ip-only flag, but it didn't help: with this flag docker-machine couldn't provision any machines at all.
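A minimal sketch of how such a docker-machine create call might be assembled with the google driver. The flags --driver google, --google-project, --google-zone, --google-machine-image and --google-use-internal-ip-only are real google driver options; the helper name buildCreateArgs, the project ID, the zone us-central1-a and the machine name are hypothetical placeholders, not values taken from the harness. Per the docker-machine google driver documentation, --google-use-internal-ip-only creates the VM without a public IP, so docker-machine can only reach it over SSH when it runs inside the same Google Cloud network, which may explain why provisioning failed entirely with that flag.

```javascript
'use strict';

// Hypothetical helper: assemble the argv for `docker-machine create` using the
// google driver. Project, zone and machine name below are placeholders.
function buildCreateArgs({ name, project, zone, image, internalIpOnly = false }) {
  const args = [
    'create',
    '--driver', 'google',
    '--google-project', project,
    '--google-zone', zone,
    '--google-machine-image', image,
  ];
  if (internalIpOnly) {
    // No public IP is assigned: docker-machine must then run inside the same
    // GCP network to reach the VM over SSH during provisioning.
    args.push('--google-use-internal-ip-only');
  }
  args.push(name); // the machine name is the final positional argument
  return args;
}

const IMAGE_1804 =
  'https://www.googleapis.com/compute/v1/projects/ubuntu-os-cloud/global/images/family/ubuntu-minimal-1804-lts';

const args = buildCreateArgs({
  name: 'test-node-0',
  project: 'my-gcp-project',
  zone: 'us-central1-a',
  image: IMAGE_1804,
});
console.log(['docker-machine', ...args].join(' '));
```

The returned array can be passed directly to child_process.execFile('docker-machine', args) or spawn, which avoids shell quoting issues with the long image URLs.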

darkdarkdragon commented 4 years ago

OK, so far I've identified two problems:

  1. Sometimes docker-machine create can't provision the machine and returns an error.
  2. The Node process just exits after a batch of docker-machine child processes finishes. It exits with status code 0 and without any visible error that would explain the exit. I've traced it to the async library: the last thing our code does is invoke a callback that leads into async.
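One way to make problem 2 visible is sketched below. Node exits with status 0 once its event loop has nothing left to run, even if a callback was never invoked and an awaited promise is still pending, which is consistent with the symptom described above. Registering a listener for the 'beforeExit' process event lets the harness report unfinished work and set a non-zero exit code. The pending set and the track helper are hypothetical names for illustration, not part of the test-harness or async APIs.

```javascript
'use strict';

// Hypothetical bookkeeping: labels of work that has started but not finished.
const pending = new Set();

// Wrap a promise so its label is tracked until it settles (fulfilled or rejected).
function track(label, promise) {
  pending.add(label);
  return promise.finally(() => pending.delete(label));
}

// 'beforeExit' is emitted when the event loop is empty and nothing else is
// scheduled; it is not emitted on process.exit() or on an uncaught exception.
process.on('beforeExit', () => {
  if (pending.size > 0) {
    console.error(`event loop drained with unfinished work: ${[...pending].join(', ')}`);
    process.exitCode = 1; // turn a silent exit 0 into a visible failure
  }
});

// Demo: a promise whose resolve callback is never called, as when a completion
// callback gets lost. Nothing keeps the loop alive, so Node would otherwise exit 0.
track('createMachine test-node-0', new Promise(() => {}));
```

Run on its own, this prints the unfinished label and exits with code 1 instead of 0, which points at the step whose callback never fired.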

For the first problem, I'll rewrite our code so that if docker-machine create fails, it removes the failed machine and tries again. For the second one, I'll rewrite the code using plain promises; let's see if that helps.
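A minimal sketch of both planned changes, assuming the harness shells out to the docker-machine binary: createWithRetry retries docker-machine create and removes the failed machine with docker-machine rm -f -y between attempts, and createAll uses plain promises (Promise.allSettled, Node 12.9+) instead of the async library. The function names, the attempt count and the 3000 ms delay are illustrative assumptions; createArgs stands for the create options without the leading "create" and the trailing machine name, and the run parameter exists only so the logic can be exercised without real VMs.

```javascript
'use strict';

const { execFile } = require('child_process');

// Promise wrapper around execFile; rejects on a non-zero exit and keeps stderr on the error.
function runDockerMachine(args) {
  return new Promise((resolve, reject) => {
    execFile('docker-machine', args, (err, stdout, stderr) => {
      if (err) {
        err.stderr = stderr;
        reject(err);
      } else {
        resolve(stdout);
      }
    });
  });
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Try `docker-machine create`; on failure remove the half-provisioned machine
// and try again, up to `attempts` times. Rethrows the last error if all fail.
async function createWithRetry(name, createArgs, { attempts = 3, delayMs = 3000, run = runDockerMachine } = {}) {
  let lastErr;
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      return await run(['create', ...createArgs, name]);
    } catch (err) {
      lastErr = err;
      console.error(`[createMachine] ${name}: attempt ${attempt}/${attempts} failed`);
      // Clean up the failed machine; ignore errors if it was never registered.
      await run(['rm', '-f', '-y', name]).catch(() => {});
      if (attempt < attempts) await sleep(delayMs);
    }
  }
  throw lastErr;
}

// Create all machines with plain promises; allSettled waits for every machine
// and reports each outcome instead of stopping at the first rejection.
async function createAll(names, createArgs, opts) {
  const results = await Promise.allSettled(names.map((n) => createWithRetry(n, createArgs, opts)));
  return results.map((r, i) => ({ name: names[i], ok: r.status === 'fulfilled', error: r.reason }));
}
```

Using allSettled means a machine that keeps failing is reported by name in the results rather than aborting the whole deployment, so the caller can decide whether a partial deployment is acceptable.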