icecc / icecream

Distributed compiler with a central scheduler to share build load
GNU General Public License v2.0

iceccd is busy installing, but environment never populates #573

Closed: pprkut closed this 3 years ago

pprkut commented 3 years ago

I have a system consisting of 3 nodes that are connected to the same network:

My usual setup is that I trigger a compile inside the VM and have it distributed to the compile node. Up until last week that was working fine, but when I tried today, the compile node no longer received the environment.

I recompiled icecream on the compile node with #define DEBUG_SCHEDULER 3, but I'm none the wiser.

What I see is this:

iceccd

[1433] 2021-01-30 14:26:24: ignoring localhost lo for broadcast
[1433] 2021-01-30 14:26:24: broadcast eth0 192.168.1.255
[1433] 2021-01-30 14:26:24: scheduler not yet found/selected.
[1433] 2021-01-30 14:26:24: Suitable scheduler found at 192.168.1.101:8765 (version: 42)
[1433] 2021-01-30 14:26:24: scheduler not yet found/selected.
[1433] 2021-01-30 14:26:27: scheduler is on 192.168.1.101:8765 (net liwjatan.org)
[1433] 2021-01-30 14:26:29: Connected to scheduler (I am known as 192.168.1.101)

icecc-scheduler

[1591] 2021-01-30 14:25:51: ICECREAM scheduler 1.3.1 starting up, port 8765
[1592] 2021-01-30 14:25:51: scheduler ready
[1592] 2021-01-30 14:25:51: ignoring localhost lo for broadcast
[1592] 2021-01-30 14:25:51: broadcast eth0 192.168.1.255
[1592] 2021-01-30 14:25:51: ignoring localhost lo for broadcast
[1592] 2021-01-30 14:25:51: broadcast eth0 192.168.1.255
[1592] 2021-01-30 14:25:51: Received scheduler announcement from 192.168.1.101:54797 (version 42, netname liwjatan.org)
[1592] 2021-01-30 14:26:23: broadcast from 192.168.1.127:51495 (version 42)
[1592] 2021-01-30 14:26:24: broadcast from 192.168.1.101:41965 (version 42)
[1592] 2021-01-30 14:26:26: accepted 192.168.1.127
[1592] 2021-01-30 14:26:26: login carbon.liwjatan.org protocol version: 42 features: env_xz env_zstd []
[1592] 2021-01-30 14:26:29: accepted 192.168.1.101
[1592] 2021-01-30 14:26:29: login anubis.liwjatan.org protocol version: 42 features: env_xz env_zstd []
[1592] 2021-01-30 14:27:22: handle_local_job  1
[1592] 2021-01-30 14:27:22: handle_local_job_done 1
[1592] 2021-01-30 14:27:22: handle_local_job  2
[1592] 2021-01-30 14:27:22: handle_local_job_done 2
[1592] 2021-01-30 14:27:23: NEW 3 client=carbon.liwjatan.org versions=[slackware-14.2+-x86_64-native(x86_64)] sbo_tmp/qownnotes-21.1.7/main.cpp C++
[1592] 2021-01-30 14:27:23: pick_server 3 x86_64
[1592] 2021-01-30 14:27:23: carbon.liwjatan.org is_eligible_ever: 1 (jobs_okay 1, version_okay 1, features_okay 1, chroot_or_local 1, accepting 1, can_install 1, check_remote 1)
[1592] 2021-01-30 14:27:23: carbon.liwjatan.org is_eligible_now: 1 (jobs_okay 1, load_okay 1)
[1592] 2021-01-30 14:27:23: anubis.liwjatan.org is_eligible_ever: 1 (jobs_okay 1, version_okay 1, features_okay 1, chroot_or_local 1, accepting 1, can_install 1, check_remote 1)
[1592] 2021-01-30 14:27:23: anubis.liwjatan.org is_eligible_now: 1 (jobs_okay 1, load_okay 1)
[1592] 2021-01-30 14:27:23: no job stats - returning randomly selected anubis.liwjatan.org load: 21 can install: x86_64
[1592] 2021-01-30 14:27:23: put 3 in joblist of anubis.liwjatan.org (will install now)
[1592] 2021-01-30 14:27:23: END 3 status=118
[1592] 2021-01-30 14:27:23: handle_local_job /usr/src/sbo_tmp/qownnotes-21.1.7/main.o 4
[1592] 2021-01-30 14:27:23: NEW 5 client=carbon.liwjatan.org versions=[slackware-14.2+-x86_64-native(x86_64)] qownnotes-21.1.7/dialogs/attachmentdialog.cpp C++
[1592] 2021-01-30 14:27:23: NEW 6 client=carbon.liwjatan.org versions=[slackware-14.2+-x86_64-native(x86_64)] qownnotes-21.1.7/entities/cloudconnection.cpp C++
[1592] 2021-01-30 14:27:23: NEW 7 client=carbon.liwjatan.org versions=[slackware-14.2+-x86_64-native(x86_64)] qownnotes-21.1.7/helpers/codetohtmlconverter.cpp C++
[1592] 2021-01-30 14:27:23: pick_server 5 x86_64
[1592] 2021-01-30 14:27:23: carbon.liwjatan.org is_eligible_ever: 1 (jobs_okay 1, version_okay 1, features_okay 1, chroot_or_local 1, accepting 1, can_install 1, check_remote 1)
[1592] 2021-01-30 14:27:23: carbon.liwjatan.org is_eligible_now: 1 (jobs_okay 1, load_okay 1)
[1592] 2021-01-30 14:27:23: anubis.liwjatan.org is_eligible_ever: 1 (jobs_okay 1, version_okay 1, features_okay 1, chroot_or_local 1, accepting 1, can_install [1592] 2021-01-30 14:27:23: anubis.liwjatan.org is busy installing since 0 seconds.
0, check_remote 1)
[1592] 2021-01-30 14:27:23: anubis.liwjatan.org is busy installing since 0 seconds.
[1592] 2021-01-30 14:27:23: anubis.liwjatan.org is_eligible_now: 0 (jobs_okay 1, load_okay 1)
[1592] 2021-01-30 14:27:23: no job stats - returning randomly selected carbon.liwjatan.org load: 506 can install: x86_64
[1592] 2021-01-30 14:27:23: put 5 in joblist of carbon.liwjatan.org

But even though it says "anubis.liwjatan.org is busy installing since 0 seconds", nothing actually happens, and eventually I just see:

[1592] 2021-01-30 14:29:21: NEW 53 client=carbon.liwjatan.org versions=[slackware-14.2+-x86_64-native(x86_64)] qownnotes-21.1.7/utils/git.cpp C++
[1592] 2021-01-30 14:29:21: pick_server 42 x86_64
[1592] 2021-01-30 14:29:21: overloaded carbon.liwjatan.org 3/2 jobs, load:136
[1592] 2021-01-30 14:29:21: anubis.liwjatan.org is_eligible_ever: 1 (jobs_okay 1, version_okay 1, features_okay 1, chroot_or_local 1, accepting 1, can_install [1592] 2021-01-30 14:29:21: anubis.liwjatan.org is busy installing since 118 seconds.
0, check_remote 1)
[1592] 2021-01-30 14:29:21: anubis.liwjatan.org is busy installing since 118 seconds.
[1592] 2021-01-30 14:29:21: anubis.liwjatan.org is_eligible_now: 0 (jobs_okay 1, load_okay 1)
[1592] 2021-01-30 14:29:21: anubis.liwjatan.org not eligible
[1592] 2021-01-30 14:29:21: carbon.liwjatan.org is_eligible_ever: 1 (jobs_okay 1, version_okay 1, features_okay 1, chroot_or_local 1, accepting 1, can_install 1, check_remote 1)
[1592] 2021-01-30 14:29:21: No suitable host found, delaying
[1592] 2021-01-30 14:29:23: END 39 status=0 in=0(0%) out=31048(0%) real=4797 user=2265 sys=127 pfaults=55904 server=carbon.liwjatan.org
[1592] 2021-01-30 14:29:23: add_job_stats C++ 212 2265 00010 94532 31048 carbon.liwjatan.org 13.7077 49.9102
[1592] 2021-01-30 14:29:23: busy installing for a long time - removing anubis.liwjatan.org
[1592] 2021-01-30 14:29:23: Handle_end 0x1a8ded0 0
[1592] 2021-01-30 14:29:23: remove daemon anubis.liwjatan.org

I also tried deploying the environment manually, but that didn't change anything either. iceccd is just sitting there doing nothing.

Is there anything else I can enable to see why iceccd never picks up the environment?
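To see at a glance how long each node has been stuck in this state, the "busy installing since N seconds" messages can be pulled out of a long scheduler log. The following is a minimal Python sketch that assumes the exact line format shown in the logs above; busy_install_times is a hypothetical helper, not part of icecream. It searches each line rather than matching only at the start, because, as the logs show, the scheduler sometimes prints this message in the middle of an is_eligible_ever line.

```python
import re

# Assumed format, taken from the scheduler log above, e.g.:
# [1592] 2021-01-30 14:29:21: anubis.liwjatan.org is busy installing since 118 seconds.
BUSY_RE = re.compile(
    r"\[\d+\] \d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}: "
    r"(?P<host>\S+) is busy installing since (?P<secs>\d+) seconds"
)


def busy_install_times(lines):
    """Return {host: longest 'busy installing' duration in seconds} found in the lines.

    Hypothetical helper. re.search (not re.match) is used so that messages
    interleaved into the middle of another log line are still found.
    """
    longest = {}
    for line in lines:
        m = BUSY_RE.search(line)
        if m:
            host = m.group("host")
            secs = int(m.group("secs"))
            longest[host] = max(secs, longest.get(host, 0))
    return longest
```

Applied to the second excerpt above, this reports anubis.liwjatan.org at 118 seconds, the same node the scheduler later drops with "busy installing for a long time".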

pprkut commented 3 years ago

Found the issue; it was entirely PEBKAC.

I had changed my build scripts to use unshare -n to block downloads during builds, and didn't think about that also blocking network access for icecream...
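For anyone hitting the same thing: unshare -n runs the command in a new network namespace that contains only a loopback interface, so a compile started from inside it cannot open connections to other machines. In icecream the client itself connects to the remote daemon to hand over the job (and, if the remote node does not have it yet, the environment tarball), which fits the symptoms above: the scheduler assigned a job to anubis and marked it as installing, but the environment never arrived. Below is a minimal Python sketch of a check a build script could run before relying on distributed compilation. The helper names are hypothetical, and treating "no interface besides lo" as "no network" is an assumption that fits unshare -n but is not a general connectivity test.

```python
import socket


def external_interfaces(names):
    """Drop the loopback interface from a list of interface names."""
    # Assumption: the loopback interface is named "lo", as on Linux.
    return [name for name in names if name != "lo"]


def current_interfaces():
    """Names of the interfaces visible in this process's network namespace."""
    # socket.if_nameindex() returns a list of (index, name) tuples.
    return [name for _, name in socket.if_nameindex()]


def has_external_network():
    """Heuristic: a process started under `unshare -n` sees only 'lo'."""
    return bool(external_interfaces(current_interfaces()))
```

A build wrapper could, for example, fall back to compiling locally when has_external_network() returns False, instead of queueing jobs that remote nodes will never receive. An interface being present does not prove that the scheduler or remote daemons are reachable, so this only catches the namespace case described here.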