Summary:
The Google Chrome team has a distributed compilation environment. It’s internal-only, but it’s conceptually similar to distcc.
On the macOS 10.13 betas (both beta 1 and beta 2 so far), when I attempt to use this service to build Chrome, all networking stops about 2/3 of the way into the build.
The immediately apparent symptom is that the build stops progressing. I can stop the build, but by then the damage is done: networking remains unusable and I can’t pass any traffic over the network. If I run “ifconfig”, it hangs partway through dumping the “p2p0” interface. If I click on the AirPort menu extra, nothing happens and a beach ball eventually appears over the menu extra. If I try to open the Network preference pane in System Preferences, nothing populates.
Running “top -ocpu”, I see kernel_task at 100%, indicating that something’s spinning in a tight loop in the kernel.
^C and SIGKILL do not recover the hung ifconfigs. It is impossible to shut the system down cleanly either via the Apple menu:Restart or via “sudo shutdown -r now”.
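Because a hung ifconfig can’t be interrupted or killed, anything that waits on it gets stuck too. The following is a minimal sketch of a way to probe for the hang without blocking the calling shell. It is not part of the original investigation: probe_cmd is a hypothetical helper name, and the 5-second limit and the exit status 124 for “did not finish” are arbitrary choices.

```shell
#!/bin/sh
# Hypothetical helper (an illustration, not from the original report):
# run a command in the background and report whether it finishes
# within a time limit, without ever waiting on it directly.
#
# Usage: probe_cmd SECONDS COMMAND [ARGS...]
# Returns the command's exit status, or 124 if it has not finished in time.
probe_cmd() {
    limit=$1
    shift
    status_file=$(mktemp) || return 1

    # The subshell records the exit status in a file, so the caller can
    # poll the file instead of calling wait on a possibly stuck process.
    ( "$@" >/dev/null 2>&1; echo "$?" >"$status_file" ) &
    pid=$!

    elapsed=0
    while [ "$elapsed" -le "$limit" ]; do
        if [ -s "$status_file" ]; then
            status=$(cat "$status_file")
            rm -f "$status_file"
            return "$status"
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done

    # Best effort only: a process stuck in the kernel may ignore SIGKILL,
    # so the helper deliberately does not wait for it afterwards.
    kill -9 "$pid" 2>/dev/null
    rm -f "$status_file"
    return 124
}

# Example (assumed usage): probe_cmd 5 ifconfig || echo "ifconfig did not finish cleanly"
```

Polling a status file rather than calling wait is the point of the design: if the command never returns from the kernel, the helper still returns after the limit.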
Steps to Reproduce:
I can’t provide solid reproduction steps because I’ve only been able to reproduce the problem using our internal distributed build service. This service functions similarly to distcc and does not run anything at elevated privileges.
Essentially, it’s:
% git clone https://chromium.googlesource.com/chromium/tools/depot_tools.git
[also get goma, our distributed compilation tool, and start it]
% PATH="${PATH}:$(pwd)/depot_tools:$(pwd)/goma_mac"
% mkdir chrome
% cd chrome
% fetch chrome
[wait]
% cd chrome
% gn gen out/debug --args="use_goma=true goma_dir=\"$(pwd)/goma_mac\""
% ninja -C out/debug chrome -j250
Expected Results:
The build should complete successfully.
Observed Results:
After building 21,000 or so files out of 29,000 or so, the build stops progressing. You’ll notice that networking is not working. Browsers can’t browse, ping shows no connectivity, etc. The AirPort menu extra and Network preference pane don’t work. kernel_task is using 100% CPU. Run “ifconfig” and it’ll hang irrecoverably partway through dumping the p2p0 interface. I’m including a snippet of that here because the sysdiagnose probably missed it, since ifconfig never completed.
litterbox@litterbox zsh% ifconfig
lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> mtu 16384
	options=1203<RXCSUM,TXCSUM,TXSTATUS,SW_TIMESTAMP>
	inet 127.0.0.1 netmask 0xff000000
	inet6 ::1 prefixlen 128
	inet6 fe80::1%lo0 prefixlen 64 scopeid 0x1
	nd6 options=201<PERFORMNUD,DAD>
gif0: flags=8010<POINTOPOINT,MULTICAST> mtu 1280
stf0: flags=0<> mtu 1280
XHC20: flags=0<> mtu 0
XHC0: flags=0<> mtu 0
XHC1: flags=0<> mtu 0
en0: flags=8863<UP,BROADCAST,SMART,RUNNING,SIMPLEX,MULTICAST> mtu 1500
	ether uu:vv:ww:xx:yy:zz
	inet6 blahblahblah%en0 prefixlen 64 secured scopeid 0x7
	inet6 blahblahblah prefixlen 64 autoconf secured
	inet6 blahblahblah prefixlen 64 autoconf temporary
	nd6 options=201<PERFORMNUD,DAD>
	media: autoselect
	status: active
en1: flags=963<UP,BROADCAST,SMART,RUNNING,PROMISC,SIMPLEX> mtu 1500
	options=60<TSO4,TSO6>
	ether uu:vv:ww:xx:yy:zz
	media: autoselect
	status: inactive
en3: flags=963<UP,BROADCAST,SMART,RUNNING,PROMISC,SIMPLEX> mtu 1500
	options=60<TSO4,TSO6>
	ether uu:vv:ww:xx:yy:zz
	media: autoselect
	status: inactive
en2: flags=963<UP,BROADCAST,SMART,RUNNING,PROMISC,SIMPLEX> mtu 1500
	options=60<TSO4,TSO6>
	ether uu:vv:ww:xx:yy:zz
	media: autoselect
	status: inactive
en4: flags=963<UP,BROADCAST,SMART,RUNNING,PROMISC,SIMPLEX> mtu 1500
	options=60<TSO4,TSO6>
	ether uu:vv:ww:xx:yy:zz
	media: autoselect
	status: inactive
p2p0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 2304
	ether uu:vv:ww:xx:yy:zz
Normally, ifconfig should have continued by printing p2p0’s status line and several other interfaces. After a fresh reboot, those look like:
	status: inactive
awdl0: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> mtu 1484
	ether uu:vv:ww:xx:yy:zz
	inet6 blahblahblah%awdl0 prefixlen 64 scopeid 0xf
	nd6 options=201<PERFORMNUD,DAD>
	media: autoselect
	status: active
ipsec0: flags=8051<UP,POINTOPOINT,RUNNING,MULTICAST> mtu 4096
	options=6403<RXCSUM,TXCSUM,CHANNEL_IO,PARTIAL_CSUM,ZEROINVERT_CSUM>
ipsec1: flags=8051<UP,POINTOPOINT,RUNNING,MULTICAST> mtu 4096
	options=6403<RXCSUM,TXCSUM,CHANNEL_IO,PARTIAL_CSUM,ZEROINVERT_CSUM>
utun0: flags=8051<UP,POINTOPOINT,RUNNING,MULTICAST> mtu 2000
	options=6403<RXCSUM,TXCSUM,CHANNEL_IO,PARTIAL_CSUM,ZEROINVERT_CSUM>
	inet6 blahblahblah%utun0 prefixlen 64 scopeid 0x12
	nd6 options=201<PERFORMNUD,DAD>
Version:
10.13db2 17A291j with Xcode 9b2 9M137d.
I experienced this with 10.13db1 17A264c and Xcode 9b1 9M136h too.
I see this problem when I’m building on an APFS or HFS+ filesystem.
The system is a MacBook Pro (15-inch, 2016) (MacBookPro13,3).
Configuration:
We never had any trouble up to and including 10.12.5 16F73.
Description
Area: Networking
Product Version: 10.13db2 17A291j
Created: 2017-06-22T16:10:40.591490
Originated: 2017-06-22T00:00:00
Open Radar Link: http://www.openradar.me/32925139