jeaye / nixos-in-place

Install NixOS on top of any existing Linux distribution without rebooting
MIT License

Possible race condition in stage1? #35

Open nh2 opened 7 years ago

nh2 commented 7 years ago

I got this on an OVH cloud server running Ubuntu 14.04:

```
>>> Validating checksum
nixos-minimal-16.09.680.4e14fd5-x86_64-linux.iso: OK
>>> Extracting ISO
mount: /dev/loop0 is write-protected, mounting read-only
Parallel unsquashfs: Using 8 processors
44678 inodes (49014 blocks) to write

[=======================================================================|] 49014/49014 100%

created 37671 files
created 13602 directories
created 7007 symlinks
created 0 devices
created 0 fifos
>>> Embarking stage1!
>>> Setting up chroot networking
>>> Looking for NixOS init... find: './proc/1902': No such file or directory
```

Running it again made it go past that without problems.

Maybe there's a race?

jeaye commented 7 years ago

Hm, I think you're onto something. Here is the relevant snippet:

```shell
## Enable networking
log "Setting up chroot networking"
cd host
mkdir -p etc dev proc sys
cp /etc/resolv.conf etc/external-resolv.conf
for fn in dev dev/shm dev/pts proc sys; do mount --bind "/$fn" "$fn"; done

## Patch the ISO for local chroot
log_start "Looking for NixOS init... "
INIT=$(find . -type f -path '*nixos*/init')
log_end "$INIT"
```

If any of those mounts has not finished yet, even though the command runs synchronously, the find may fail. In your case, it looks like mounting proc took extra long.

Are you able to keep testing this to see how often you can reproduce it? If so, I would recommend changing that `mount --bind` to `mount --bind -o sync` and seeing whether the issue goes away. In the meantime, I'll look further into solutions.

jeaye commented 7 years ago

After some more research and asking around, it looks like the race has nothing to do with the mounts; rather, find itself is racy when the directory tree changes while it is walking it. find has some flags for handling situations like this, and I think we can also exclude proc from the search entirely.

In short, the error happened because process 1902 exited while find was running: find had already listed ./proc/1902, but by the time it tried to open that directory, it was gone. So this has nothing to do with the NixOS installation in particular and should be easy to mitigate; I'll leave this open for now and push a fix tonight.
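A minimal sketch of that mitigation, assuming GNU find (as shipped on Ubuntu and Debian). `-ignore_readdir_race` tells find not to report an error when an entry disappears between being listed and being examined, and `-prune` keeps it out of the bind-mounted `proc`, `sys` and `dev` directories altogether. The function name `find_nixos_init` is hypothetical and not taken from nixos-in-place; the fix actually pushed may look different.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of nixos-in-place: locate the NixOS init
# script under a chroot root without walking the live pseudo-filesystems
# that were bind-mounted into it. Assumes GNU findutils.
find_nixos_init() {
  local root="${1:-.}"
  # -ignore_readdir_race: don't report an error if an entry vanishes
  #   between being listed and being examined.
  # -prune: skip proc, sys and dev entirely; entries such as /proc/<pid>
  #   appear and disappear as processes start and exit.
  find "$root" -ignore_readdir_race \
    \( -path "$root/proc" -o -path "$root/sys" -o -path "$root/dev" \) -prune \
    -o -type f -path '*nixos*/init' -print
}
```

In stage1, this would replace `INIT=$(find . -type f -path '*nixos*/init')` with `INIT=$(find_nixos_init .)` after the `cd host`. Pruning alone already avoids the reported error; `-ignore_readdir_race` is extra insurance for anything else that changes during the walk.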

Thanks for reporting this in such a helpful fashion!

srid commented 6 years ago

Just hit this with Debian Stretch on an OVH dedicated server, and reproduced it a second time as well. Adding `sleep 2` before the find fixed it for now.
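If you would rather keep the original search, a bounded retry is a somewhat more deliberate version of the `sleep 2` workaround. This sketch assumes the script treats a non-zero exit from find as fatal (for example under `set -e`). find exits non-zero after reporting a vanished entry even if it has already printed the init path. Retrying only makes a failure less likely and does not remove the race the way pruning `proc` does. The helper name `find_init_with_retry` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: rerun the original search up to N times and
# accept only a run in which find exits cleanly.
find_init_with_retry() {
  local root="${1:-.}" attempts="${2:-3}" out i
  for ((i = 1; i <= attempts; i++)); do
    # find's exit status is non-zero if any entry vanished mid-walk,
    # so a clean exit means the result is trustworthy. Error messages
    # are discarded here.
    if out=$(find "$root" -type f -path '*nixos*/init' 2>/dev/null); then
      printf '%s\n' "$out"
      return 0
    fi
    sleep 1  # give short-lived processes a moment before retrying
  done
  return 1
}
```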