guysoft / CustomPiOS

A Raspberry Pi and other ARM devices distribution builder
GNU General Public License v3.0

Debian 10 (Buster), can't mount loop devices #55

Open WheresWaldo opened 5 years ago

WheresWaldo commented 5 years ago

When using Debian 10 as the host to build a CustomPiOS distro and pulling the latest stable release of Raspbian (Buster), the build script fails when it gets to mounting the loop devices. After the script fails, the following loop devices exist:

loop0 loop1 loop2 loop3 loop4 loop5 loop6 loop7 loop-control

WheresWaldo commented 5 years ago

Update:

I edited /etc/modules to add the line loop max_loop=128 and then reran my build script. It still fails when creating loop devices, but the output of /sbin/losetup -a is:

/dev/loop0: []: (/home/waldo/CommunityOS/src/workspace/2019-07-10-raspbian-buster-lite.img), offset 276824064

guysoft commented 5 years ago

Are you building on vanilla Debian 10 or Raspbian? I am not sure how to reproduce this other than using vagrant.

WheresWaldo commented 5 years ago

Building directly on vanilla Debian 10 in an Oracle VM on Windows 10.

Even after pulling only the devel branch (using -b devel) with the git clone command and rerunning my script it still fails when trying to mount the loop devices. I am including the build.log so you can see what is happening. build.log

WheresWaldo commented 5 years ago

I believe I may have solved this.

What didn't work was creating an entry in /etc/modules for loop. What worked was creating a configuration file for modprobe, /etc/modprobe.conf, and adding the entry options loop max_loop=128. After saving the file, when modprobe is executed with loop, the script is able to create more loop devices.
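
A minimal sketch of the change described above, for readers who want to try it. The target file is a parameter because this comment uses /etc/modprobe.conf, while many current distributions read drop-in files from /etc/modprobe.d/ instead. The helper name write_loop_options is an illustrative assumption and not part of CustomPiOS.

```sh
#!/bin/sh
# Sketch: persist a higher loop device limit for the loop kernel module.
# write_loop_options is a hypothetical helper, not part of CustomPiOS.
write_loop_options() {
    conf_file=$1          # e.g. /etc/modprobe.conf, as in the comment above
    max_loop=${2:-128}    # 128 is the value used in this thread
    printf 'options loop max_loop=%s\n' "$max_loop" > "$conf_file"
}

# Example (needs root); reload the module afterwards so the option applies:
#   write_loop_options /etc/modprobe.conf 128
#   modprobe -r loop && modprobe loop
```

Note that modprobe -r loop only succeeds while no loop device is in use, and the option has no effect if the loop driver is built into the kernel rather than loaded as a module.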

I tried this out in two distinct virtual machines (remember I am running VirtualBox on a Windows 10 host). The first was an upgraded VM from Debian 9 (stretch) to Debian 10 (buster). That one was hard, because it worked fine in Debian 9 but failed after the upgrade. The other VM was a fresh install of Debian 10, which also failed to create loop devices.

I have not tested whether or not the image actually runs on an rPi, but building on Debian does complete without errors. I need to purchase another rPi, since I am now using the three I own in other projects and I can't sacrifice my current OctoPrint setup for testing.

I am closing this issue.

guysoft commented 5 years ago

This is worth documenting in the wiki

WheresWaldo commented 5 years ago

First, I spoke too soon. Only the upgraded VM, which worked fine using Debian 9 (stretch), works with the above file added after upgrading to Debian 10 (buster). I still can't get a new VM of Debian 10 (buster) to build. It still fails when trying to mount loop devices.

I found other inconsistencies between building in buster vs stretch and may open other issues, unless I can find work-arounds.

WheresWaldo commented 5 years ago

So the behavior is odd: Debian is very temperamental with regard to loop devices. If I come from a clean boot, I can build an image. If I try to build it again in the same session, it fails saying that loop is unavailable. At the very least, maybe the cleanup or error routine should check for loop devices and remove them explicitly.
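
The explicit cleanup suggested here could look roughly like the sketch below. It assumes the build attaches loop devices to a single image file and that any filesystems on them have already been unmounted. Both function names are hypothetical and not part of CustomPiOS; losetup -j lists the devices associated with a file, and losetup -d detaches one.

```sh
#!/bin/sh
# Sketch: detach any loop devices still attached to a given image file.
# Both function names are hypothetical; they are not part of CustomPiOS.

# Read `losetup -j` style lines on stdin and print only the device paths.
# Each line looks like: /dev/loop0: []: (/path/to/file.img), offset 276824064
loop_devices_from_listing() {
    cut -d: -f1
}

# Detach every loop device that is backed by the image passed as $1.
detach_loops_for_image() {
    image=$1
    losetup -j "$image" | loop_devices_from_listing | while read -r dev; do
        losetup -d "$dev" || echo "could not detach $dev" >&2
    done
}

# Example: run on exit so a failed build does not leave devices behind.
#   trap 'detach_loops_for_image "$PWD/workspace/image.img"' EXIT
```

Whether leftover devices are the actual cause of the second-build failure is not established in this thread, so treat this as a diagnostic aid rather than a fix.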

I am reopening since this really isn't solved.

guysoft commented 5 years ago

You can also try using the vagrant build method.

WheresWaldo commented 5 years ago

I have never used that method, so I will need to study it to see how to set it up.

guysoft commented 4 years ago

Just updating: I am getting this too, both on my arm64 Raspbian installation and in the ZynthianOS project. It's random, as in sometimes it happens and sometimes it doesn't. On my install it always seems to work after it fails. I am investigating, but would love help with this; since it's not 100% reproducible, I have to wait and try to catch it.

WheresWaldo commented 4 years ago

I haven't built another OS in a while, I settled on Stretch for my PiOS since I could never get Buster to work. And on another note, the other guy who was working on a rewrite of the RoboOS user interface stopped working on it. My project is dead now, for all intents and purposes, since I am not a programmer, just capable of assembling tools is all.

guysoft commented 4 years ago

Sorry to hear that :(

I will stress that for me the issue is random. The OctoPi nightly fail rate from this issue has been about 15% over the past month. So it's annoying, but it should not stop you from building altogether.

guysoft commented 4 years ago

Looks like there is an open issue on Docker https://github.com/moby/moby/issues/27886

guysoft commented 4 years ago

Possible workaround: https://github.com/moby/moby/issues/27886#issuecomment-571940507

Need to reproduce to try and fix it

Fulg commented 4 years ago

FYI. I can reproduce this issue fairly easily/regularly with a fresh Ubuntu 19.10 VM, both with and without Docker. The behavior is the same, it works most of the time but randomly I will get loop device errors.

The random nature of the issue makes it super hard to diagnose. I was about to comment yesterday that I had found a workaround for the loop device issues in Docker, but eventually it failed again the same way, so it turned out to be down to pure luck and not an actual fix.

Anyway, just wanted to comment that this isn't related to Docker, since I can reproduce the same behavior without using Docker at all.

FYI, yesterday I found an interesting commit (again out of luck!), maybe this will be worked around in the Linux Kernel after all: https://github.com/brauner/linux/commit/be3dddcfc05aaf9b48329f082bf4db6955980a62

mikevanis commented 4 years ago

What's your usual setup for using CustomPiOS, @guysoft? Would be helpful as a starting point. Do you build on a Pi under raspbian, or in Ubuntu?

guysoft commented 4 years ago

@mikevanis I have moved to using a Raspberry Pi running Docker. It has the 64-bit kernel set. You can see a photo of it here: https://twitter.com/GuySoft/status/1188735297758138375

Has a HDD for faster runtime.

The docker service works more or less like this:

https://github.com/guysoft/CustomPiOS/wiki/Building#quickstart---i-want-to-build-a-new-distro-using-docker

The distro the Pi is running is Raspbian.

However, I also help maintain the Zynthian.org build, which is sitting on an AWS EC2 machine running docker there. They are using Debian 9, that also works.

mikevanis commented 4 years ago

Thanks @guysoft, that really helped. I can confirm that I don't get this issue on a clean Raspbian Buster image. I did the following:

# install docker and docker compose
curl -sL get.docker.com | sed 's/9)/10)/' | sh
sudo usermod -aG docker pi
sudo apt-get install -y libffi-dev libssl-dev python3 python3-pip
sudo apt-get remove python-configparser
sudo pip3 install docker-compose

# clone CustomPiOS
git clone https://github.com/guysoft/CustomPiOS.git

# make custom pi os
CustomPiOS/src/make_custom_pi_os -g ~/ExampleDistro

# use docker-compose.yml from CustomPiOS wiki
vim ExampleDistro/docker-compose.yml

# up the container
cd ExampleDistro && sudo docker-compose up -d

# build!
sudo docker exec -it mydistro-build build

Here's what my docker-compose.yml looks like. I had to change the volume mounting so that it starts on the right directory:

version: '3.6'

services:
    custompios:
        image: guysoft/custompios:devel
        container_name: mydistro-build
        tty: true
        restart: always
        privileged: true
        volumes:
        - ./src/:/distro
        devices:
        - /dev/loop-control

Thanks all for the help and hints

guysoft commented 4 years ago

Cool, I guess in the future you would also use https://github.com/guysoft/UbuntuDockerPi , which ships with docker + docker-compose :)

If you could point out what was different from your last setup it might help others in this issue.

Fulg commented 4 years ago

Could it be that the Pi simply isn't fast enough to trigger the issue? From my notes it seems that the errors point to a timing issue with the loop device:

1) losetup: /dev/loop14: detach failed: No such device or address
2) mount: /distro/workspace/mount/: failed to setup loop device for /distro/workspace/2020-02-13-raspbian-buster-lite.img.
3) losetup: 2020-02-13-raspbian-buster-lite.img: failed to set up loop device: Resource temporarily unavailable

1 and 2 are the most common versions of the problem. 3 is quite rare but more telling of the actual problem IMO.
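
If the failures really are timing related, one way to probe that is to wrap the failing call in a retry with a short pause. The sketch below is only that: retry is a hypothetical helper, not part of CustomPiOS, and retrying would mask a race rather than fix it.

```sh
#!/bin/sh
# Sketch: retry a command a few times with a pause between attempts.
# retry is a hypothetical helper, not part of CustomPiOS.
retry() {
    attempts=$1
    delay=$2
    shift 2
    n=1
    while :; do
        "$@" && return 0
        if [ "$n" -ge "$attempts" ]; then
            echo "giving up after $n attempts: $*" >&2
            return 1
        fi
        n=$((n + 1))
        sleep "$delay"
    done
}

# Example, using a losetup call of the kind seen in this thread's logs:
#   retry 5 2 losetup -f --show -o 272629760 2020-02-13-raspbian-buster-lite.img
```

If error 3 ("Resource temporarily unavailable") disappears with a small delay while errors 1 and 2 do not, that would support the timing theory for that case at least.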

The problem seems more likely to happen on small projects (like a bare distro out of src/make_custom_pi_os -g).

guysoft commented 4 years ago

@Fulg Quite possible. I was considering building a distro that builds other distros. It could ship with Jenkins and have CI/CD nightly builds: you just give it a GitHub repo and it builds it. Then we could test timing scenarios like that with similar hardware and software. I never had time for that, though.

amrsoll commented 4 years ago

@guysoft I understand that you much prefer using docker images to build the images, but I see the wiki mentions Raspbian as a host system to build with out-of-the-box and I remember making it work on the previous buster image some time ago! :smile:

I tried again straight from the vanilla (latest to date) raspios-lite image on a RPi 4 but failed :(

sudo apt install -y git p7zip-full python3 lsof
git clone https://github.com/guysoft/CustomPiOS.git
CustomPiOS/src/make_custom_pi_os -g mytest
sudo modprobe loop
mytest/src/build_dist

Finishes with the infamous

losetup -f --show -o 272629760 2020-02-13-raspbian-buster-lite.img
losetup: 2020-02-13-raspbian-buster-lite.img: failed to set up loop device: Permission denied

I have also tried build_dist with the raspios image, with the same result.

Issue #46 details the same problem on the Ubuntu host because of overused loop devices, but losetup returns nothing in my case.

build.log

Note: I am building it straight on the SD card

guysoft commented 4 years ago

@amrsoll This is not related to this issue: you are not running with root permissions, so you are getting a permission error. Run sudo mytest/src/build_dist. The reason I am moving mostly to supporting Docker is scenarios where I have no idea what is going on in the host system; Docker lets you keep track of the environment. Building on Raspbian/Debian/Raspberry Pi OS should work.
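
A wrapper script can catch this case before losetup is ever called. The sketch below is an assumption about how such a guard might look; is_root is a hypothetical helper and not part of CustomPiOS.

```sh
#!/bin/sh
# Sketch: fail early when the build is not started as root.
# is_root is a hypothetical helper; the optional argument exists only
# so the check can be exercised without actually being root.
is_root() {
    uid=${1:-$(id -u)}
    [ "$uid" -eq 0 ]
}

# Example guard at the top of a wrapper script:
#   is_root || { echo "run with sudo: loop devices need root" >&2; exit 1; }
```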

toddejohnson commented 4 years ago

I build in DigitalOcean using a:

doctl compute droplet create rmspi-build --size 1gb --image debian-10-x64 --region nyc1

Experienced this on my first attempt, re-try would succeed, 3rd would fail, 4th succeeded. Log is gone now but I can spin this up and give it a try if you need it.

In dmesg I saw:

loop_set_status: loop0 () has still dirty pages (nrpages=30)

This is my build script I hacked together.

sudo modprobe loop

apt update
apt install p7zip-full qemu-system-arm qemu-user-static backblaze-b2 git curl

cd /usr/src/
git clone https://github.com/toddejohnson/rmspi.git RMSPi
git clone https://github.com/guysoft/CustomPiOS.git CustomPiOS
echo "/usr/src/CustomPiOS/src" > /usr/src/RMSPi/src/custompios_path
CURRENT_RASPBIAN=$(curl https://downloads.raspberrypi.org/raspbian_lite/images/ | grep raspbian | tail -n 1 | awk -F "href=\"" '{print $2}' | awk -F "/" '{print $1}')
CURRENT_RASPBIAN_FILE=$(curl http://downloads.raspberrypi.org/raspbian_lite/images/${CURRENT_RASPBIAN}/ | grep .zip | head -n 1 | awk -F "href=\"" '{print $2}' | awk -F "\">" '{print $1}')
curl -L -o "/usr/src/RMSPi/src/image/${CURRENT_RASPBIAN_FILE}" https://downloads.raspberrypi.org/raspbian_lite/images/${CURRENT_RASPBIAN}/${CURRENT_RASPBIAN_FILE}

cd /usr/src/RMSPi/src

./build_dist

cd /usr/src/RMSPi/src/workspace
CURRENT_IMG_FILE=$(basename $CURRENT_RASPBIAN_FILE .zip)
OUT_IMG_FILE=$(basename $CURRENT_RASPBIAN_FILE .zip| cut -d'-' -f -3)-rmspi-$(basename $CURRENT_RASPBIAN_FILE .zip| cut -d'-' -f 5-)-0-4
mv ${CURRENT_IMG_FILE}.img ${OUT_IMG_FILE}.img

7za a ${OUT_IMG_FILE}.zip ${OUT_IMG_FILE}.img

I used to use ubuntu-18-04-x64 and tried to start with it, but I had similar errors, gave up, and decided Debian would be a better fit for this.

guysoft commented 2 years ago

Hey, so, possible workarounds.

One that didn't work: I tested running ls from the host and the container (I wrote a flask server to run ls and then pulled that from the container; no luck).

One I am going to test now: this commit https://github.com/mmmspatz/pi-gen/commit/0c84b0daea2c358adbccf737f2214e2e4808f597 , based on this comment: https://github.com/moby/moby/issues/27886#issuecomment-749546846 , suggests that adding /dev:/dev as a volume in the docker compose file fixes this. Will update.

mmmspatz commented 2 years ago

Hi, lacking some context here but here's what I learned:

docker run --privileged gives the container access to everything in the host's /dev/, but new device nodes created in the host after the container is launched don't automatically show up in the container. So when you run losetup -f in the container, you'll see a new /dev/loop[n] appear in the host but not in the container, then losetup will fail. Mounting the host's /dev in the container fixes this and solves the problem.
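
To check whether a running container is in the state described here, the loop nodes visible on the host can be compared with those visible inside the container. This is a rough sketch: missing_in_container is a hypothetical helper, and the container name in the example is the one from the compose file posted earlier in this thread.

```sh
#!/bin/sh
# Sketch: list loop nodes that exist on the host but not in the container.
# missing_in_container is a hypothetical helper, not part of CustomPiOS.
missing_in_container() {
    host_nodes=$1
    container_nodes=" $(echo $2) "   # flatten newlines to single spaces
    for node in $host_nodes; do
        case "$container_nodes" in
            *" $node "*) ;;              # present in both
            *) echo "$node" ;;           # host only
        esac
    done
}

# Example, assuming the container name from the compose file in this thread:
#   host=$(ls /dev | grep '^loop')
#   cont=$(docker exec mydistro-build ls /dev | grep '^loop')
#   missing_in_container "$host" "$cont"
```

Any name printed after a failed losetup -f would be a node that was created on the host but never appeared in the container.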

A thought I have is: maybe you could create your loopback devices in the host before launching the container, then make use of them after launch? idk.

guysoft commented 2 years ago

@mmmspatz that's what should happen, but there is a bug that it doesn't always update correctly in the container. With your commit I suspect that performing a bind mount to /dev might get it to update correctly. Possibly because a bind mount to a folder updates differently.

mikevanis commented 2 years ago

Not sure if this is helpful, but over on the My Naturewatch Camera repo we use CustomPiOS to build the camera image with GitHub Actions. I used to get this error sometimes when building images on a Pi 3B, but we haven't seen it happen in a while.

guysoft commented 2 years ago

@mikevanis Indeed, the GitHub Actions builds I have here don't get that error either. I don't know why.

Also, Naturewatch is cool! I recently released, and am also going to write a blog post about, the rpi-imager fork I wrote that can accept community-maintained images: https://github.com/guysoft/pi-imager

If you want to PR a .json file to this repo, I would love to have that added to the list: https://github.com/guysoft/pi-imager-web/ I was considering messaging you about that, but since you are already here I thought I might as well suggest it.

guysoft commented 11 months ago

Just updating that this is still an issue. The workaround that I use ATM is to docker stop/docker start the container.
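
A sketch of that workaround as a script, assuming the container name and the build command from the compose example earlier in this thread. The DOCKER and CONTAINER variables and the function name restart_and_build are illustrative, not part of CustomPiOS.

```sh
#!/bin/sh
# Sketch: restart the build container, then start the build inside it.
# DOCKER can be overridden for a dry run; CONTAINER defaults to the
# container name used in the compose file earlier in this thread.
DOCKER=${DOCKER:-docker}
CONTAINER=${CONTAINER:-mydistro-build}

restart_and_build() {
    $DOCKER stop "$CONTAINER" &&
    $DOCKER start "$CONTAINER" &&
    $DOCKER exec "$CONTAINER" build
}

# Example:
#   restart_and_build
# Dry run that only prints the commands:
#   DOCKER="echo docker" restart_and_build
```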