osbuild / osbuild-composer

An HTTP service for building bootable OS images.
https://www.osbuild.org
Apache License 2.0
160 stars 108 forks source link

Can't delete composes when disk is full (goroutine panics) #927

Open major opened 4 years ago

major commented 4 years ago

I was doing some testing in AWS and the disk filled up from composes being uploaded from the worker to the composer instance. I tried to delete some composes to save space, but all calls for deletions failed. The following appeared in the system journal:

2020/08/18 16:20:07 DELETE /api/v1/compose/delete/fa491103-d97a-445b-baa4-6c27f28939ad
2020/08/18 16:20:07 http: panic serving @: open /var/lib/osbuild-composer/state.json-179214357.tmp: no space left on device
goroutine 549 [running]:
net/http.(*conn).serve.func1(0xc000098fa0)
    /usr/lib/golang/src/net/http/server.go:1767 +0x13b
panic(0x556d3608ef60, 0xc000089380)
    /usr/lib/golang/src/runtime/panic.go:679 +0x1b6
github.com/osbuild/osbuild-composer/internal/store.(*Store).change(0xc0049294a0, 0xc0000fd6a0, 0x0, 0x0)
    /builddir/build/BUILD/osbuild-composer-5d7aa2f6999ecf022dbb5d282cba0b7bb07fd394/_build/src/github.com/osbuild/osbuild-composer/internal/store/store.go:115 +0x17a
github.com/osbuild/osbuild-composer/internal/store.(*Store).DeleteCompose(0xc0049294a0, 0x5b447ad9031149fa, 0xad3989f2276ca4ba, 0x556d36115260, 0xc004ec2680)
    /builddir/build/BUILD/osbuild-composer-5d7aa2f6999ecf022dbb5d282cba0b7bb07fd394/_build/src/github.com/osbuild/osbuild-composer/internal/store/store.go:399 +0x5f
github.com/osbuild/osbuild-composer/internal/weldr.(*API).composeDeleteHandler(0xc004eea100, 0x556d36111340, 0xc0001408c0, 0xc00251c500, 0xc0010be3c0, 0x2, 0x3)
    /builddir/build/BUILD/osbuild-composer-5d7aa2f6999ecf022dbb5d282cba0b7bb07fd394/_build/src/github.com/osbuild/osbuild-composer/internal/weldr/api.go:1897 +0x5f2
github.com/julienschmidt/httprouter.(*Router).ServeHTTP(0xc000068040, 0x556d36111340, 0xc0001408c0, 0xc00251c500)
    /builddir/build/BUILD/osbuild-composer-5d7aa2f6999ecf022dbb5d282cba0b7bb07fd394/_build/src/github.com/osbuild/osbuild-composer/vendor/github.com/julienschmidt/httprouter/router.go:334 +0x960
github.com/osbuild/osbuild-composer/internal/weldr.(*API).ServeHTTP(0xc004eea100, 0x556d36111340, 0xc0001408c0, 0xc00251c500)
    /builddir/build/BUILD/osbuild-composer-5d7aa2f6999ecf022dbb5d282cba0b7bb07fd394/_build/src/github.com/osbuild/osbuild-composer/internal/weldr/api.go:160 +0x144
net/http.serverHandler.ServeHTTP(0xc0024fc1c0, 0x556d36111340, 0xc0001408c0, 0xc00251c500)
    /usr/lib/golang/src/net/http/server.go:2802 +0xa6
net/http.(*conn).serve(0xc000098fa0, 0x556d36112040, 0xc0000699c0)
    /usr/lib/golang/src/net/http/server.go:1890 +0x877
created by net/http.(*Server).Serve
    /usr/lib/golang/src/net/http/server.go:2927 +0x390
major commented 4 years ago

If I delete one manually from the /var/lib/osbuild-composer/artifacts and kill osbuild-composer, I can delete composes again.

teg commented 4 years ago

I seriously doubt we can do anything once the disk is already full. We could possibly avoid that from happening hough...?

major commented 4 years ago

I wonder if we can avoid making the temp file for the atomic write in this situation? Or perhaps we can stop accepting data when there's less than 50MB left or something like that?

larskarlitski commented 4 years ago

I wonder if we can avoid making the temp file for the atomic write in this situation?

I don't think this would be a good solution. The way it is now, we at least keep the state consistent and tell the client that deleting the compose did not work. Inconsistent state would (potentially) break all future access to composer, even after disk space is freed.

Or perhaps we can stop accepting data when there's less than 50MB left or something like that?

I like this. I think @dvdhrm and @teg were talking about this in the context of the osbuild store before.

Note that the koji API doesn't write state when deleting a compose.

teg commented 4 years ago

I think if we run out of space, all bets are off anyway, so not sure it makes sense to put too much effort into mitigating this... Other services will likely also randomly fail.

bcl commented 4 years ago

I would expect the user to notice the low disk situation via the cockpit dashboard. But it might also be good to require a minimum of free space available when starting a compose since we know, approximately, how much space will be needed. We should encourage users (other than us running tests ) to operate with a wide margin of free disk space.