sandstorm-io / blackrock

Cluster management
Apache License 2.0

Quota weirdness on oasis #49

Open zenhack opened 7 years ago

zenhack commented 7 years ago

So, looking at my Oasis account right now, the grain page tells me that my quota usage is:

0 of 100 grains -2.19e+7B of 1.20GB used

Looks like floating point weirdness. Screenshot:

[Screenshot: 2017-07-24-173811_1280x800_scrot]

(I noticed this having just deleted a Piwik grain I had there, which I hadn't really been using).

kentonv commented 7 years ago

Huh. It appears there are several users who have negative storage usage. This is probably a Blackrock bug.

Does it fix itself if you create a grain (or do something else that updates storage usage), or do you effectively have 22MB of extra quota forever?

zenhack commented 7 years ago

So first, I went back to look and my quota usage was at 4.10kB, without me having touched the account since opening this issue. To test, I created a Wekan grain. The quota usage went up to 4.8MB, and when I deleted the grain and then emptied the trash, it went down to 4.36MB. There are no other grains.

zenhack commented 7 years ago

and now it's back down to 4.10kB, again without me having touched anything.

kentonv commented 7 years ago

4.10kB (or, more precisely, 4096 bytes) is the expected usage for an empty account (one disk block).

I think the confusion here comes from the two different ways that disk usage is accounted on Blackrock. The back-end keeps track of the exact number of disk blocks being used by each user, which it can do because each grain's storage is actually a private ext4 filesystem inside a sparse file. The front-end is able to query the back-end to ask "How many bytes is user X using?", and the back-end can reply quickly. There's currently no way, though, for the front-end to subscribe to updates to this value; it has to poll. So it does that at various key moments.
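To illustrate what "block-level" means for a sparse file, here is a minimal Python sketch. It is not Blackrock's code; the function names are made up for this example, and it assumes Linux, where `st_blocks` is counted in 512-byte units.

```python
import os
import tempfile

def apparent_bytes(path):
    """Length of the file as readers see it (st_size)."""
    return os.stat(path).st_size

def allocated_bytes(path):
    """Bytes actually allocated on disk; st_blocks is in 512-byte units on Linux."""
    return os.stat(path).st_blocks * 512

def make_sparse_file(directory, length):
    """Create a file of the given length without writing any data to it."""
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        os.ftruncate(fd, length)  # extends the file; no data blocks are written
    finally:
        os.close(fd)
    return path
```

On a filesystem that supports sparse files, `allocated_bytes` for a file made by `make_sparse_file` is typically far smaller than `apparent_bytes`, which is why counting allocated blocks gives a different answer than summing file lengths.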

Meanwhile, the size of each individual grain is not tracked by querying the back-end. It could be, but only under Blackrock; on single-machine Sandstorm there's no way to ask this question. So instead when a grain is running, the supervisor watches the grain's filesystem tree with inotify and tries to keep a running count of the size, which it streams back to the front-end. This is where the size of each grain comes from in the grain list.
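For comparison, a file-level size can be approximated by summing file lengths over the tree. The sketch below is a one-shot walk in Python, swapped in for simplicity; the supervisor's actual approach is an incremental count driven by inotify, which this does not reproduce. `file_level_size` is a hypothetical name.

```python
import os
import stat

def file_level_size(root):
    """Sum the lengths of regular files under root.

    Filesystem metadata, directories, and the journal are not counted,
    which is one reason this differs from block-level usage.
    """
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            st = os.lstat(os.path.join(dirpath, name))
            if stat.S_ISREG(st.st_mode):  # skip symlinks, sockets, etc.
                total += st.st_size
    return total
```

A walk like this can also miscount if files change while it runs, which matches the point that observing a live tree is not a reliable way to determine block usage.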

If you delete a grain, the front-end adjusts your overall storage usage at that moment according to the size as it appeared in the grain list, i.e. the file-level size.

However, there is often a difference between the file-level size and the block-level size. Usually the file-level size is less than the block-level size because it doesn't count things like filesystem metadata and the journal. But it can also be larger, mostly because it's just not that easy to determine block device usage based on observing a file tree, especially if that tree is actively changing.

When you observed your usage jump to 4.8MB with one Wekan grain, I suspect that most of that was the filesystem journal. When you deleted the grain, the front-end only credited you back the file-level size of 340kB, which didn't count the journal, so you ended up with 4.36MB used. Later, though, it queried the back-end for your overall usage and the back-end reported just 4096.
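The interaction between the two counts can be modeled in a few lines. This is a toy Python model, not the front-end's code: the class, its methods, and all byte values below are made up for illustration. It shows the cached total drifting after a delete and then snapping back on the next poll, and how the total can go negative when a grain's file-level size exceeds the cached block-level total.

```python
from dataclasses import dataclass, field

@dataclass
class FrontEndAccount:
    cached_usage: int = 0  # bytes, last total known to the front-end
    grain_sizes: dict = field(default_factory=dict)  # grain id -> file-level bytes

    def poll_backend(self, block_level_usage):
        """Replace the cached total with the back-end's block-level count."""
        self.cached_usage = block_level_usage

    def delete_grain(self, grain_id):
        """Credit back only the file-level size shown in the grain list."""
        self.cached_usage -= self.grain_sizes.pop(grain_id)

acct = FrontEndAccount()
acct.grain_sizes["g1"] = 400_000   # hypothetical file-level size
acct.poll_backend(5_000_000)       # hypothetical block-level total, journal included
acct.delete_grain("g1")            # cached total is now too high
drifted = acct.cached_usage
acct.poll_backend(4096)            # next poll corrects it

acct.grain_sizes["g2"] = 22_000_000  # file-level estimate larger than cached total
acct.delete_grain("g2")              # cached total goes negative
negative = acct.cached_usage
```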

I guess one way we could fix this would be to ask the back-end for accurate counts more often.

On single-machine Sandstorm, there actually is no way to ask the back-end for the total usage, so quota is always based on the sum of your grains' file-level usage. It's probably somewhat inaccurate, but at least it doesn't suffer from trying to harmonize two very different approaches to counting disk usage.

zenhack commented 7 years ago

Might make sense to query just after deleting a grain, and other operations likely to cause big jumps.
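A rough Python sketch of that idea, under stated assumptions: `FakeBackend`, `query_usage`, and `delete_grain_and_refresh` are hypothetical names, not real Sandstorm or Blackrock APIs, and the fallback that clamps at zero when the back-end has no answer is a design choice made up for this example.

```python
class FakeBackend:
    """Stand-in for the back-end: maps user id to block-level usage in bytes."""

    def __init__(self, usage_by_user):
        self.usage_by_user = usage_by_user

    def query_usage(self, user_id):
        return self.usage_by_user[user_id]

def delete_grain_and_refresh(backend, cached_usage, user_id, grain_file_size):
    """Return the usage to display after deleting a grain.

    The file-level credit is only a provisional estimate; the back-end's
    block-level count replaces it whenever the query succeeds.
    """
    provisional = cached_usage - grain_file_size
    try:
        return backend.query_usage(user_id)
    except KeyError:
        # Assumption: if the back-end has no record, never show a negative value.
        return max(provisional, 0)
```

With this shape, the file-level estimate is only ever visible when the authoritative count is unavailable.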