ipfs / kubo

An IPFS implementation in Go
https://docs.ipfs.tech/how-to/command-line-quick-start/

Handle "no space left on device" scenario more gracefully #5773

Open · zonque opened this issue 5 years ago

zonque commented 5 years ago

Version information:

go-ipfs version: 0.4.19-dev-852a8db40
Repo version: 7
System version: amd64/linux
Golang version: go1.11.2

Type:

Enhancement

Description:

When the file system ipfs stores its blocks on runs out of space, the daemon rightfully reports `ERROR bitswap: Error writing block to datastore: write /home/daniel/.ipfs/blocks/UT/put-335992698: no space left on device bitswap.go:331` and file retrieval stalls.

I think this should be handled more gracefully by trying to free some old objects and then trying again. A full GC seems like overkill; freeing just enough unpinned objects to make room for the new one would be more appropriate.
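To make that concrete, here is a rough sketch of what the write path could do instead of failing hard. None of this exists in go-ipfs; the helper names and trimmed interfaces are made up purely for illustration:

```go
package example

import (
	"errors"
	"syscall"
)

// Trimmed stand-ins for the real block/blockstore interfaces, just to keep
// the sketch self-contained.
type Block interface {
	RawData() []byte
}

type Blockstore interface {
	Put(Block) error
}

// putWithEviction is a hypothetical wrapper around Put: if the write fails
// with ENOSPC, it asks the caller-supplied evict callback to free roughly
// enough unpinned data for this block and then retries the write once.
func putWithEviction(bs Blockstore, blk Block, evict func(need int) error) error {
	err := bs.Put(blk)
	if err == nil || !errors.Is(err, syscall.ENOSPC) {
		return err
	}
	if evictErr := evict(len(blk.RawData())); evictErr != nil {
		return err // nothing could be freed; report the original write error
	}
	return bs.Put(blk)
}
```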

The background is that I'm trying to use ipfs to exchange files across a fleet of embedded devices, where the ipfs blocks are stored on a tmpfs that is limited in size. As each of these embedded nodes should act as a data provider to the others, a full GC would remove too many blocks that are still useful to them.

eingenito commented 5 years ago

Hey @zonque. Given your use case I get it - basically you'd like to fill up the space available to IPFS and keep it filled but still prioritize new blocks over anything already in the datastore? Maybe evicting blocks based on age? It's a reasonable enhancement but I think it's a fair bit of work. I don't know for sure but I don't think we keep track of the age of blocks right now.

Also, no space left on device is a reasonable thing to bounce up against in your use case because you're using a separate file system, but in many IPFS installations that's obviously not true, and running the whole machine out of disk space is way worse than IPFS merely being unable to store a block. Even within IPFS, we may be in the middle of doing something that we can't easily back out of when that happens.

Which is to say we're happy to take a look at a PR if you want to work on it, but my guess is it's not going to be a feature that would show up in IPFS in the natural course of things for a while. This is just my opinion - someone else reading may have the same need and get psyched to work on it.

I guess right now you might get slightly better behavior by tuning StorageMax to the size of your tmpfs; then at least IPFS will trigger a GC and you can keep doing work. And yeah, I get that you've already thought of that, hence the issue.
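For reference, that tuning is just the following (the 2GB value is a placeholder for whatever your tmpfs holds, and periodic GC only runs when the daemon is started with --enable-gc):

```sh
ipfs config Datastore.StorageMax "2GB"
ipfs config --json Datastore.StorageGCWatermark 90   # GC triggers at 90% of StorageMax
ipfs daemon --enable-gc
```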

Thanks.

zonque commented 5 years ago

Given your use case I get it - basically you'd like to fill up the space available to IPFS and keep it filled but still prioritize new blocks over anything already in the datastore? Maybe evicting blocks based on age? It's a reasonable enhancement but I think it's a fair bit of work. I don't know for sure but I don't think we keep track of the age of blocks right now.

Correct. Tracking block age and evicting the oldest ones would be one solution; picking blocks that haven't been accessed for a long time is another. For simplicity, I would have just picked a random one.
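Purely as an illustration of the "random one" option, assuming the node kept an in-memory set of unpinned CIDs (string keys stand in for cid.Cid here):

```go
// randomCandidate returns an arbitrary eviction candidate from the set of
// unpinned CIDs; Go's randomized map iteration order is good enough here.
func randomCandidate(unpinned map[string]struct{}) (string, bool) {
	for c := range unpinned {
		return c, true
	}
	return "", false
}
```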

To elaborate on my use case a bit more - I'd like to use ipfs as a decentralized update delivery method, where many peers in a swarm provide blocks to each other. The machines are limited in resources, so I thought about storing the ipfs blocks on a tmpfs.

With a mechanism like this implemented, the block storage could even be smaller than the update that is being delivered, which would really be nice.

I guess right now you might get slightly better behavior by tuning StorageMax to the size of your tmpfs; then at least IPFS will trigger a GC and you can keep doing work. And yeah, I get that you've already thought of that, hence the issue.

The problem with GC is that it removes too much. Any peer hitting the limit would hence lose its ability to serve, which is not good. Worse, running a GC does not currently help a parallel transfer that is already stuck because the daemon ran out of disk space. For an update mechanism, that is the worst-case scenario.

Which is to say we're happy to take a look at a PR if you want to work on it, but my guess is it's not going to be a feature that would show up in IPFS in the natural course of things for a while. This is just my opinion - someone else reading may have the same need and get psyched to work on it.

Could you sketch out which parts of the project would need to be tweaked to implement something like this? I'm not familiar with the internals of this project at all at this point.

Thanks!

zonque commented 5 years ago

I've dug into this a bit, and my current understanding of the situation is as follows.

In order to implement a datastore that tracks the (access) age of its blocks and deletes the oldest unpinned ones to make room for at least the size of the incoming message (much like GC does), we would need some kind of hook for the bitswap protocol to instruct the core to attempt to free up space and then retry saving the block. The core would probably also need to keep a list of unpinned blocks so that it doesn't have to rebuild that RB tree on every invocation of the handler.
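As a sketch of what I mean by a hook, the datastore (or a wrapper around it) could expose something like the interface below. It doesn't exist anywhere; it's only meant to illustrate the shape:

```go
// SpaceReclaimer is a hypothetical optional interface a space-aware
// datastore could expose. A caller that hits "no space left on device"
// could ask it to make room and then retry the write.
type SpaceReclaimer interface {
	// Reclaim deletes unpinned blocks until at least `need` bytes have been
	// freed (or no candidates remain) and reports how much was freed.
	Reclaim(need uint64) (freed uint64, err error)
}
```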

Now, I'm wondering how this could possibly be implemented. Would the GC code need to be relocated into a more advanced datastore implementation maybe?

eingenito commented 5 years ago

@zonque I'm sorry I can't add much to your efforts; a lot of the work you'd be doing has never been considered. At a superficial level I would say it would be great if bitswap didn't know about the behavior of the storage subsystem. It just tries to store blocks - it shouldn't ask for retries or manage the freeing of space.

CollectGarbage, I think, is implemented for datastores that need to perform additional operations of their own to complete a GC (like https://github.com/ipfs/go-ds-badger/blob/4a093545f2f6a069ddf7765f0a994bb38105288b/datastore.go#L197), so it may not quite be what you're looking for.
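For context, the optional interface involved looks roughly like this (treat the exact shape as an approximation of what go-datastore exposes), and callers simply type-assert for it:

```go
// GCDatastore mirrors (approximately) the optional interface that
// go-datastore defines for stores supporting on-demand garbage collection.
type GCDatastore interface {
	CollectGarbage() error
}

// maybeCollect triggers the datastore's own GC if it supports one.
func maybeCollect(d interface{}) error {
	if gcd, ok := d.(GCDatastore); ok {
		return gcd.CollectGarbage()
	}
	return nil // no GC hook; nothing to do
}
```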

I honestly can't think of a great way to achieve what you're looking for with the components factored as they are. Datastores are pretty dumb and don't know the relative value of the content they store. GC is pretty dumb and just knows about pinned or not. What you're trying to introduce is new.

zonque commented 5 years ago

@eingenito Thanks for elaborating.

I agree that the bitswap layer should not know about the internals of the datastore, and keeping the datastore unaware of the nature of the data it stores also makes total sense. Hence, I think there are two things needed to achieve what I have in mind.

The latter is certainly more difficult to achieve than the former, of course, but if we had it, I reckon invoking such a prune function would be straightforward.

eingenito commented 5 years ago

I think I get what you're suggesting. I wonder if it would be possible/easier to implement this entirely within a blockstore, either as a wrapper around an existing one (which I think is pretty common) or as a derivative of an existing one. In that layer you could soft-delete whenever you got a DeleteBlock call and record the Cid of the delete request, basically remembering that that Cid is fair game for deletion (since at some point in the past it was going to be deleted, probably by GC). You'd have to remove the key from the deletable list on a Get, Put or Has, because you can't know about any pinning that might be going on after the call to DeleteBlock.

Then when you get an error indicating that you're out of disk space, you can consult your deletable list and just delete until you have room. You could still maintain LRU data in your deletable list if you wanted to, so you could choose to delete the blocks that had been used least recently.
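A very rough sketch of that wrapper, with the real Blockstore interface trimmed down and string keys standing in for cid.Cid; everything here is invented for illustration, not working go-ipfs code:

```go
package softdelete

import "sync"

// Trimmed stand-ins for the real interfaces in go-block-format and
// go-ipfs-blockstore, just to keep the sketch self-contained.
type Block interface {
	Cid() string // the real type is cid.Cid
	RawData() []byte
}

type Blockstore interface {
	DeleteBlock(c string) error
	Get(c string) (Block, error)
	GetSize(c string) (int, error)
	Has(c string) (bool, error)
	Put(b Block) error
}

// deletableBlockstore wraps another Blockstore and turns DeleteBlock into a
// soft delete: the key is only remembered as fair game for later eviction.
type deletableBlockstore struct {
	inner     Blockstore
	mu        sync.Mutex
	deletable map[string]struct{} // CIDs a past DeleteBlock (e.g. GC) wanted gone
}

func Wrap(inner Blockstore) *deletableBlockstore {
	return &deletableBlockstore{inner: inner, deletable: make(map[string]struct{})}
}

// DeleteBlock records the key instead of deleting the block right away.
func (d *deletableBlockstore) DeleteBlock(c string) error {
	d.mu.Lock()
	d.deletable[c] = struct{}{}
	d.mu.Unlock()
	return nil
}

// Get/GetSize/Has/Put un-mark the key, since we can't know what pinning
// happened after the original DeleteBlock call.
func (d *deletableBlockstore) Get(c string) (Block, error)    { d.unmark(c); return d.inner.Get(c) }
func (d *deletableBlockstore) GetSize(c string) (int, error)  { d.unmark(c); return d.inner.GetSize(c) }
func (d *deletableBlockstore) Has(c string) (bool, error)     { d.unmark(c); return d.inner.Has(c) }

func (d *deletableBlockstore) Put(b Block) error {
	d.unmark(b.Cid())
	err := d.inner.Put(b)
	if err != nil && d.evict(len(b.RawData())) {
		// In practice you'd only retry on ENOSPC-style errors.
		err = d.inner.Put(b)
	}
	return err
}

func (d *deletableBlockstore) unmark(c string) {
	d.mu.Lock()
	delete(d.deletable, c)
	d.mu.Unlock()
}

// evict really deletes soft-deleted blocks until roughly `need` bytes are
// freed; keeping the deletable set in LRU order would let you drop the least
// recently used candidates first. It reports whether anything was freed.
func (d *deletableBlockstore) evict(need int) bool {
	d.mu.Lock()
	defer d.mu.Unlock()
	freed := 0
	for c := range d.deletable {
		if size, err := d.inner.GetSize(c); err == nil && d.inner.DeleteBlock(c) == nil {
			freed += size
		}
		delete(d.deletable, c)
		if freed >= need {
			break
		}
	}
	return freed > 0
}
```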