ipfs / kubo

An IPFS implementation in Go
https://docs.ipfs.tech/how-to/command-line-quick-start/

Store pins in MFS #4675

Status: Open. Stebalien opened this issue 6 years ago.

Stebalien commented 6 years ago

Motivations:

Blockers:

kevina commented 6 years ago

@Stebalien I assume you already know this, but right now anything under the MFS is pinned via a best-effort policy. What that means is that it won't be GCed, but removing it or any of its children won't be blocked.

Maybe a recursive pin could be implemented as a sort of read-only flag. That is, removing anything with the flag would be blocked until the flag is removed. There are probably a lot of implementation details that need to be worked out, though, for example the handling of indirect pins.
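A minimal sketch of the read-only-flag idea, assuming a hypothetical path-keyed namespace where each entry carries a flag. The `tree` type and its methods are made up for illustration. It also assumes, beyond what is stated above, that removing a directory is blocked when any entry beneath it is flagged, since otherwise the flagged entry would be deleted along with its parent.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// errReadOnly is returned when a removal would delete a flagged entry.
var errReadOnly = errors.New("entry is read-only (pinned); clear the flag first")

// tree is a hypothetical MFS-like namespace: path -> read-only flag.
type tree struct {
	entries map[string]bool
}

// remove deletes path and everything beneath it, unless any affected entry
// is flagged, in which case nothing is deleted.
func (t *tree) remove(path string) error {
	for p, ro := range t.entries {
		if ro && (p == path || strings.HasPrefix(p, path+"/")) {
			return fmt.Errorf("%w: %s", errReadOnly, p)
		}
	}
	for p := range t.entries {
		if p == path || strings.HasPrefix(p, path+"/") {
			delete(t.entries, p)
		}
	}
	return nil
}

// clearFlag is the only way to make a flagged entry removable again.
func (t *tree) clearFlag(path string) {
	if _, ok := t.entries[path]; ok {
		t.entries[path] = false
	}
}

func main() {
	t := &tree{entries: map[string]bool{"/app": false, "/app/data": true}}
	fmt.Println(t.remove("/app") != nil) // true: /app/data is flagged
	t.clearFlag("/app/data")
	err := t.remove("/app")
	fmt.Println(err, len(t.entries)) // <nil> 0
}
```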

Now direct pins don't really fit into this model, as the GC is free to remove any direct pin's children.
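For illustration, the direct vs. recursive distinction can be sketched as a GC mark phase over a toy in-memory DAG. Everything here (the `DAG` type, string block IDs, `markDirect`, `markRecursive`, `sweep`) is hypothetical and not Kubo's API; it only shows that a direct pin marks one block while a recursive pin marks everything reachable from it.

```go
package main

import (
	"fmt"
	"sort"
)

// DAG is a hypothetical in-memory stand-in for a blockstore:
// each key is a block ID and the value lists the IDs of its children.
type DAG map[string][]string

// markDirect protects only the pinned block itself; its children stay
// eligible for GC unless something else marks them.
func markDirect(root string, marked map[string]bool) {
	marked[root] = true
}

// markRecursive protects the pinned block and everything reachable from it.
func markRecursive(d DAG, root string, marked map[string]bool) {
	if marked[root] {
		return
	}
	marked[root] = true
	for _, child := range d[root] {
		markRecursive(d, child, marked)
	}
}

// sweep returns, sorted, the blocks that would be collected (the unmarked ones).
func sweep(d DAG, marked map[string]bool) []string {
	var garbage []string
	for id := range d {
		if !marked[id] {
			garbage = append(garbage, id)
		}
	}
	sort.Strings(garbage)
	return garbage
}

func main() {
	d := DAG{"dir": {"a", "b"}, "a": nil, "b": nil}

	direct := map[string]bool{}
	markDirect("dir", direct)
	fmt.Println(sweep(d, direct)) // [a b]

	recursive := map[string]bool{}
	markRecursive(d, "dir", recursive)
	fmt.Println(sweep(d, recursive)) // []
}
```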

Stebalien commented 6 years ago

For context, one of my goals here is to make IPFS usable for DAPPs. For this to happen, we need to be able to have namespaces and I'd like to use MFS for that (a nice, simple, single namespace).

What that means is that it won't be GCed, but removing it or any of its children won't be blocked.

You mean by ipfs block rm? It would be nice to split this into "always delete X" and "GC X if unneeded". The current middle-ground seems a bit weird. The two use cases I can see are:

  1. Free memory (gc).
  2. Remove bad bits (force remove).
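The proposed split could look roughly like the two operations below, sketched over a toy store. `gcIfUnneeded` and `forceRemove` are hypothetical names, not existing `ipfs` subcommands, and "needed" is assumed to mean "reachable from some root", with pins and the MFS root treated alike.

```go
package main

import (
	"errors"
	"fmt"
)

// Store is a hypothetical toy blockstore: block ID -> child IDs.
type Store struct {
	blocks map[string][]string
	roots  []string // pins and the MFS root, treated alike here
}

// reachable reports whether target can be reached from any root.
func (s *Store) reachable(target string) bool {
	seen := map[string]bool{}
	var walk func(id string) bool
	walk = func(id string) bool {
		if id == target {
			return true
		}
		if seen[id] {
			return false
		}
		seen[id] = true
		for _, c := range s.blocks[id] {
			if walk(c) {
				return true
			}
		}
		return false
	}
	for _, r := range s.roots {
		if walk(r) {
			return true
		}
	}
	return false
}

var errNeeded = errors.New("block is still reachable from a root")

// gcIfUnneeded deletes a block only when nothing references it (free space).
func (s *Store) gcIfUnneeded(id string) error {
	if s.reachable(id) {
		return errNeeded
	}
	delete(s.blocks, id)
	return nil
}

// forceRemove deletes a block unconditionally (remove bad bits),
// even if that leaves a root with a missing child.
func (s *Store) forceRemove(id string) {
	delete(s.blocks, id)
}

func main() {
	s := &Store{
		blocks: map[string][]string{"root": {"bad"}, "bad": nil, "stray": nil},
		roots:  []string{"root"},
	}
	fmt.Println(s.gcIfUnneeded("bad"))   // block is still reachable from a root
	fmt.Println(s.gcIfUnneeded("stray")) // <nil>
	s.forceRemove("bad")
	_, ok := s.blocks["bad"]
	fmt.Println(ok) // false
}
```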

Proposal:

Now direct pins don't really fit into this model, as the GC is free to remove any direct pin's children.

Yeah. I think we'd need some way to associate pin/prefetch information with directory entries to do this. However, this is generally useful in unixfs, so I think we'll want it anyway (for smart GC/prefetching).

kevina commented 6 years ago

You mean by ipfs block rm?

Well that and ipfs files rm.

The current middle-ground seems a bit weird.

It was a compromise that @whyrusleeping and I agreed on. The problem before was that anything inside the MFS would have been garbage collected, since it wasn't pinned. A recursive pin was not appropriate because a recursive pin implies the entire DAG is locally available, which is not always the case for something under the MFS root; for example, only some of the directory entries may be available locally. Thus the best-effort pin was created: the GC keeps anything it can reach from the MFS root but won't complain if some of the children are missing. It is called best-effort because some children that are part of the DAG may be unintentionally removed if any of the internal nodes pointing to them are not available locally.
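A rough sketch of the best-effort mark phase described above: walk from the MFS root, keep every locally available block that can be reached, and skip missing blocks instead of failing. The `local` map and `markBestEffort` are hypothetical stand-ins, not Kubo's GC code. The example shows the caveat from the last sentence: a block that is present locally but only linked through a missing internal node is left unmarked.

```go
package main

import "fmt"

// local is a hypothetical view of the local blockstore:
// present blocks map to their child IDs; absent blocks are simply not keys.
type local map[string][]string

// markBestEffort marks every locally available block reachable from root.
// Missing blocks are skipped instead of aborting, so anything only reachable
// through a missing internal node stays unmarked (and may be collected).
func markBestEffort(store local, root string, marked map[string]bool) {
	children, ok := store[root]
	if !ok || marked[root] {
		return // not stored locally (best effort: ignore) or already visited
	}
	marked[root] = true
	for _, c := range children {
		markBestEffort(store, c, marked)
	}
}

func main() {
	// In the full DAG "leaf" is a child of "mid", but "mid" is missing
	// locally, so that link can't be followed even though "leaf" is present.
	store := local{"mfsroot": {"file", "mid"}, "file": nil, "leaf": nil}
	marked := map[string]bool{}
	markBestEffort(store, "mfsroot", marked)
	fmt.Println(marked["file"], marked["leaf"]) // true false
}
```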

Stebalien commented 6 years ago

Well that and ipfs files rm.

Wouldn't that be equivalent to ipfs pin rm?

best-effort MFS

I actually prefer MFS's reachability approach. The "middle-ground" I was talking about with ipfs block rm was that users will likely want to do one of:

  1. Free memory (gc).
  2. Remove bad bits (force remove).

And ipfs block rm does a bit of both.

kevina commented 6 years ago

Well that and ipfs files rm.

Wouldn't that be equivalent to ipfs pin rm?

In a way, I guess, since the block is not actually removed from the local repo. Sorry, I momentarily blanked on what ipfs files rm does.

I actually prefer MFS's reachability approach.

Except that if something is corrupted (for example, a block is missing), the GC could accidentally remove important data. That is why the GC aborts if any part of the DAG of any of the recursive pins is not available.
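By contrast, the recursive-pin behavior can be sketched as a strict mark phase that aborts on the first missing block, so nothing is swept from an incomplete mark set. The `markStrict` and `gc` helpers and the map-based store are hypothetical illustrations, not the actual implementation.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrMissing signals that a recursively pinned DAG is incomplete locally.
var ErrMissing = errors.New("block missing from recursive pin; aborting GC")

// markStrict walks a recursively pinned DAG and fails on the first missing block.
func markStrict(store map[string][]string, id string, marked map[string]bool) error {
	if marked[id] {
		return nil
	}
	children, ok := store[id]
	if !ok {
		return fmt.Errorf("%w: %s", ErrMissing, id)
	}
	marked[id] = true
	for _, c := range children {
		if err := markStrict(store, c, marked); err != nil {
			return err
		}
	}
	return nil
}

// gc marks every recursive pin first and only then deletes unmarked blocks;
// if any mark fails, it returns the error and deletes nothing.
func gc(store map[string][]string, pins []string) error {
	marked := map[string]bool{}
	for _, p := range pins {
		if err := markStrict(store, p, marked); err != nil {
			return err
		}
	}
	for id := range store {
		if !marked[id] {
			delete(store, id) // deleting during range is safe for Go maps
		}
	}
	return nil
}

func main() {
	store := map[string][]string{"pin": {"a", "gone"}, "a": nil, "unrelated": nil}
	err := gc(store, []string{"pin"})
	fmt.Println(errors.Is(err, ErrMissing), len(store)) // true 3
}
```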

And ipfs block rm does a bit of both.

Actually it doesn't force anything. It checks if a block is pinned and will refuse to remove it. There is no way to force remove a pinned file; you will first have to unpin it.

Kubuxu commented 6 years ago

This would also mean losing the direct-pins feature. If we are OK with that, it would be quite nice, as it would simplify the GC code.

Stebalien commented 6 years ago

@Kubuxu We can do this but we'd need some concept of "pin policies" in unixfs.

Stebalien commented 6 years ago

I actually prefer MFS's reachability approach.

Except that if something is corrupted (for example, a block is missing), the GC could accidentally remove important data. That is why the GC aborts if any part of the DAG of any of the recursive pins is not available.

There are really two ways I can see this happening and I don't think it'll be much of an issue:

  1. Corruption. Detect this in the datastore and block writes until we do some form of fsck. That is, I'd consider this to be a problem at a different layer. (Given that we'd now be in "data recovery mode", we could also just auto-pin all unaccounted-for blocks in some "lost+found" like fsck does.)
  2. The block was force removed. We can always warn users, provide an option to store the children in some "lost+found" folder, etc. This is an explicit action taken by the user so we have a lot of options to prevent users from shooting themselves in the foot.
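The "lost+found" idea from the first point might be sketched as: compute every stored block that no root reaches and collect those instead of deleting them. `recoverOrphans` is a hypothetical helper; how the result would actually be linked into a lost+found directory is left open here.

```go
package main

import (
	"fmt"
	"sort"
)

// recoverOrphans returns, sorted, every stored block that is not reachable
// from any root. In a hypothetical "data recovery mode" these would be
// linked under a lost+found directory instead of being garbage collected.
func recoverOrphans(store map[string][]string, roots []string) []string {
	seen := map[string]bool{}
	var walk func(string)
	walk = func(id string) {
		if seen[id] {
			return
		}
		seen[id] = true
		for _, c := range store[id] {
			walk(c)
		}
	}
	for _, r := range roots {
		walk(r)
	}
	var orphans []string
	for id := range store {
		if !seen[id] {
			orphans = append(orphans, id)
		}
	}
	sort.Strings(orphans)
	return orphans
}

func main() {
	// "child" lost its parent (e.g. the parent was force-removed).
	store := map[string][]string{"mfsroot": {"file"}, "file": nil, "child": nil}
	fmt.Println(recoverOrphans(store, []string{"mfsroot"})) // [child]
}
```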

Basically, users are used to filesystems so I'd prefer to just give them filesystem semantics.

And ipfs block rm does a bit of both.

Actually it doesn't force anything. It checks if a block is pinned and will refuse to remove it. There is no way to force remove a pinned file; you will first have to unpin it.

I consider MFS and pinning to be two ways to "keep" blocks from being GCed. However, ipfs block rm respects pins but not MFS. I'd rather it either respect both or neither (and/or have some form of --force flag).
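One way to make `ipfs block rm` "respect both" is sketched below over a hypothetical node with precomputed pinned and MFS-reachable sets. The `override` parameter only models the "some form of --force flag" idea from this comment; it is not an existing flag, and the `node` type and `blockRm` method are illustrative names.

```go
package main

import (
	"errors"
	"fmt"
)

// node is a hypothetical toy repo with precomputed protection sets.
type node struct {
	blocks map[string]bool // locally stored blocks
	pinned map[string]bool // protected by a pin
	inMFS  map[string]bool // reachable from the MFS root
}

var (
	errPinned = errors.New("block is pinned")
	errInMFS  = errors.New("block is referenced from MFS")
)

// blockRm treats pins and MFS uniformly: both protect a block unless
// override is set. (Per this thread, today's `ipfs block rm` checks pins
// but not MFS.)
func (n *node) blockRm(id string, override bool) error {
	if !override {
		if n.pinned[id] {
			return errPinned
		}
		if n.inMFS[id] {
			return errInMFS
		}
	}
	delete(n.blocks, id)
	return nil
}

func main() {
	n := &node{
		blocks: map[string]bool{"p": true, "m": true},
		pinned: map[string]bool{"p": true},
		inMFS:  map[string]bool{"m": true},
	}
	fmt.Println(n.blockRm("m", false)) // block is referenced from MFS
	fmt.Println(n.blockRm("m", true))  // <nil>
}
```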