ipfs / kubo

An IPFS implementation in Go
https://docs.ipfs.tech/how-to/command-line-quick-start/

Add fixed-period and conditional repo GC #1495

Closed · rht closed this 8 years ago

rht commented 8 years ago

WIP sketch.

  1. Ok with configurable GC frequency?
  2. go-datastore doesn't have a size check, so a local system call is used instead (see the sketch below)
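
For reference, a minimal sketch of that kind of local size check, summing on-disk file sizes with a filesystem walk; the helper name and approach are illustrative, not the PR's actual code:

    package main

    import (
        "fmt"
        "os"
        "path/filepath"
    )

    // repoDiskUsage walks the repo directory and sums file sizes, standing in
    // for the size query that go-datastore does not provide.
    func repoDiskUsage(repoPath string) (int64, error) {
        var total int64
        err := filepath.Walk(repoPath, func(_ string, info os.FileInfo, err error) error {
            if err != nil {
                return err
            }
            if !info.IsDir() {
                total += info.Size()
            }
            return nil
        })
        return total, err
    }

    func main() {
        size, err := repoDiskUsage(os.Getenv("IPFS_PATH"))
        if err != nil {
            fmt.Fprintln(os.Stderr, err)
            os.Exit(1)
        }
        fmt.Println("repo size (bytes):", size)
    }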

Address #972

Luzifer commented 8 years ago

I like this. :green_heart:

Suggestions:

  1. Let the user decide how much space it may consume
  2. Let the user decide how often to check (optional? not sure if really used)
  3. Maybe add an option to run GC even when size is not exceeded?
rht commented 8 years ago
  1. I will put this into config once there is consensus on how the parameters are defined. This could even default to e.g. 50% of remaining free space.
  2. That's too low level. Frequency is configurable only when there isn't an algorithm to adapt the freq to the system resource level.
  3. For now ipfs repo gc?
rht commented 8 years ago

Re 2: though this is not necessarily the case for Nix: https://nixos.org/nixos/manual/sec-nix-gc.html

jbenet commented 8 years ago

@rht solid first pass! :+1: this is headed in the right direction. some comments above o/

rht commented 8 years ago

@jbenet RFCR, things have been ironed out.

jbenet commented 8 years ago

@rht much better! almost there, i think. a few things above o/

cc @whyrusleeping for his feedback

rht commented 8 years ago

@whyrusleeping @jbenet like this?

jbenet commented 8 years ago

@rht this looks very good to me! one minor nit above (re, GCTimeout).

Ok, we should have tests, too. We should test things like:

 [ ] gc is triggered at watermark

but probably more things. this is an important piece of functionality, with dangerous consequences if we don't do it right

one thing i'm noticing is that doing it based on time may not be the best if the user wants to run a command. it should be possible for commands issued to trigger gc to run, even if gc isn't scheduled to run yet. maybe the periodic timer should issue a routine that also gets called after every command is run. if storage < watermark, it skips everything, so it should not impact perf too much, and it should make it possible to constantly ipfs cat things (not pinned) to your heart's content without worrying about storage limits.
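
A rough sketch of that shape, with made-up names (not this PR's API): one shared trigger that both the periodic timer and each command handler call, and that bails out early while usage is below the watermark.

    import "time"

    // gcTrigger is the shared entry point: the periodic timer and each command
    // handler call MaybeGC, which does nothing while usage stays below the watermark.
    type gcTrigger struct {
        storageMax int64
        watermark  int64                 // percent of storageMax, e.g. 90
        repoSize   func() (int64, error) // however the repo measures its own size
        runGC      func() error
    }

    func (t *gcTrigger) MaybeGC() error {
        size, err := t.repoSize()
        if err != nil {
            return err
        }
        if size < t.storageMax*t.watermark/100 {
            return nil // below the watermark: skip everything
        }
        return t.runGC()
    }

    // Periodic is the timer-driven half of the scheme.
    func (t *gcTrigger) Periodic(period time.Duration, stop <-chan struct{}) {
        ticker := time.NewTicker(period)
        defer ticker.Stop()
        for {
            select {
            case <-ticker.C:
                _ = t.MaybeGC() // a real implementation would log this error
            case <-stop:
                return
            }
        }
    }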

rht commented 8 years ago

> We should have a way to express no limit. (0?) and default to that

This is --disable-gc, but I will also set "0" to be parsed as no gc. Either both exist, or only one exists.
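
For concreteness, a sketch of the datastore config fields under discussion, using the names from this thread (the names and defaults actually merged may differ):

    // Hypothetical GC-related datastore config, as discussed in this PR.
    type DatastoreConfig struct {
        StorageMax         string // upper bound on repo size, e.g. "10GB"; "0" parsed as no gc
        StorageGCWatermark int64  // percent of StorageMax at which gc triggers, e.g. 90
        GCTimeout          string // gc period, e.g. "1h" (later renamed GCPeriod in this thread)
    }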

rht commented 8 years ago

Either ipfs pin rm -r or corerepo.GarbageCollect is unclean.

    unpinned QmVGz1kZQYtmaLZgEQgzR2MKaM2WGfaGwTisSg6jHvhX4F
    > diff -u expected actual
    --- expected    2015-07-24 07:32:21.543585774 +0000
    +++ actual  2015-07-24 07:32:22.673585788 +0000
    @@ -1 +1 @@
    -906108
    +922492

There is a ~16KB residue in $IPFS_PATH/blocks. #1515

rht commented 8 years ago

> [ ] gc is triggered at watermark.

There is no direct way to check this. Sharness scripts have never checked against the eventlog before, which means this is too granular for sharness.

rht commented 8 years ago

> it should be possible for commands issued to trigger gc to run, even if gc isn't scheduled to run yet. maybe the periodic timer should issue a routine that also gets called after every command is run.

Hook this into add/cat/get/pin/object/block? If the file size exceeds the slack space (storageMax * (100 - watermark)/100), then run gc before running the command. This is definitely the most robust approach (because it covers all the doors).
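
A minimal sketch of that pre-command hook, with illustrative names (not the PR's actual functions):

    // maybeGCBeforeCommand runs gc before an add/cat/get/pin/object/block
    // operation whenever the incoming size would not fit in the slack space
    // above the watermark.
    func maybeGCBeforeCommand(incomingSize, storageMax, watermark int64, runGC func() error) error {
        slack := storageMax * (100 - watermark) / 100
        if incomingSize <= slack {
            return nil // fits within the slack space, no gc needed
        }
        return runGC()
    }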

The middle ground (for server?) is to use some stats / bandwidth_max to adjust GCTimeout and StorageGCWatermark during runtime in a "control system" way.

But coverage is more important than runtime cost here (each size check is relatively cheap).

rht commented 8 years ago

checking the dust (to be swept):

    ( repeat 1000; do; du -sb plan9.pdf > /dev/null; done; )  0.31s user 1.05s system 86% cpu 1.569 total

adding more dust (or star stuff):

    ( repeat 1000; do; ipfs add plan9.pdf > /dev/null; done; )  111.59s user 14.66s system 95% cpu 2:12.82 total

rht commented 8 years ago

NOTE: this calculation misses the mark, e.g. it totally doesn't make sense to add var68(I) into the watermark. The model doesn't take into account the residual "water level" after each GC period. Will be updated.


Given: p := pin size. Model: v(t, b, f, ...) := rate of block increase, with measured parameters b = size per add, f = cat frequency, etc. Assume p changes very slowly.

g := GCTimeout, w := StorageGCWatermark, m := StorageMax

Let I = integral from 0 to 1/g of v(t, b, f, ...) dt, which is roughly the mean of the v distribution over one GC period.

Then, given the 2 constraints:

  1. w - p = I
  2. m - p = I + var95(I), where var68 is the standard deviation of I and var95 is the deviation that covers 95% of the cases (#todo: this should be a residual distribution using conditional probability or something). E.g. for a Poisson process, len(...) is 0, var68 = I, and var95 = 1.96*I (todo: should be a truncated Poisson).

which can be rewritten as:

  1. w = m - (var95 - var68)
  2. m = p + I + var95(I) (if m is specified, then g is constrained)

Only 1 parameter needs to be specified (StorageMax?) if b and f are somehow measured (or throttled).

jbenet commented 8 years ago

> [ ] gc is triggered at watermark.
>
> There is no direct way to check this.

what about something like:

    <set max s.t. watermark ~10.5MB>

    for i in $(seq 10)
    do
      random 1M (<seed> + i) | ipfs add -q | tail -n1 | xargs ipfs pin rm -r
    done

    <check du almost at watermark with stat>

    random 1M (<seed> + 11) | ipfs add -q | tail -n1 | xargs ipfs pin rm -r

    <check du is around ~1MB>

> Hook this into add/cat/get/pin/object/block? If the file size exceeds the slack space (storageMax * (100 - watermark)/100), then run gc before running the command. This is definitely the most robust approach (because it covers all the doors).

Sounds good. this is a good place to start, and maybe even good enough.

We could also hook it into the daemon's code itself, on handling requests. this gets a bit trickier. or maybe the repo's? the point is that there may be other things that increase disk usage that are not regular commands issued by the user (e.g. dht puts, bitswap agents, perhaps periodic tasks like downloading a new bootstrap list or blocklist; we don't do any of this atm but we will in the future).

@chriscool know what git does here?


> The middle ground (for server?) is to use some stats / bandwidth_max to adjust GCTimeout and StorageGCWatermark during runtime in a "control system" way.

I'm +1 on having a timeout (as it allows caching a bunch of content for some time and periodically deleting), but it isn't enough to protect the user from filling up the space with unpinned things. We'll still need the check on changes to the repo.

> But coverage is more important than runtime cost here (each size check is relatively cheap).

Yes.


the model is interesting. something like it might work here. but not sure if it will be simpler to just GCIfAboveWatermark() after every repo add (caveat being that add performance may decrease significantly if pin > watermark). maybe we can cache that size and check for it. actually, can't we measure the size of pinned and unpinned things, and change the measurements as we add and rm data? (merely as a way to check if a GC should be tried, or would be useful. we should still go back to measure often, as our estimate may be off or not account for external tampering with the repo, e.g. the user manually removing files... which is wrong, but might happen.)

remind me to discuss bitswap agents with you later on, will want some interesting models there :)

jbenet commented 8 years ago

@rht sorry, had missed some comments, responded now. (mostly inconsequential to this PR, but explanation)

jbenet commented 8 years ago

@rht sorry, what's the next step on this? not sure whether it's on me or you.

rht commented 8 years ago

The blocker is on me. Sorry for the wait; this will happen soon.

jbenet commented 8 years ago

@rht ah no rush. just wanted to make sure i wasn't stalling. we can slate this for 0.3.7 or 0.3.8 as needed.

rht commented 8 years ago

More cases can be handled later. Fixed-period and conditional gc are, however, ready for 0.3.6 (but missed the train...).

> [ ] gc is triggered at watermark

A direct check would mean checking for the gc-begin entry in the eventlog. But for now, checking the blocks size with du works.

rht commented 8 years ago

> maybe we can cache that size and check for it. actually, can't we measure the size of pinned and unpinned things, and change the measurements as we add and rm data? (merely as a way to check if a GC should be tried, or would be useful. we should still go back to measure often, as our estimate may be off or not account for external tampering with the repo, e.g. the user manually removing files... which is wrong, but might happen.)

Cache the object sizes? What about including them in the header?

rht commented 8 years ago

clarity: GCTimeout should have been GCPeriod.

jbenet commented 8 years ago

@rht very excited to have this in. sorry for even more changes requested. this is a big/important feature that can go wrong, so want to make sure it's super clear to other people, and that it works very reliably (hence more tests)

jbenet commented 8 years ago

@whyrusleeping i think this needs another look from you before we pull it in. given dev0.4.0 changes, this could complicate your life if we don't merge the interface nicely

@rht looks like travis didn't like the tests

whyrusleeping commented 8 years ago

@jbenet this doesn't change any of the actual GC behaviour, so it should be fine with my changes in 040. That said, I'll most certainly take a hack at rebasing 040 on this to see if there are any glaring issues.

jbenet commented 8 years ago

@rht thanks! looking much better.

and that's it!

rht commented 8 years ago

Still debugging the issue with the ipfs add not calculating the size properly.

What do you think of this err handling grouping?

    setErr := func(err error) bool {
        if err != nil {
            res.SetError(err, cmds.ErrNormal)
            return true
        }
        return false
    }

    // init IPFS node config - if --init is provided and if there is no config file
    if err := maybeInitNode(req); setErr(err) {
        return
    }

    cfg, err := req.InvocContext().GetConfig()
    if setErr(err) {
        return
    }

    // acquire the repo lock _before_ constructing a node. we need to make
    // sure we are permitted to access the resources (datastore, etc.)
    repo, err := fsrepo.Open(req.InvocContext().ConfigRoot)
    if setErr(err) {
        return
    }

Additionally, there could be one global err variable, so that each of the if setErr ... checks can be hidden inside each maybe func.

rht commented 8 years ago

e.g.

    // construct api endpoint - every time
    err, apiErrc := serveHTTPApi(req, node, cfg)
    if setErr(err) {
        return
    }

    // construct http gateway - if it is set in the config
    err, gwErrc := maybeServeHTTPGateway(req, node, cfg)
    if setErr(err) {
        return
    }

    // construct fuse mountpoints - if the user provided the --mount flag
    err, mountErrc := maybeMountFuse(req, node, cfg)
    if setErr(err) {
        return
    }

becomes

    var daemonErr error
    // construct api endpoint - every time
    apiErrc := serveHTTPApi(req, node, cfg, daemonErr)

    // construct http gateway - if it is set in the config
    gwErrc := maybeServeHTTPGateway(req, node, cfg, daemonErr)

    // construct fuse mountpoints - if the user provided the --mount flag
    mountErrc := maybeMountFuse(req, node, cfg, daemonErr)

    if setErr(daemonErr) {
        return
    }
rht commented 8 years ago

It's rather more like

    var daemonErr error // could have been put in a local daemon struct, but there could be issue with reusability if the functions are too tightly coupled to the struct
    var apiErrc, gwErrc, mountErrc, gcErrc <-chan error
    // construct api endpoint - every time
    daemonErr, apiErrc = serveHTTPApi(req, node, cfg, daemonErr)

    // construct http gateway - if it is set in the config
    daemonErr, gwErrc = maybeServeHTTPGateway(req, node, cfg, daemonErr)

    // construct fuse mountpoints - if the user provided the --mount flag
    daemonErr, mountErrc = maybeMountFuse(req, node, cfg, daemonErr)

    // repo blockstore GC - if --disable-gc flag is not present
    daemonErr, gcErrc = maybeRunGC(req, node, daemonErr)

    if setErr(daemonErr) {
        return
    }

err is carried both in input and output, like walkFn in filepath.Walk, or context.Context, or named return values.
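
For comparison, a tiny illustration of that pattern; step and the startAPI/startGateway/maybeRunGC names below are hypothetical stand-ins for the real constructors (which also return error channels, omitted here):

    // step runs fn only if no earlier step failed, threading the error through
    // input and output the way daemonErr is threaded above.
    func step(prev error, fn func() error) error {
        if prev != nil {
            return prev
        }
        return fn()
    }

    // usage sketch:
    //   var daemonErr error
    //   daemonErr = step(daemonErr, startAPI)
    //   daemonErr = step(daemonErr, startGateway)
    //   daemonErr = step(daemonErr, maybeRunGC)
    //   if setErr(daemonErr) {
    //       return
    //   }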

jbenet commented 8 years ago

I'm more ok with this construction:

    setErr := func(err error) bool {
        if err != nil {
            res.SetError(err, cmds.ErrNormal)
            return true
        }
        return false
    }

    // init IPFS node config - if --init is provided and if there is no config file
    if err := maybeInitNode(req); setErr(err) {
        return
    }

@whyrusleeping ?

whyrusleeping commented 8 years ago

@jbenet agreed. that's rather clean.

rht commented 8 years ago

The periodic gc is at least useful if merged, so the conditional gc in ipfs add is removed from this PR for now (due to req.Values not being passed properly to the daemon, so it keeps failing to know the file size; if @whyrusleeping knows why: https://github.com/rht/go-ipfs/commit/ce29999a3540a369a691267f03ce475b9f9350fb#diff-bd0ca9f1e2d5c7ba128af01ec4c1131cR94). The gc params have been swept under the GC struct, so it is simpler now.

The remaining thing is to fix the disk_usage test on OS X.

chriscool commented 8 years ago

@jbenet what Git does is explained in the git gc doc:

https://www.kernel.org/pub/software/scm/git/docs/git-gc.html

It's a bit complex, but it depends mostly on some time limits or on some number of pack or loose object files.
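(For reference, git gc --auto fires roughly when the number of loose objects exceeds the gc.auto threshold, 6700 by default, or when the number of packs exceeds gc.autoPackLimit, 50 by default.)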

rht commented 8 years ago

I wonder if the git gc --auto check can be generalized to large files, which ipfs must handle. On the other hand, there is also the weekly "gc"-like fstrim for SSDs (http://man7.org/linux/man-pages/man8/fstrim.8.html).

chriscool commented 8 years ago

Yeah, the git gc --auto check could probably be generalized to large files, or rather large blobs. Though it is probably less efficient to check blob sizes than just the number of loose or pack files, or the time since the previous gc.

About ipfs, I would say that it could be configured to use some amount of space, and gc could be started when adding stuff if the amount of space used reaches, for example, 80% of the configured amount.

rht commented 8 years ago

To optimize for perf, total repo size can be computed once, and subsequent update to this value is calculated from the size of blocks being added/removed.
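
A rough sketch of that bookkeeping, with illustrative names: measure once, adjust the cached value as blocks are added and removed, and re-measure in full occasionally since the estimate can drift.

    import "sync"

    // sizeCache keeps a running estimate of the repo size so the watermark
    // check does not have to walk the datastore on every operation.
    type sizeCache struct {
        mu   sync.Mutex
        size int64
    }

    // Reset stores a freshly measured size (e.g. from a periodic full re-measure).
    func (c *sizeCache) Reset(measured int64) {
        c.mu.Lock()
        c.size = measured
        c.mu.Unlock()
    }

    // Added and Removed adjust the estimate as blocks come and go.
    func (c *sizeCache) Added(n int64)   { c.mu.Lock(); c.size += n; c.mu.Unlock() }
    func (c *sizeCache) Removed(n int64) { c.mu.Lock(); c.size -= n; c.mu.Unlock() }

    // Estimate returns the current size estimate.
    func (c *sizeCache) Estimate() int64 {
        c.mu.Lock()
        defer c.mu.Unlock()
        return c.size
    }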

> About ipfs, I would say that it could be configured to use some amount of space, and gc could be started when adding stuff if the amount of space used reaches, for example, 80% of the configured amount.

Do you mean 1. periodic gc or 2. conditional gc for each operation?

I wonder if it is too much (or not enough because git lacks built-in 'signed-off-by') to put a full git interface into the merkledag's versioning DSL https://github.com/ipfs/notes/issues/45? And how are loose objects being defined here?

jbenet commented 8 years ago

thanks for links @chriscool ! :+1:

> About ipfs, I would say that it could be configured to use some amount of space, and gc could be started when adding stuff if the amount of space used reaches, for example, 80% of the configured amount.

agreed! @rht's PR already does this with a watermark.

@rht i'm happy to merge incrementally, as you prefer. ideally we could merge the whole thing but happy to split.

@whyrusleeping can you help in here? would be nice to land this. ... though, is this going to screw up dev0.4.0 rebasing?

whyrusleeping commented 8 years ago

update here? cc @rht @jbenet

rht commented 8 years ago

The only issue here is to make the disk usage test work on OS X.

rht commented 8 years ago

This should be sufficient for the OS X case: https://github.com/ipfs/go-ipfs/pull/1495/files#diff-5fc0c52c704b9761edbae1b09e9969bcR46. RFCR -> RFM (ready for code review -> ready for merge).

rht commented 8 years ago

(ok so using test_cmp is wrong because it uses the previous expected/actual files. If someone with OS X could debug this...)

rht commented 8 years ago

This should fix it, 60% of 2MB is ~1393KB.

rht commented 8 years ago

The recent test error is a statistical (flaky) one.

whyrusleeping commented 8 years ago

huh, while i'm sure that the error in that test has nothing to do with your PR, it's not one i'm used to seeing. odd...

whyrusleeping commented 8 years ago

also, I would like to make sure that this is not enabled by default when we merge it. I don't want to surprise users with their data randomly disappearing (even if it's not pinned).

rht commented 8 years ago

@whyrusleeping reversed; it is now disabled by default.

whyrusleeping commented 8 years ago

Cool, this LGTM then.

jbenet commented 8 years ago

@whyrusleeping called today for not merging anything more into master (except for utp and ipfs-update), and merging everything new on top of 0.4.0.

this is probably the only thing that would be a lot of work both to merge before, AND to rebase on top of 0.4.0. idk what's best here. can you two discuss and come up with some resolution? it'd be good to merge this soon, but i also want to respect the blocker and only merge on top of 0.4.0.

rht commented 8 years ago

I think the only part of the PR that will cause a merge conflict is the daemonFunc refactor & err channels. Everything else should be fine. I can redo this refactor on top of 0.4.0 if that's preferred.

rht commented 8 years ago

Should be safe to merge now. For later reference, this is the commit that contains the daemonFunc refactor + fuse errc: https://github.com/rht/go-ipfs/commit/0689e3a6e548da61a7b44c0e3cbea59c1bc9772c