ipfs / kubo

An IPFS implementation in Go
https://docs.ipfs.tech/how-to/command-line-quick-start/

Rewrite pinning to store pins in IPFS objects #1225

Closed tv42 closed 8 years ago

tv42 commented 9 years ago

Still needs a migration. Notes for that:

There are two parts:

  1. https://github.com/ipfs/go-ipfs/commit/1009aeb54625687a668f4b0201fd60f03431add3 The keys for that looked like /local/pins/{direct,recursive,indirect}/keys/* and can simply be removed.
  2. https://github.com/ipfs/go-ipfs/commit/1673b9cc3c3aa697ff970fc90581b610028da915 The keys for that look like /local/pins/{direct,recursive,indirect}/keys (parents of the above), contain JSON, and can be fed to the new pinning mechanism to generate the new state (sketched below). The direct and recursive JSON values are arrays; the indirect JSON is a dict of {key: refcount}.
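
Part 2 could look roughly like this -- a minimal sketch in Go, where Datastore and Pinner are hypothetical stand-ins (the real go-ipfs interfaces differ); only the key names and JSON shapes come from the notes above:

    package pinmigrate

    import (
        "encoding/json"
        "fmt"
    )

    // Datastore and Pinner are illustrative stand-ins, not the real go-ipfs types.
    type Datastore interface {
        Get(key string) ([]byte, error)
    }

    type Pinner interface {
        PinWithMode(key, mode string)
        PinIndirect(key string, refcount uint64)
    }

    // migratePins feeds the old JSON pin state to the new pinning mechanism.
    // The part-1 per-key entries (/local/pins/{mode}/keys/*) can simply be
    // deleted and are not handled here.
    func migratePins(ds Datastore, p Pinner) error {
        // direct and recursive pins are stored as JSON arrays of keys.
        for _, mode := range []string{"direct", "recursive"} {
            raw, err := ds.Get("/local/pins/" + mode + "/keys")
            if err != nil {
                return fmt.Errorf("reading %s pins: %v", mode, err)
            }
            var keys []string
            if err := json.Unmarshal(raw, &keys); err != nil {
                return err
            }
            for _, k := range keys {
                p.PinWithMode(k, mode)
            }
        }
        // indirect pins are stored as a JSON dict of {key: refcount}.
        raw, err := ds.Get("/local/pins/indirect/keys")
        if err != nil {
            return fmt.Errorf("reading indirect pins: %v", err)
        }
        var refs map[string]uint64
        if err := json.Unmarshal(raw, &refs); err != nil {
            return err
        }
        for k, n := range refs {
            p.PinIndirect(k, n)
        }
        return nil
    }
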
tv42 commented 9 years ago

Travis failures look like OOM and timeouts.

@chriscool When you say "all", that makes me think "out of scope for this issue" ;)

@whyrusleeping @jbenet Let me know if there's anything unclear here, or something I can help with. I'm basically waiting to hear back from you two.

whyrusleeping commented 9 years ago

A few things here are confusing to me, but other than that this looks really nice. It removes a bunch of things I wanted removed, and makes a bunch of changes towards a more sensible pinning setup.

chriscool commented 9 years ago

@tv42 what I wanted to say is that, as far as I know, the other file comparison functions like test_cmp and test_sort_cmp are supposed to take the expected file first and the actual file second, and it would be nice if the comparison functions you introduce followed the same pattern.

tv42 commented 9 years ago

@chriscool Hmm. Sharness's test_cmp is documented to take arguments in the order expected, actual. The preexisting code was test_sort_cmp gc_out_actual2 gc_out_exp2, and I just kept that order. You're right, this is probably worth fixing.

chriscool commented 9 years ago

@tv42 great! Thanks for fixing it!

jbenet commented 9 years ago

@tv42 minor comments above o/ -- aside, this LGTM.

vitzli commented 9 years ago

Debian Stretch inside virtualbox, 2 GB RAM, 2 CPU cores allocated for VM (Core 2 Quad 9550).

HW config 1

VM is located on HDD. Guest root partition and guest /srv partition are located on separate physical HDDs (both SATA2). Test files located at /srv/test1, IPFS_PATH points to /srv/ipfs_conf.

user@deb-stretch-ipfs:/srv/test1$ time sha256sum random.dat
2dc9095156bf916fc1cd07d98bf9e0d330cfbefedf9f037a65e378cbc9e70ab1  random.dat

real    0m19.488s
user    0m13.504s
sys 0m5.732s

sha256sum speed: 102.6 MB/s

Adding urandom junk to ipfs:

user@deb-stretch-ipfs:/srv/test1$ time ipfs add random.dat 
added QmfQDSXnxvjVPDzhwmHK83CWFaCUwyYLTnjEf5dZm1wJqb random.dat

real    3m39.620s
user    1m10.676s
sys 0m43.760s

Around 03:27:39 the progress bar finishes, but ipfs keeps doing something with no output to the console window. The process completes around 03:29:13. Average speed: 9.1 MB/s (iotop log). Adding the same file again requires only 28.3 s, average speed: 70 MB/s (iotop log). iotop columns: TID | PRIO | USER | DISK READ | DISK WRITE | SWAPIN | IO> | COMMAND

Adding file full of 0xFF:

user@deb-stretch-ipfs:/srv/test1$ time ipfs add notsorandom.dat 
added QmNmKeTsXeeLHA8TYeZg6DyWx96hhfSiqWBsoSM7wec3ck notsorandom.dat                                     

real    0m26.844s
user    0m22.704s
sys 0m11.288s

Average speed: 74.6 MB/s (iotop log)

vitzli commented 9 years ago

HW config 2 (host OS)

Core 2 Q9550, 4 CPU cores, 8 GB RAM, test file and ipfs_path on the same SSD. Adding random.dat:

added QmfQDSXnxvjVPDzhwmHK83CWFaCUwyYLTnjEf5dZm1wJqb random.dat

real    5m11.659s
user    0m53.792s
sys 0m8.120s

Progress bar finishes at 10:55:15, process ends around 10:58 (iotop log). Average speed: 6.43 MB/s.

HW config 3 (host OS)

IPFS_PATH on a (fragmented?) HDD, test file on SSD: doing ipfs add gave an estimated time of around 10 minutes for the same 2 GB file; I assume this is caused by fragmentation of the HDD partition (20 GB left out of 300 GB). There were also very slow read/write speeds in iotop:

11:04:17  1895 be/4 alex        7.98 K/s 3112.68 K/s  0.00 %  1.50 % ipfs add random.dat
11:04:17  1895 be/4 alex        7.98 K/s 3113.39 K/s  0.00 %  1.49 % ipfs add random.dat
11:04:18  1895 be/4 alex        0.00 K/s 3631.49 K/s  0.00 %  1.70 % ipfs add random.dat

I would discard this test.

whyrusleeping commented 9 years ago

@vitzli nice! so this appears to be a little bit faster than master, right?

vitzli commented 9 years ago

@whyrusleeping I'm now sure that I made a mistake; I commented on it in #1216. The results are somewhat disappointing: the pin branch seems to be slower than master, but I need somebody to verify my results. 1 core, random.dat file, pin branch, test file and ipfs dir on the same separate HDD:

TEST 1  3m13.276s
TEST 2  3m12.529s
TEST 3  3m24.626s
TEST 4  3m31.363s
TEST 5  3m44.372s
TEST 6  3m49.319s
TEST 7  3m43.692s
TEST 8  4m3.791s
TEST 9  3m42.013s
TEST 10 3m41.873s

Average time: 3m36.6s, 9.2 MB/s

2 cores:

TEST 1  3m39.991s
TEST 2  3m47.141s
TEST 3  3m50.335s
TEST 4  3m56.028s
TEST 5  4m2.825s
TEST 6  4m5.213s
TEST 7  3m57.563s
TEST 8  4m5.529s
TEST 9  4m0.147s
TEST 10 4m2.940s

Average time: 3m56.7s, 8.5 MB/s

whyrusleeping commented 9 years ago

@vitzli hrm.. that's what @tv42 said might be the case. A little disappointing... but I'm sure there is plenty of room for optimization.

Could you maybe put together a script to run these benchmarks? It would be great to have an 'ipfs-add-benchmark' thing around.
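
Something like this could be a starting point -- a rough sketch, not an existing tool, assuming an ipfs binary on PATH; it times ipfs add against a fresh repo per run so later runs don't just rewrite blocks that already exist in the datastore:

    package main

    import (
        "fmt"
        "log"
        "os"
        "os/exec"
        "time"
    )

    // Usage: ipfs-add-benchmark <file>
    func main() {
        if len(os.Args) != 2 {
            log.Fatal("usage: ipfs-add-benchmark <file>")
        }
        file := os.Args[1]
        const runs = 5
        var total time.Duration
        for i := 1; i <= runs; i++ {
            // A fresh repo per run, so we benchmark a cold add rather
            // than a re-add of blocks already in the datastore.
            repo, err := os.MkdirTemp("", "ipfs-bench")
            if err != nil {
                log.Fatal(err)
            }
            run(repo, "ipfs", "init")
            start := time.Now()
            run(repo, "ipfs", "add", "-q", file)
            elapsed := time.Since(start)
            total += elapsed
            fmt.Printf("TEST %d\t%s\n", i, elapsed)
            os.RemoveAll(repo)
        }
        fmt.Printf("average\t%s\n", total/runs)
    }

    // run executes a command with IPFS_PATH pointed at repo.
    func run(repo, name string, args ...string) {
        cmd := exec.Command(name, args...)
        cmd.Env = append(os.Environ(), "IPFS_PATH="+repo)
        cmd.Stdout = os.Stdout
        cmd.Stderr = os.Stderr
        if err := cmd.Run(); err != nil {
            log.Fatalf("%s %v: %v", name, args, err)
        }
    }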

tv42 commented 9 years ago

Round 1 of the pinning rework was explicitly done so it doesn't change the APIs much. Any other way would have been too much chaos in the codebase.

whyrusleeping commented 9 years ago

So further revisions won't require any sort of a migration, correct?

tv42 commented 9 years ago

It's always possible we discover the format was bad. I'd be surprised if that happened soon.

tv42 commented 9 years ago

Uhh, I added more tests and may or may not have found a bug. Please don't merge while I diagnose. Preparing migration code should still be safe.

whyrusleeping commented 9 years ago

It's always possible we discover the format was bad. I'd be surprised if that happened soon.

That's all I can possibly ask for :)

may or may not have found a bug

here, take this: [image]

tv42 commented 9 years ago

Okay, that turned out to be a red herring (a bug in the test), but it did expose a mishandling of marshaled refcounts that was hidden by the fact that they were uint8.
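
For illustration only (this is not the actual go-ipfs code), this is the general shape of a bug that uint8 refcounts can mask: a marshaler that keeps only the low byte round-trips every value below 256, so tests with small refcounts never see the corruption:

    package main

    import "fmt"

    // buggyMarshal keeps only the low byte of the refcount -- fine while
    // refcounts are uint8, silently wrong once they are uint64.
    func buggyMarshal(n uint64) byte { return byte(n) }

    func buggyUnmarshal(b byte) uint64 { return uint64(b) }

    func main() {
        for _, n := range []uint64{1, 255, 256, 1000} {
            got := buggyUnmarshal(buggyMarshal(n))
            fmt.Printf("refcount %d round-trips as %d (ok=%v)\n", n, got, got == n)
        }
    }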

whyrusleeping commented 9 years ago

one final request, can you bump the repo version to 3?

whyrusleeping commented 8 years ago

Alright, the pinning migration is complete, pending @jbenet's blessing (and CR). Once that lands, we can merge this and move on to the much anticipated s3 datastore :smile:

whyrusleeping commented 8 years ago

huh, this PR makes unixfs/mod tests eat all my ram

jbenet commented 8 years ago

Sorry for the horrendously slow CR. S3 next. Okay, this LGTM!! :+1: :+1:

jbenet commented 8 years ago

also, @whyrusleeping are you done with CR here?

whyrusleeping commented 8 years ago

I'm done with CR, but the tests for unixfs/mod fail. I'm not sure what's happening...

whyrusleeping commented 8 years ago

@tv42 there appears to be a bug somewhere: the unixfs/mod tests fail because a call to Flush hangs and eats a ton of memory.

whyrusleeping commented 8 years ago

hangs here: https://github.com/ipfs/go-ipfs/blob/master/unixfs/mod/dagmodifier.go#L215

tv42 commented 8 years ago

@whyrusleeping I've seen the memory eating symptoms. I have no idea why that happens, or whether to blame unixfs or pinning, at this time.

whyrusleeping commented 8 years ago

@tv42 I'm >90% certain that it's pinning. Memory usage is fine until it hits the pin.Flush call, and the pin.Flush call never returns, building up memory until it crashes. I can look into it more tomorrow.

whyrusleeping commented 8 years ago

this loop never exits: https://github.com/ipfs/go-ipfs/blob/pin/pin/set.go#L141

whyrusleeping commented 8 years ago

It appears that somewhere the dagmodifier code is calling 'remove indirect pin' on something that isn't indirectly pinned, causing an underflow on the uint64 and resulting in an infinite (compared to the length of our lifetime) loop. The pinner should probably either error, or just set the refcount to zero in such a case.
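
A minimal sketch of that guard, with hypothetical names (refcnt, RemoveIndirect) rather than the actual pinner API: decrementing a zero uint64 would wrap around to 2^64-1, so the remove path has to check first:

    package main

    import "fmt"

    type pinner struct {
        refcnt map[string]uint64 // indirect-pin refcounts, keyed by block key
    }

    // RemoveIndirect decrements the refcount for key, erroring instead of
    // letting a zero count wrap around to 2^64-1.
    func (p *pinner) RemoveIndirect(key string) error {
        n := p.refcnt[key]
        if n == 0 {
            return fmt.Errorf("pin: %q is not indirectly pinned", key)
        }
        if n == 1 {
            delete(p.refcnt, key)
            return nil
        }
        p.refcnt[key] = n - 1
        return nil
    }

    func main() {
        p := &pinner{refcnt: map[string]uint64{"QmExample": 1}}
        fmt.Println(p.RemoveIndirect("QmExample")) // <nil>
        fmt.Println(p.RemoveIndirect("QmExample")) // error instead of underflow
    }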

tv42 commented 8 years ago

Oh, wow. Thanks for digging that out. That's easy to guard against (in the remove logic, not in the part that writes out the in-memory state), but that doesn't change the fact that the calling logic is flawed...

whyrusleeping commented 8 years ago

yeah, the calling logic is flawed. I'm thinking about how that should work... it would be much easier with just pinning roots recursively and using mark and sweep.
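
Roughly what that would look like, as a sketch over a hypothetical Blockstore interface (none of this is the go-ipfs API): only recursive roots are recorded, reachability is recomputed at GC time, and nothing keeps per-block refcounts:

    package pingc

    // Blockstore is an illustrative stand-in for whatever the real
    // implementation would use.
    type Blockstore interface {
        AllKeys() []string
        Delete(key string)
        Links(key string) []string // child links of the DAG node at key
    }

    // markAndSweep deletes every block not reachable from the pinned roots.
    func markAndSweep(bs Blockstore, roots []string) {
        // Mark: walk the DAG from each recursively pinned root.
        live := make(map[string]bool)
        stack := append([]string(nil), roots...)
        for len(stack) > 0 {
            k := stack[len(stack)-1]
            stack = stack[:len(stack)-1]
            if live[k] {
                continue
            }
            live[k] = true
            stack = append(stack, bs.Links(k)...)
        }
        // Sweep: anything unmarked is garbage.
        for _, k := range bs.AllKeys() {
            if !live[k] {
                bs.Delete(k)
            }
        }
    }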

tv42 commented 8 years ago

Sure, but implementing a GC (well) is a lot more effort; I'd hate to bundle too many changes into one.

I'll add in the underflow avoidance (if you haven't already); I should have time for that in a few hours. That'll get us limping onward, though it means the pinning state is being corrupted.

jbenet commented 8 years ago

calling logic is flawed

Indeed.

The underlying libs shouldn't fail like this though -- it's easy to validate expected invariants and error out if a caller violates them.

not in this part that writes out the in-memory state

I think it should validate when writing out too. I don't think a check here will cause significant perf degradation, and it will be quite a bit safer, particularly if people other than @tv42 and @whyrusleeping start poking into this code.

tv42 commented 8 years ago

i think it should validate when writing out too.

Validate what? How high of a refcount is invalid?

Validating at action time makes sense; it can prevent underflow. After the action, it's just key -> number.

jbenet commented 8 years ago

Hey @tv42 o/ !

Decided we'll pull this into a dev branch of ipfs for now. I want to minimize migrations, so we can fold all these pinning repo changes, including the making of the last ipfs objects (as discussed here https://github.com/ipfs/go-ipfs/pull/1225#discussion_r31385721), into just one migration.

@whyrusleeping @tv42 how do we want to handle the remaining pieces? (checklist quoted in the reply below)

whyrusleeping commented 8 years ago

@jbenet perf here is ever so slightly better than the existing code.

In regards to mark and sweep, I think that would probably be good... but that's quite a large change.

tv42 commented 8 years ago

On Sun, Jun 07, 2015 at 01:57:34PM -0700, Juan Batiz-Benet wrote:

  • [ ] Need to rebase this to a new dev branch
  • [x] did we benchmark perf here vs old one?
  • [x] one root pins object
  • [ ] do we want to move to mark-and-sweep before dev merges into master? (may significantly improve both perf and safety here; I don't think current recursive pinning is safe against crashes)

I'll rebase it.

My benchmarks of the old and this version were very close to each other, with this one just slightly better. Anything more than that means pinning API changes (moving away from the in-memory copy + complete write-out model), and I wasn't willing to risk trying to land that many changes at once. And if someone implements real garbage collection, it'll change the requirements for pinning, so it seemed unwise to start that work.

One root pins object: already there, datastore key /local/pins.

jbenet commented 8 years ago

@tv42 ok great!

Rerunning tests-- all go test runs failed :/

jbenet commented 8 years ago

I've seen this happen 6/8 times: https://travis-ci.org/ipfs/go-ipfs/jobs/65928853#L187

tv42 commented 8 years ago

I saw it too, now. Looking..

tv42 commented 8 years ago

@jbenet @whyrusleeping That's a dagmodifier error branch that explicitly unpins an object.

        // We found the correct child to write into
        if cur+bs > offset {
            // Unpin block
            ckey := key.Key(node.Links[i].Hash)
            dm.mp.RemovePinWithMode(ckey, pin.Indirect)

            child, err := node.Links[i].GetNode(dm.ctx, dm.dagserv)
            if err != nil {
                // THIS TRIGGERS
                return "", false, err
            }
            k, sdone, err := dm.modifyDag(child, offset-cur, data)
            if err != nil {
                return "", false, err
            }

            // NEVER GETS HERE
            // pin the new node
            dm.mp.PinWithMode(k, pin.Indirect)
            // ...

tv42 commented 8 years ago

Well, the error at that point is a missing block too -- and that's the cause of the other missing block, the one the test really craps out on. Lovely.

tv42 commented 8 years ago

Found it.

tv42 commented 8 years ago

Should be good again. @whyrusleeping FYI the dagmodifier changes.

whyrusleeping commented 8 years ago

@tv42 good catch, thanks!

whyrusleeping commented 8 years ago

merged into dev0.4.0

tv42 commented 8 years ago

Can this be closed now?

jbenet commented 8 years ago

I think so yep :) :+1: thanks @tv42