tv42 closed this 8 years ago.
Travis failures look like OOM and timeouts.
@chriscool When you say "all", that makes me think "out of scope for this issue" ;)
@whyrusleeping @jbenet Let me know if there's anything unclear here, or something I can help with. I'm basically waiting to hear back from you two.
A few things are confusing to me, but other than that, this looks really nice. It removes a bunch of things I wanted removed, and makes a bunch of changes toward a more sensible pinning setup.
@tv42 what I wanted to say is that, as far as I know, the other file comparison functions, like test_cmp and test_sort_cmp, are supposed to take the expected file first and the actual file second, and it would be nice if the comparison functions you introduce followed the same pattern.
@chriscool Hmm. Sharness's test_cmp is documented to take arguments in the order expected, actual. The preexisting code was test_sort_cmp gc_out_actual2 gc_out_exp2, and I just kept that order. You're right, this is probably worth fixing.
@tv42 great! Thanks for fixing it!
@tv42 minor comments above o/ -- aside, this LGTM.
Debian Stretch inside virtualbox, 2 GB RAM, 2 CPU cores allocated for VM (Core 2 Quad 9550).
VM is located on HDD. Guest root partition and guest /srv partition are located on separate physical HDDs (both SATA2). Test files located at /srv/test1, IPFS_PATH points to /srv/ipfs_conf.
user@deb-stretch-ipfs:/srv/test1$ time sha256sum random.dat
2dc9095156bf916fc1cd07d98bf9e0d330cfbefedf9f037a65e378cbc9e70ab1 random.dat
real 0m19.488s
user 0m13.504s
sys 0m5.732s
sha256sum speed: 102.6 MB/s
Adding urandom junk to ipfs:
user@deb-stretch-ipfs:/srv/test1$ time ipfs add random.dat
added QmfQDSXnxvjVPDzhwmHK83CWFaCUwyYLTnjEf5dZm1wJqb random.dat
real 3m39.620s
user 1m10.676s
sys 0m43.760s
Around 03:27:39 the progress bar finishes, but ipfs keeps working with no output to the console window. The process completes around 03:29:13. Average speed: 9.1 MB/s, iotop log
Adding the same file again requires only 28.3 s. Average speed: 70 MB/s. iotop log
iotop columns: TID | PRIO | USER | DISK READ | DISK WRITE | SWAPIN | IO> | COMMAND
Adding file full of 0xFF:
user@deb-stretch-ipfs:/srv/test1$ time ipfs add notsorandom.dat
added QmNmKeTsXeeLHA8TYeZg6DyWx96hhfSiqWBsoSM7wec3ck notsorandom.dat
real 0m26.844s
user 0m22.704s
sys 0m11.288s
Average speed: 74.6 MB/s, iotop log
Core 2 Q9550, 4 CPU cores, 8 GB RAM, test file and ipfs_path on the same SSD. Adding random.dat:
added QmfQDSXnxvjVPDzhwmHK83CWFaCUwyYLTnjEf5dZm1wJqb random.dat
real 5m11.659s
user 0m53.792s
sys 0m8.120s
Progress bar finishes at 10:55:15, process ends around 10:58, iotop log, average speed: 6.43 MB/s
IPFS_PATH on (fragmented?) HDD, test file on SSD:
Doing ipfs add gave an estimated time of around 10 minutes for the same 2 GB file; I assume this is caused by fragmentation of the HDD partition (20 GB left out of 300 GB). There were also very slow read/write speeds in iotop:
11:04:17 1895 be/4 alex 7.98 K/s 3112.68 K/s 0.00 % 1.50 % ipfs add random.dat
11:04:17 1895 be/4 alex 7.98 K/s 3113.39 K/s 0.00 % 1.49 % ipfs add random.dat
11:04:18 1895 be/4 alex 0.00 K/s 3631.49 K/s 0.00 % 1.70 % ipfs add random.dat
I would discard this test.
@vitzli nice! so this appears to be a little bit faster than master, right?
@whyrusleeping I'm sure now that I've made a mistake; I commented on it in #1216. The results are somewhat disappointing: the pin branch seems to be slower than master, but I need somebody to verify my results. 1 core, random.dat file, pin branch, test file and ipfs dir on the same separate HDD:
TEST 1 3m13.276s
TEST 2 3m12.529s
TEST 3 3m24.626s
TEST 4 3m31.363s
TEST 5 3m44.372s
TEST 6 3m49.319s
TEST 7 3m43.692s
TEST 8 4m3.791s
TEST 9 3m42.013s
TEST 10 3m41.873s
Average time: 3m36.6s, 9.2 MB/s

2 cores:
TEST 1 3m39.991s
TEST 2 3m47.141s
TEST 3 3m50.335s
TEST 4 3m56.028s
TEST 5 4m2.825s
TEST 6 4m5.213s
TEST 7 3m57.563s
TEST 8 4m5.529s
TEST 9 4m0.147s
TEST 10 4m2.940s
Average time 3m56.7s, 8.5 MB/s
@vitzli hrm.. that's what @tv42 said might be the case. A little disappointing... but I'm sure there is plenty of room for optimization.
Could you maybe put together a script to run these benchmarks? It would be great to have an 'ipfs-add-benchmark' tool around.
Round 1 of the pinning rework was explicitly done so it doesn't change the APIs much. Any other way would have been too much chaos in the codebase.
So further revisions won't require any sort of migration, correct?
It's always possible we discover the format was bad. I'd be surprised if that happened soon.
Uhh, I added more tests and may or may not have found a bug. Please don't merge while I diagnose. Preparing migration code should still be safe.
> It's always possible we discover the format was bad. I'd be surprised if that happened soon.
That's all I can possibly ask for :)
> may or may not have found a bug
here, take this:
Okay, that turned out to be a red herring (a bug in the test), but it did expose a mishandling of marshaled refcounts that was hidden by the fact that they were uint8.
One final request: can you bump the repo version to 3?
Alright, the pinning migration is complete, pending @jbenet's blessing (and CR). Once that lands, we can merge this and move on to the much anticipated s3 datastore :smile:
huh, this PR makes the unixfs/mod tests eat all my RAM
Sorry for the horrendously slow CR. S3 next. Okay, this LGTM!! :+1: :+1:
also, @whyrusleeping are you done with CR here?
I'm done with CR, but the tests for unixfs/mod fail. I'm not sure what's happening...
@tv42 there appears to be a bug somewhere: the unixfs/mod tests fail because a call to Flush hangs and eats a ton of memory.
@whyrusleeping I've seen the memory eating symptoms. I have no idea why that happens, or whether to blame unixfs or pinning, at this time.
@tv42 I'm >90% certain that it's pinning. Memory usage is fine until it hits the pin.Flush call, and the pin.Flush call never returns, building up memory until it crashes. I can look into it more tomorrow.
this loop never exits: https://github.com/ipfs/go-ipfs/blob/pin/pin/set.go#L141
It appears that somewhere the dagmodifier code is calling 'remove indirect pin' on something that isn't indirectly pinned, causing an underflow on the uint64 and resulting in an infinite (compared to the length of our lifetime) loop. The pinner should probably either error, or just set refcnt to zero in such a case.
Oh, wow. Thanks for digging that out. That's easy to guard against (in the remove logic, not in this part that writes out the in-memory state), but it doesn't change the fact that the calling logic is flawed...
Yeah, the calling logic is flawed. I'm thinking about how that should work... it would be much easier by just pinning roots recursively and using mark and sweep.
Sure, but implementing a GC (well) is a lot more effort; I'd hate to bundle too many changes into one.
I'll add in the underflow avoidance (if you haven't already); I should have time for that in a few hours. That'll get us limping onward, though it means the pinning state is being corrupted.
> calling logic is flawed
Indeed.
The underlying libs shouldn't fail like this, though -- it's easy to validate expected invariants and error out if someone violates the calls.
> not in this part that writes out the in-memory state
I think it should validate when writing out too. I don't think a check here will have significant perf degradation, and it will be quite a bit safer, particularly if people other than @tv42 and @whyrusleeping start poking into this code.
> i think it should validate when writing out too.
Validate what? How high of a refcount is invalid? Validating at action time makes sense; it can prevent underflow. After the action, it's just key -> number.
Hey @tv42 o/ !
Decided we'll pull this into a dev branch of ipfs for now. I want to minimize migrations, so we can put all these pinning repo changes, including the making of the last ipfs objects (as discussed here https://github.com/ipfs/go-ipfs/pull/1225#discussion_r31385721 ), into just one migration.
@whyrusleeping @tv42 how do we want to handle the remaining pieces:
> do we want to move to mark-and-sweep before dev merges into master? (may significantly improve both perf and safety here. (i dont think current recursive pinning is safe to crashes))

@jbenet perf here is ever so slightly better than the existing code.
Regarding mark and sweep, I think that would probably be good... but that's quite a large change.
On Sun, Jun 07, 2015 at 01:57:34PM -0700, Juan Batiz-Benet wrote:
- [ ] Need to rebase this to a new dev branch
- [x] did we benchmark perf here vs old one?
- [x] one root pins object
- [ ] do we want to move to mark-and-sweep before dev merges into master? (may significantly improve both perf and safety here; i dont think current recursive pinning is safe to crashes)
I'll rebase it.
My benchmarks of the old and this version were very close to each other, this one just slightly better. Anything more than that means pinning API changes (moving away from the in-memory copy + complete write-out model), and I wasn't willing to risk landing that many changes at once. And if someone implements real garbage collection, it'll change the requirements for pinning, so it seemed unwise to start that work.
One root pins object: already there, at datastore key /local/pins.
:(){ :|:&};:
@tv42 ok great!
Rerunning tests-- all go test runs failed :/
I've seen this happen 6/8 times: https://travis-ci.org/ipfs/go-ipfs/jobs/65928853#L187
I saw it too, now. Looking..
@jbenet @whyrusleeping That's a dagmodifier error branch that explicitly unpins an object.
// We found the correct child to write into
if cur+bs > offset {
	// Unpin block
	ckey := key.Key(node.Links[i].Hash)
	dm.mp.RemovePinWithMode(ckey, pin.Indirect)

	child, err := node.Links[i].GetNode(dm.ctx, dm.dagserv)
	if err != nil {
		// THIS TRIGGERS
		return "", false, err
	}

	k, sdone, err := dm.modifyDag(child, offset-cur, data)
	if err != nil {
		return "", false, err
	}

	// NEVER GETS HERE
	// pin the new node
	dm.mp.PinWithMode(k, pin.Indirect)
Well, the error at that point is a missing block too -- but that's the cause of the other missing block, which is what the test really craps out on. Lovely.
Found it.
Should be good again. @whyrusleeping FYI the dagmodifier changes.
@tv42 good catch, thanks!
merged into dev0.4.0
Can this be closed now?
I think so yep :) :+1: thanks @tv42
Still needs a migration. Notes for that:
There are two parts: