Kubuxu opened this issue 8 years ago
The solution I'm investigating here is batching together outgoing provide operations, combining findpeer queries and outgoing 'puts'. The issue is that it might require making some backwards-incompatible changes to the DHT protocol, which isn't really a huge problem since we have multistream.
@magik6k Thanks! I will keep you posted as we make improvements to this.
Similar report: while adding ~70,000 objects (~100 KB each) we were maxing out our instance's traffic (70 Mbit/s in, 160 Mbit/s out) for over 14 hours (!!!), at which point someone killed the IPFS daemon.
Back of the envelope, this is at least 1TB outgoing for ~10GB of files.
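(Assuming those are sustained rates, the arithmetic roughly checks out: 160 Mbit/s ≈ 20 MB/s, and 20 MB/s × 14 h × 3600 s/h ≈ 1 TB outbound.)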
One particular problem: because the daemon was killed before the process completed, it seems that almost none of the files were pinned. Running ipfs repo gc deleted almost everything:
; before gc
NumObjects 2995522
RepoSize 20
RepoPath /home/ubuntu/.ipfs
Version fs-repo@4
; after gc
NumObjects 771
RepoSize 20
RepoPath /home/ubuntu/.ipfs
Version fs-repo@4
(side note, the chunk counts seem way too high too, almost 40x per image instead of expected 4x)
@parkan which version of ipfs are you using? This is definitely a known issue, but it has been improved slightly since 0.4.2
@whyrusleeping 0.4.3-rc3
@whyrusleeping my intuition is that there's some kind of context that's not being released before a queue is drained, which is supported by none of the files being pinned successfully after killing the daemon. Does this seem plausible?
@parkan are you killing the daemon before the add call is complete?
@whyrusleeping no, these are all individual add calls that complete reasonably quickly, load/network traffic happens w/o client interaction
One particular problem: because the daemon was killed before the process completed, it seems that almost none of the files were pinned. Running ipfs repo gc deleted almost everything:
That's a very serious bug to figure out. Consistency is paramount here and that's much more important to get right.
Similar report: while adding ~70,000 objects (~100 KB each) we were maxing out our instance's traffic (70 Mbit/s in, 160 Mbit/s out) for over 14 hours (!!!), at which point someone killed the IPFS daemon.
Back of the envelope, this is at least 1TB outgoing for ~10GB of files.
This is caused by providing random access to all content, as discussed via email. For the large use case, we should look at recompiling go-ipfs without creating a provider record for each object, and leveraging direct connections. (FWIW, orbit on js-ipfs works without the DHT entirely, and I imagine your use case here could do very well if you make sure to connect the relevant nodes together so there's no discovery to happen. This is the way to do it before pubsub connects them for you.)
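For example, pre-connecting the relevant nodes could look roughly like this (the multiaddr is a placeholder for whatever the other node reports via ipfs id):
ipfs swarm connect /ip4/<host>/tcp/4001/ipfs/<peer-id>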
You can check which objects are pinned after an add with: ipfs pin ls --type=recursive. It also seems really weird that your repo size is listed as 20 for both of those calls...
@parkan also, what is the process for adding files? Is it just a single call to ipfs add? Or many smaller calls broken up? Are you passing any other flags?
@parkan on chunk size, I'd be interested (separately) in how well Rabin fingerprint based chunking (ipfs add -s <file>) does for your images. If they're similar it should reduce the storage, but this is experimental and not proven to help. It's also slower on add, as it has to do math when chunking.
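If you want to try it, an invocation would look something like this (assuming the --chunker flag spelling, which may differ between versions):
ipfs add --chunker=rabin <file>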
@whyrusleeping ipfs pin ls was showing <500 objects pinned successfully; I don't think any of them are from this add attempt
also, small correction to my previous statement: not all ipfs add calls had completed, basically it was like this:
1) we add ~10k files with relatively fast return from ipfs add
2) adds slow down, taking >30s per file
3) daemon is killed
@whyrusleeping these are many separate adds, we basically write a temporary file (incidentally, being able to pin from stdin would be great), ipfs add it (no options) and wait for return
@parkan Ah, yeah. The pins aren't written until the add is complete. An ipfs add call will only pin the root hash of the result of any given call to ipfs add.
being able to pin from stdin would be great
➜ ~ echo QmdfTbBqBPQ7VNxZEYEj14VmRuZBkqFbiwReogJgS1zR1n | ipfs pin add
pinned QmdfTbBqBPQ7VNxZEYEj14VmRuZBkqFbiwReogJgS1zR1n recursively
When the adds slow down that far could you get me some of the performance debugging info from https://github.com/ipfs/go-ipfs/blob/master/docs/debug-guide.md ?
@whyrusleeping sorry, maybe I was being unclear: for 10k+ files the "add" part was completed (i.e. the call to ipfs add returned) but pin did not land.
re: stdin, I meant ipfs add, not ipfs pin add :) Can I just pipe binary data into ipfs add -?
So, you did:
ipfs add -r <dir with 100k files in it>
...
...
...
added <some hash> <that dirs name>
Right?
Does that final hash from that add call show up in the output of ipfs pin ls --type=recursive?
I'm also confused about what is meant by 'adds slow down' if the add call completed by that point. Was this a subsequent call to ipfs add with more files?
re: stdin: you can indeed just pipe binary data to ipfs add :)
➜ ~ echo "hello" | ipfs add
added QmZULkCELmmk5XNfCgTnCyFgAVxBRBXyDHGGMVoLFLiXEN QmZULkCELmmk5XNfCgTnCyFgAVxBRBXyDHGGMVoLFLiXEN
➜ ~ dd if=/dev/urandom bs=4M count=2 | ipfs add
7.75 MB / ? [----------=---------------------------------------------------------] 2+0 records in
2+0 records out
8388608 bytes (8.4 MB, 8.0 MiB) copied, 0.571907 s, 14.7 MB/s
added QmbPmPSxWcQqJGcq7hHJ7CGMxCQ2zxpMpNHLxARePPNv6n QmbPmPSxWcQqJGcq7hHJ7CGMxCQ2zxpMpNHLxARePPNv6n
here's the parsed dump:
@whyrusleeping no, we are calling ipfs add on each individual file because there's no actual tree to add from
so, again:
The slowdown happens about 10k images into the loop
Thanks for the stack dump, that just confirms for me that the problems are caused by the overproviding. The 2000 goroutines stuck in runtime_Semacquire are somewhat odd though; could I get the full stack dump to investigate that one further?
As for the adds, after each ipfs add call completes (or some subset of them complete), are those hashes visible in the output of ipfs pin ls --type=recursive? Every call to ipfs add should put an entry in there before it returns.
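As a quick sanity check after a single add, something like this should work (assuming -q prints only the hashes, with the root hash last):
hash=$(ipfs add -q <file> | tail -n1)
ipfs pin ls --type=recursive | grep "$hash"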
@whyrusleeping I'm trying to repro this behavior, but I am quite certain that we ended up with 10,000+ added but not pinned files in yesterday's run
Full stax: https://ipfs.io/ipfs/QmP344ugRiyZRKA2djLgE371MzW2FhwbghnJyH7hmbmAeQ
@parkan thank you! That stack dump is pretty gnarly, lots of lock contention trying to spew messages out to everyone on the DHT; that's the providers problem alright. For that, as @jbenet says, we can make a separate build (for now, can merge it in as an option later) that only provides the hashes you explicitly tell it to.
As for the pinning issue, that's a very serious issue if what you say is correct. I will investigate and try to reproduce locally as well. cc @Kubuxu for an extra set of eyes
@whyrusleeping I naively tried to do ipfs add --local $file, but this still seems to provide; I am guessing --local really means "no network reads", not "no touching network", right?
@parkan ipfs add from the filesystem? Possible for you to add offline? (i.e. ipfs add without the daemon running, then turn it on when you're done)
Yeah, the --local flag doesn't work for all commands unfortunately. It's something we need to unify. It mostly means "don't read data from the network for this call".
@jbenet would love to have a chat about working around the provider system ASAP. This dataset is <0.01% of our target scale, so it doesn't sound like we can work with it.
I will try to repro the pinning failure case -- fairly confident it happened as I suspect, because I don't see any other way for all those objects to have ended up in the store but not pinned.
Separately, running ipfs add ... with no daemon running seems to complete in reasonable and non-increasing time; is the data just copied into the store without an announce in this mode?
Yes, it will be re-announced periodically.
You can also run the daemon in offline mode: ipfs daemon --offline.
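A minimal offline-add workflow, sketched with placeholders, would be:
# with the daemon stopped (or started with ipfs daemon --offline)
ipfs add <file-or-dir>
# later, start the daemon normally; the content is re-announced periodically
ipfs daemon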
Ok, I can confirm that the number of pinned files has fallen to 1 despite having 299392 objects in the repo. Should I open a new issue for this?
@parkan yes please, and if you could provide repro steps
@parkan could you make a git repo that reproduces this exact problem? would love to have it as a test case.
@jbenet it's a bit hard to repro but I'll do my best -- seems like a serious issue
@parkan does the daemon output any errors when this happens?
Alright, I made a few PRs to help out here with the providers problem: #3101, #3102, #3103, #3105, and #3106
A bit of an update here, I've gotten a few of these PRs merged, only one remaining. I ran some tests to check bandwidth usage here, good results so far: https://github.com/ipfs/go-ipfs/pull/3105#issuecomment-242183017
@whyrusleeping sorry missed your earlier question, no relevant errors that I could see. Improvements looking fantastic, though!
@parkan thanks! I've got a few more things up my sleeve that should help too, but they're a bit farther out. In the meantime, you should now be able to do all your adds with the --local flag, and then manually do ipfs dht provide on the keys you want advertised to the DHT. The provide command has a -r flag to recursively provide all child blocks too, but run without it, it will just provide the single given block.
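A rough sketch of that workflow (the hash placeholder is whatever the add prints for the root):
ipfs add --local <file>          # add without announcing anything to the DHT
ipfs dht provide -r <root-hash>  # later, advertise the root block and all of its children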
Please let me know if anything weird comes up or you have any questions.
@whyrusleeping awesome! Great turnaround on this. Should we build from master to have these changes reflected?
@parkan yeah, you will have to build from master. I can also provide 'development' builds on the dists page if there is an interest in that
@whyrusleeping having unstable/dev/nightly binaries around would definitely be appreciated if it's not too much effort
@parkan sure thing! i'll push out some builds after #3105 merges, and start the discussion about automating some nightly releases
@whyrusleeping getting a nightly/CI build baked would be 👌
Yeah, it would be great to have channels in distributions. Filed it over at https://github.com/ipfs/distributions/issues/77 and https://github.com/ipfs/distributions/issues/78
@whyrusleeping I'd also really appreciate a development build, especially because of this issue. :)
After adding 200MB of the Wikidata dataset with the current master version I have these stats:
Bandwidth
TotalIn: 11 GB
TotalOut: 2.0 GB
Compared to the previous report in https://github.com/ipfs/apps/issues/27#issuecomment-242562201, that's an improvement of ~2.65x for inbound and ~3.2x for outbound traffic when using the current master compared to 0.4.2.
The dataset consists of 24 million JSON files which are mostly 1-20KB in size. The CLI estimated about ~12 days until completion, which doesn't sound too great, but if the time to publish the weekly changes of the dataset after that ends up being a few hours, it might be workable for that project.
Has the state of this issue changed since the last update?
I'm evaluating IPFS for a project, and if a 200MB dataset generates 13GB of traffic to add the data to IPFS, it will rule out IPFS as an option.
We would really like to be interoperable with IPFS and the broader IPFS ecosystem, but we'll have to use BitTorrent instead.
@Magik6k was trying to add cdnjs (something I did a few months ago); he hit https://github.com/ipfs/go-ipfs/issues/2823 but also reported that adding the dataset, which was about 21GB, created about 3TB of traffic.
@Magik6k could you give more precise data on this?