Open max-privatevoid opened 2 years ago
Hi,
Your general assessment is correct: checking the pinned status is quite an expensive operation, since we talk to IPFS and it then has to check all chunks. Consequently, `ls` will not come back instantly, because it waits for each file's pinned status. I think we set up a cache for this status, so a second call to `ls` should be fast within some time interval.
As for hammering the disk every 15 minutes, it might be triggered by IPFS itself. I am not sure, but I recall that the IPFS backend does garbage collection every 15 minutes or so, and with many large files it usually takes a while. See the IPFS configuration for the responsible option (likely `Datastore.GCPeriod`; automatic GC only runs when the daemon is started with `--enable-gc`).
Also try to set `brig cfg set repo.autogc.enabled false`; it should help too.
Try to use the development version of `brig`; we made several speed improvements.
But both IPFS and brig do not scale too well with repos containing many files, mostly because of IPFS. One can disable garbage collection, which helps, but then you may have a lot of stale chunks in the backend.
IPFS also has a database-backed datastore instead of `flatfs.datastore`; it might work better, but I did not play with it in practice.
Thanks for your reply.
I am using the latest development version already, as well as badgerds as my datastore for the IPFS node.
I've worked around the performance issue caused by the `IsCached` lookup by simply ripping it out of the code paths for `brig ls` and `brig info`, making those commands near-instant even at the root of the repo. As for the repin issue, I've relaxed the constraints on locking the fs mutex during the repin loop a bit. I don't know whether this is still safe, but it does allow operations such as `brig ls` to run while a repin is running.
Performance patches in my config repo
These patches combined with some config tweaks seem to have made the whole thing decently workable.
My current patches are probably far too hacky for upstreaming, but if I find better ways around these issues you'll see PRs from me.
I would agree, it is too hacky to just disable the check. This way you cannot be sure about the state of the local repo. You might unpin something which has no replica anywhere else.
As I said, only the first check for the pin status is slow. You can increase the cache retention time, and subsequent `ls` calls will be fast. See
https://github.com/sahib/brig/blob/6b7eccf8fcbd907fc759f8ca8aa814df8499e2ed/backend/httpipfs/shell.go#L104
The rest is IPFS doing its own things.
An ideal solution would be an asynchronous pin status check.
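To sketch what I mean (illustrative only, none of this is brig's actual code; the type names, constructor, and intervals are invented):

```go
// Sketch of a cached, asynchronously refreshed pin-status lookup.
package pincache

import (
	"sync"
	"time"
)

type entry struct {
	pinned    bool
	fetchedAt time.Time
}

type Cache struct {
	mu        sync.Mutex
	retention time.Duration                  // how long a cached answer is trusted
	entries   map[string]entry               // CID -> last known status
	lookup    func(cid string) (bool, error) // the slow, per-file call to IPFS
}

func New(retention time.Duration, lookup func(string) (bool, error)) *Cache {
	return &Cache{retention: retention, entries: make(map[string]entry), lookup: lookup}
}

// IsPinned returns the cached status if it is fresh enough and only
// falls back to the slow IPFS lookup on a miss or a stale entry.
func (c *Cache) IsPinned(cid string) (bool, error) {
	c.mu.Lock()
	e, ok := c.entries[cid]
	c.mu.Unlock()
	if ok && time.Since(e.fetchedAt) < c.retention {
		return e.pinned, nil
	}
	pinned, err := c.lookup(cid)
	if err != nil {
		return false, err
	}
	c.mu.Lock()
	c.entries[cid] = entry{pinned: pinned, fetchedAt: time.Now()}
	c.mu.Unlock()
	return pinned, nil
}

// RefreshLoop is the "asynchronous check": it re-queries known CIDs in
// the background so that foreground ls calls mostly hit warm entries.
func (c *Cache) RefreshLoop(interval time.Duration, stop <-chan struct{}) {
	tick := time.NewTicker(interval)
	defer tick.Stop()
	for {
		select {
		case <-stop:
			return
		case <-tick.C:
			c.mu.Lock()
			cids := make([]string, 0, len(c.entries))
			for cid := range c.entries {
				cids = append(cids, cid)
			}
			c.mu.Unlock()
			for _, cid := range cids {
				if pinned, err := c.lookup(cid); err == nil {
					c.mu.Lock()
					c.entries[cid] = entry{pinned: pinned, fetchedAt: time.Now()}
					c.mu.Unlock()
				}
			}
		}
	}
}
```

Increasing the retention in shell.go (linked above) is the short-term knob; the background refresh is roughly what an asynchronous check could eventually look like.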
By the way, what is your use case?
My use case is Syncthing-like (or Nextcloud-like, since that's the actual thing I seek to replace) personal file synchronization. I'm hoping to use brig for this because I already have a decent IPFS setup on my machines and I like the "lazy synchronization" aspect where files are only fetched when accessed.
Currently we ask IPFS about every file's pin status separately. I have an idea how to speed up the pin check: ask IPFS for all available pins in one go and then work with a cached IPFS pin status.
Could you check how long the following take on your large data collection:
`ipfs pin ls > /dev/null`
and
`ipfs refs local > /dev/null`
If these finish fast enough, then we have hope.
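Roughly, the "all pins in one go" idea could look like this against the IPFS HTTP API (a sketch only; it assumes the default API address and does no real error handling — brig would of course go through its existing httpipfs shell):

```go
package pinset

import (
	"encoding/json"
	"net/http"
)

// pinLsResponse mirrors the JSON shape returned by /api/v0/pin/ls:
// {"Keys": {"<cid>": {"Type": "recursive"}, ...}}
type pinLsResponse struct {
	Keys map[string]struct {
		Type string `json:"Type"`
	} `json:"Keys"`
}

// FetchAllPins asks the IPFS daemon for every recursive pin in a single
// call and returns the result as a set, so per-file checks become map
// lookups instead of one round trip per file.
func FetchAllPins(apiAddr string) (map[string]bool, error) {
	resp, err := http.Post(apiAddr+"/api/v0/pin/ls?type=recursive", "", nil)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()

	var parsed pinLsResponse
	if err := json.NewDecoder(resp.Body).Decode(&parsed); err != nil {
		return nil, err
	}

	pins := make(map[string]bool, len(parsed.Keys))
	for cid := range parsed.Keys {
		pins[cid] = true
	}
	return pins, nil
}
```

On a repo where `ipfs pin ls --type=recursive` comes back in well under a second, one such call per `ls` would be far cheaper than a request per file.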
On a separate note, if your file collection is static, brig will work. But if you are thinking about using it as a work tree with a lot of changes, then merging might give you trouble. @sahib and I worked on it a while ago, but did not make it fully bulletproof.
I cancelled `ipfs pin ls` after a couple of minutes since that's probably too long already. `ipfs pin ls --type=recursive`, on the other hand, returns very quickly. I believe everything brig pins is type=recursive, so that could work.
```
Benchmark 1: ipfs pin ls --type=recursive > /dev/null
  Time (mean ± σ):     91.9 ms ±   5.2 ms    [User: 130.1 ms, System: 25.4 ms]
  Range (min … max):   85.6 ms … 103.6 ms    33 runs

Benchmark 2: ipfs refs local > /dev/null
  Time (mean ± σ):     2.010 s ±  0.153 s    [User: 2.728 s, System: 0.216 s]
  Range (min … max):   1.906 s …  2.438 s    10 runs
```
Regarding work tree operations, I do intend to use brig like that and I believe I've already run into a problem. I've been debugging it with the following repro steps:
```
ali touch test/x.txt
bob sync ali
# bob now has the file
bob info test/x.txt
# bob removes the file
bob rm test/x.txt
# sync the removal
ali sync bob
ali ls -R
# ali should not have test/x.txt anymore
```
I can open a separate issue for that if you think it's worth looking at.
Thanks for the benchmark. I will double-check the isPinned code at https://github.com/sahib/brig/blob/6b7eccf8fcbd907fc759f8ca8aa814df8499e2ed/backend/httpipfs/pin.go#L76
The situation is a bit more complex: the pin status alone is not enough. A pin is equivalent to a "download and keep" request; it does not guarantee that child chunks are already locally available. So we also make a `--recursive` check for the children, and this takes a lot of time.
My use case was to ask `brig` to download a folder for offline use, so I needed to be sure that all chunks are locally available.
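As a rough sketch of that check (illustrative only, not the code in pin.go; it assumes the plain HTTP API and the newline-delimited JSON that the refs endpoints return):

```go
package cachedcheck

import (
	"encoding/json"
	"io"
	"net/http"
)

// refEntry is one line of the streaming refs output: {"Ref":"<cid>","Err":""}.
type refEntry struct {
	Ref string `json:"Ref"`
	Err string `json:"Err"`
}

// readRefs collects the CIDs from a streaming refs endpoint
// (e.g. /api/v0/refs/local or /api/v0/refs?arg=<cid>&recursive=true).
func readRefs(url string) (map[string]bool, error) {
	resp, err := http.Post(url, "", nil)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()

	refs := make(map[string]bool)
	dec := json.NewDecoder(resp.Body)
	for {
		var e refEntry
		if err := dec.Decode(&e); err == io.EOF {
			break
		} else if err != nil {
			return nil, err
		}
		refs[e.Ref] = true
	}
	return refs, nil
}

// IsFullyCached reports whether every chunk reachable from cid was in the
// local blockstore snapshot. A pinned file can still fail this check while
// its children are being fetched. Note that refs with recursive=true may
// itself trigger network fetches for missing children, which is part of
// why this check is expensive.
func IsFullyCached(apiAddr, cid string, local map[string]bool) (bool, error) {
	children, err := readRefs(apiAddr + "/api/v0/refs?arg=" + cid + "&recursive=true")
	if err != nil {
		return false, err
	}
	for child := range children {
		if !local[child] {
			return false, nil
		}
	}
	return true, nil
}
```

Here `local` would come from a single `readRefs(apiAddr + "/api/v0/refs/local")` call, which your benchmark above puts at roughly two seconds for the whole repo.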
But pinning is subtle: there is a pin at the `brig` level, which just asks IPFS to pin the file. A file can be pinned but not yet locally available, i.e. not yet cached. I, as a user, want to be aware of both situations, but this is a very time-consuming check. Maybe we should/can add a switch to `ls` to skip the caching status. I unfortunately do not have time to dig into it, but @sahib welcomes patches and used to review them quite quickly.
Please file a separate bug about the tree operations. Only @sahib fully understands the tree storage system. I recall working on a similar issue, but I was not able to solve it. There was no merge conflict resolution mechanism envisioned, so syncing has strange side effects.
Describe the bug
With a large dataset (~70GB, ~4500 files), `brig ls` takes a very long time, as does performing a repin. Worse yet, repinning seems to be an operation that blocks all other file operations, so with the default configuration brig stalls for hours every 15 minutes. I suspect this is due to `IsCached` being called on everything in the repo and therefore performing a `/refs` call on all the CIDs brig knows about.

To Reproduce
`brig ls` or

Please always include the output of the following commands in your report!

`brig bug -s`

go version:
uname -s -v -m: Linux #1-NixOS SMP PREEMPT Wed May 18 08:28:23 UTC 2022 x86_64
IPFS config (only datastore):
brig client version: v.. [build: ] (built from https://github.com/sahib/brig/commit/6b7eccf8fcbd907fc759f8ca8aa814df8499e2ed)
brig server version: v..+
IPFS Version: 0.12.2+
Expected behavior
`brig ls` should return (near) instantly. Repinning should be performed in the background, without blocking other commands and without hammering the disk with read operations.