sahib / brig

File synchronization on top of ipfs with git like interface & web based UI
https://brig.readthedocs.io
GNU Affero General Public License v3.0

Enhancement/benchmarks fuse #82

Closed evgmik closed 3 years ago

evgmik commented 3 years ago

Fuse read and write speed benchmarks. Works towards #68
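As a rough illustration of what a FUSE write benchmark measures, here is a minimal Go sketch. It times writing a fixed number of bytes in chunks to a fresh file under a directory and reports MB/s (10^6 bytes per second), similar to what `b.SetBytes` makes `go test -bench` print. In a real run the directory would be the brig FUSE mount point. The function `measureWriteThroughput` and its parameters are hypothetical and not part of brig.

```go
package main

import (
	"errors"
	"os"
	"time"
)

// measureWriteThroughput writes total bytes in chunk-sized pieces to a new
// file inside dir and returns the observed throughput in MB/s.
// dir is assumed to be a FUSE mount point in a real benchmark; an empty dir
// means the default temp directory (os.CreateTemp semantics).
func measureWriteThroughput(dir string, total, chunk int) (float64, error) {
	if chunk <= 0 {
		return 0, errors.New("chunk must be positive")
	}
	f, err := os.CreateTemp(dir, "brig-bench-*")
	if err != nil {
		return 0, err
	}
	// Deferred calls run in reverse order: close first, then remove the file.
	defer os.Remove(f.Name())
	defer f.Close()

	buf := make([]byte, chunk)
	start := time.Now()
	for written := 0; written < total; {
		n := chunk
		if total-written < n {
			n = total - written // last, possibly shorter, chunk
		}
		if _, err := f.Write(buf[:n]); err != nil {
			return 0, err
		}
		written += n
	}
	// Include flushing to the backing store in the measured time.
	if err := f.Sync(); err != nil {
		return 0, err
	}
	elapsed := time.Since(start).Seconds()
	if elapsed <= 0 {
		elapsed = 1e-9 // guard against a zero reading on coarse clocks
	}
	return float64(total) / 1e6 / elapsed, nil
}
```

Calling it with several chunk sizes (e.g. 4 KiB vs. 1 MiB) against the same mount shows how much per-call FUSE overhead dominates small writes.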

evgmik commented 3 years ago

Good first step.

I would call it half a step, since it is tested with a "fake" catfs that does not reflect writes to ipfs. But I am about to start working on a replacement fuse layer that holds files in memory during modification, so I needed at least some benchmarks.

Maybe this could also be done outside the fuse package, since it will later also include benchmarks for things like brig cat and brig stage, or more isolated benchmarks like encryption or compression?

Also, maybe this could instead be implemented as a brig command? Something like brig debug iobench, which users could easily run as part of bug reports. Just leaving that here as an idea to discuss.

Cleaning up after such a test might be tricky, but I like the idea.

sahib commented 3 years ago

I would call it half a step, since it is tested with a "fake" catfs that does not reflect writes to ipfs. But I am about to start working on a replacement fuse layer that holds files in memory during modification, so I needed at least some benchmarks.

Well, that's not so bad. catfs is just in memory and will probably produce more stable numbers than ipfs with its many I/O calls. Sure, those are not the numbers one will actually get when using the fuse layer with ipfs, but they should be easier to compare.

Cleaning up after such a test might be tricky, but I like the idea.

Why would it be tricky?


If we go for brig debug iobench, we could also output various system information that influences the benchmark results (CPU model, IPFS version etc.). All throughput numbers could additionally be expressed relative to a baseline (e.g. just copying bytes from ten-source to ten-sink).
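A minimal Go sketch of the baseline idea, assuming the baseline is a plain in-memory copy from an endless zero source to `io.Discard`. The helpers `baselineThroughput`, `relative` and `systemInfo` are hypothetical. Only details exposed by the Go `runtime` package are collected; the CPU model and the IPFS version are not available there and would have to come from other sources.

```go
package main

import (
	"io"
	"runtime"
	"strconv"
	"time"
)

// zeroReader is an endless source of zero bytes, similar to /dev/zero.
type zeroReader struct{}

func (zeroReader) Read(p []byte) (int, error) {
	for i := range p {
		p[i] = 0
	}
	return len(p), nil
}

// baselineThroughput copies size bytes from memory to io.Discard and
// returns MB/s: the "just copying bytes" reference point.
func baselineThroughput(size int64) float64 {
	start := time.Now()
	// zeroReader never fails and io.Discard never fails, so the error is ignored.
	_, _ = io.CopyN(io.Discard, zeroReader{}, size)
	elapsed := time.Since(start).Seconds()
	if elapsed <= 0 {
		elapsed = 1e-9 // guard against a zero reading on coarse clocks
	}
	return float64(size) / 1e6 / elapsed
}

// relative expresses a measured throughput as a fraction of the baseline.
func relative(measured, baseline float64) float64 {
	if baseline == 0 {
		return 0
	}
	return measured / baseline
}

// systemInfo collects the environment details the Go runtime exposes.
func systemInfo() map[string]string {
	return map[string]string{
		"os":   runtime.GOOS,
		"arch": runtime.GOARCH,
		"go":   runtime.Version(),
		"cpus": strconv.Itoa(runtime.NumCPU()),
	}
}
```

Reporting `relative(fuseMBps, baselineThroughput(n))` next to the absolute numbers makes results from different machines easier to compare in bug reports.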

evgmik commented 3 years ago

Well, that's not so bad. catfs is just in memory and will probably produce more stable numbers than ipfs with its many I/O calls. Sure, those are not the numbers one will actually get when using the fuse layer with ipfs, but they should be easier to compare.

There is a part of the "in memory" catfs which sits on the hard drive. I am guessing it is badger or something. When we mount catfs we provide a path for it, and the size of its contents grows very quickly with every write. Makes me wonder if it is truly "in memory".

Cleaning up after such a test might be tricky, but I like the idea.

Why would it be tricky?

If we test on a live brig instance, we would have to write quite a lot to ipfs and to catfs. To clean up, we have to track what was additionally written and remove those hashes from ipfs (with the potential for overlap with real content and removal of something important). catfs also has to be reverted to its pre-benchmark state. We can do that via commits, but it would contaminate the history with "benchmark" commits.

Another issue: what if a user modifies something during the benchmark? How would we separate that?
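One way to reduce the risk of removing real content is to snapshot the set of known hashes before the run and only delete hashes that first appeared during it. The sketch below shows just that set difference; `hashesToRemove` is hypothetical and does not call ipfs. As the previous comment points out, it still cannot tell benchmark data apart from identical content a user added while the benchmark was running.

```go
package main

import "sort"

// hashesToRemove returns the content hashes written during a benchmark that
// did not exist before it started. Hashes present beforehand are kept, so
// content shared with real files is never removed. Content a user added
// concurrently with the same hash would still be removed.
func hashesToRemove(before map[string]bool, written []string) []string {
	seen := make(map[string]bool)
	var out []string
	for _, h := range written {
		if before[h] || seen[h] {
			continue // pre-existing or already collected
		}
		seen[h] = true
		out = append(out, h)
	}
	sort.Strings(out) // deterministic order for logging and tests
	return out
}
```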

If we go for brig debug iobench, we could also output various system information that influences the benchmark results (CPU model, IPFS version etc.). All throughput numbers could additionally be expressed relative to a baseline (e.g. just copying bytes from ten-source to ten-sink).

I am setting it as a v0.7 target.

evgmik commented 3 years ago

There is a part of the "in memory" catfs which sits on the hard drive. I am guessing it is badger or something. When we mount catfs we provide a path for it, and the size of its contents grows very quickly with every write. Makes me wonder if it is truly "in memory".

Hmm, interesting. catfs.MemFsBackend is actually just memory; you can check the implementation in catfs/backend.go. The metadata database, however, is indeed a badger database. That might indicate that many writes fill it up quickly. Can you check what those temporary files are?

NewFileSystem creates a badger DB with db.NewBadgerDatabase(dbPath) even if catfs is in memory. Many (thousands of) consecutive writes, which benchmarks like to do, grow this DB to quite a large size. This is why I had to unmount/remount catfs in the write benchmarks.
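To quantify that growth, a benchmark could record the on-disk size of the directory passed as dbPath before and after each round. The helper below is a hypothetical sketch using only the standard library; it sums the sizes of regular files and does not follow symlinks.

```go
package main

import (
	"io/fs"
	"path/filepath"
)

// dirSize returns the total size in bytes of all regular files below root,
// e.g. the badger directory given to NewFileSystem.
func dirSize(root string) (int64, error) {
	var total int64
	err := filepath.WalkDir(root, func(path string, d fs.DirEntry, err error) error {
		if err != nil {
			return err // unreadable entry or missing root
		}
		if d.Type().IsRegular() {
			info, err := d.Info()
			if err != nil {
				return err // file may have vanished, e.g. during compaction
			}
			total += info.Size()
		}
		return nil
	})
	return total, err
}
```

Logging `dirSize(dbPath)` after every N writes would show whether the growth is steady per write or comes in jumps, which helps decide how often a remount is needed.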

evgmik commented 3 years ago

Note to myself, I need to write fuse Seek implementation.

What do you mean?

It is me being silly. I forgot that read/write set the position themselves, without a prior call to Seek.
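This works because FUSE read and write requests each carry their own byte offset, much like pread/pwrite, so a handler never needs to track a cursor. For the in-memory replacement mentioned earlier in this thread, a toy buffer that follows the standard io.ReaderAt / io.WriterAt contracts captures the idea. `memFile` is hypothetical and not brig's implementation.

```go
package main

import (
	"errors"
	"io"
)

// memFile is a toy in-memory file. Every call carries its own offset,
// so no Seek or shared cursor is needed.
type memFile struct {
	data []byte
}

// WriteAt writes p at offset off, growing (and zero-filling) the buffer
// if the write extends past the current end.
func (f *memFile) WriteAt(p []byte, off int64) (int, error) {
	if off < 0 {
		return 0, errors.New("memFile: negative offset")
	}
	end := off + int64(len(p))
	if end > int64(len(f.data)) {
		grown := make([]byte, end)
		copy(grown, f.data)
		f.data = grown
	}
	copy(f.data[off:end], p)
	return len(p), nil
}

// ReadAt reads up to len(p) bytes starting at off. As io.ReaderAt requires,
// it returns io.EOF when fewer than len(p) bytes are available.
func (f *memFile) ReadAt(p []byte, off int64) (int, error) {
	if off < 0 {
		return 0, errors.New("memFile: negative offset")
	}
	if off >= int64(len(f.data)) {
		return 0, io.EOF
	}
	n := copy(p, f.data[off:])
	if n < len(p) {
		return n, io.EOF
	}
	return n, nil
}

// Compile-time checks that memFile satisfies the standard interfaces.
var (
	_ io.ReaderAt = (*memFile)(nil)
	_ io.WriterAt = (*memFile)(nil)
)
```

Writing "world" at offset 6 and then "hello " at offset 0 yields "hello world" when read back from offset 0: out-of-order writes land correctly without any seeking.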