leil-io / saunafs

SaunaFS is a free-and open source, distributed POSIX file system inspired by Google File System.
https://saunafs.com
GNU General Public License v3.0
62 stars 5 forks source link

feat: Improve big sessions create/open performance #216

Closed dmga44 closed 1 month ago

dmga44 commented 1 month ago

These changes target improving the performance of sessions which have already opened many files. Previous implementation of the openedfiles member was a linked list sorted by inode which caused that searching, inserting, counting how many files are open and most operations were spending time proportional to the already opened files, i.e O(n) if n is the number of opened files.

The solution proposed changes the type of this member to std::set and makes complexity of all operations O(logn).

A very similar approach was applied for afhead and idhash variables in client. Also a benchmark test for this behavior was added.

This change also improves readability of these parts of the code.

uristdwarf commented 1 month ago

I remember doing something similar with https://github.com/leil-io/saunafs/commit/3bbde7864f225656fba5e8cb828a111b5d1d760e but with vector. I think I tried std::set, but either there were implementation issues or performance (sadly I don't remember which). Could you compare the performance of this commit to that commit I posted? std::vector may still be yet faster.

dmga44 commented 1 month ago

I remember doing something similar with 3bbde78 but with vector. I think I tried std::set, but either there were implementation issues or performance (sadly I don't remember which). Could you compare the performance of this commit to that commit I posted? std::vector may still be yet faster.

Mate, I'll comment a few things that I notice on your commit:

uristdwarf commented 1 month ago
  • you are anyway traversing the vector, which means you are doing O(n) operations for those operations. In the Tsukaery cluster we have sessions with 7*10^5 open files. So you are through all of them which is not good. The std::set complexity for insertion, deletion, find, and almost everything is O(logn), which means that for these case the cost of insertions and openings should be improved be a 10^3 factor guaranteed. I'm most definately sure of it, but I'd love to create a benchmark for it

I mean it makes complete sense that set should be faster, but I don't remember what problems I faced with set exactly. I did test it on a smaller amount of open files I believe (20000 files I believe?)

uristdwarf commented 1 month ago

Benchmark is a good idea though :+1:

dmga44 commented 1 month ago

Benchmark is a good idea though 👍

Do you still have the script you used for those tests?

uristdwarf commented 1 month ago

I used this mostly unmodified on a tmpfs

https://github.com/kofemann/mdbench

It tests a lot of things, but when changing to vector it mainly improved open and stat rate

dmga44 commented 1 month ago

@uristdwarf Look, but I don't think that this benchmark script is keeping all those files open until the very end, which is what we are currently worried about. I'll create our own very simple benchmark tool, and let's see.