Closed andrii0lomakin closed 4 years ago
Unfortunately, the memory overhead of page mapping in Java would be overwhelming. Instead, we are going to use the change buffer concept implemented in MySQL.
Closed because all changes will be implemented as part of https://github.com/orientechnologies/orientdb/issues/9029
The page size we currently use is 64 KB. This is not optimal given the ratio between latency and speed on SSDs; the optimal page size right now is about 4 KB. Also, suppose we need to read records from a cluster to handle a graph query initiated by the user, and those records are not cached. Using 64 KB pages introduces a lot of read amplification compared to 4 KB pages. There is a problem, though. Our cache data structures live on the Java heap, and migrating to 4 KB pages would put tremendous overhead on the GC. To overcome this, we can reduce the page size to 16 KB and use LZ4 compression to shrink each page down to about 4 KB. In our tests, a compression ratio of 4 or higher is not unusual for our data.
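To make the arithmetic concrete, here is a minimal sketch of how a 16 KB page could be compressed and rounded up to 4 KB on-disk blocks. It is not OrientDB code: the class and method names are hypothetical, and `java.util.zip.Deflater` stands in for LZ4 only because LZ4 is not part of the JDK.

```java
import java.util.zip.Deflater;

// Hypothetical sketch; names and constants are illustrative assumptions.
final class PageCompressionSketch {
    static final int PAGE_SIZE = 16 * 1024; // proposed in-memory page size
    static final int BLOCK_SIZE = 4 * 1024; // assumed on-disk allocation unit

    // Returns the compressed length of one page in bytes.
    // Deflater is a stand-in for LZ4 here; only the length is needed.
    static int compressedLength(byte[] page) {
        Deflater deflater = new Deflater(Deflater.BEST_SPEED);
        try {
            deflater.setInput(page);
            deflater.finish();
            byte[] scratch = new byte[1024];
            int total = 0;
            while (!deflater.finished()) {
                total += deflater.deflate(scratch); // count bytes, discard output
            }
            return total;
        } finally {
            deflater.end(); // release native resources
        }
    }

    // Number of 4 KB blocks needed to store a page of the given compressed
    // length. Pages that do not shrink are assumed to be stored uncompressed,
    // so the result never exceeds PAGE_SIZE / BLOCK_SIZE.
    static int blocksNeeded(int compressedLength) {
        int blocks = (compressedLength + BLOCK_SIZE - 1) / BLOCK_SIZE;
        return Math.min(blocks, PAGE_SIZE / BLOCK_SIZE);
    }
}
```

With a compression ratio of 4 or higher, `blocksNeeded` returns 1, so the page occupies a single 4 KB block on disk while the cache still tracks four times fewer pages than it would with uncompressed 4 KB pages.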
Because we are going to use page compression, we have to handle pages of variable size. For that, we need to introduce a map between the logical page index and the physical page index, which in a nutshell would be an array list. Such a map is nothing exceptional; Microsoft successfully uses one in its own databases. A page map also gives us the ability to convert random writes into sequential writes and to defragment file segments on the fly. On average, sequential writes on SSDs are about 5 times faster than random writes. So if we split the storage into segments of about 64 MB each, we can reuse any segment that is at least 20% empty. In the worst case, then, the new implementation will consume 20% more space. But taking into account that:
Also, a storage that incorporates GC to deal with invalidated pages and supports pages of variable size allows us to avoid loading pages for writes when they are not present in the cache. Instead, we can log the operations on those pages and then merge the logged operations with the actual pages during the GC phase.
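The logical-to-physical page map and the segment reuse rule described above can be sketched roughly as follows. This is a simplified illustration, not the proposed implementation: the class name, the fixed-size slots (real pages would be variable-sized), and the tiny segment size are all assumptions made to keep the example short.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of an append-only page map with per-segment accounting.
final class PageMapSketch {
    // Tiny segment for demonstration; the proposal talks about ~64 MB segments.
    static final int SLOTS_PER_SEGMENT = 5;

    // Index = logical page, value = physical slot (-1 means not yet written).
    private final List<Long> logicalToPhysical = new ArrayList<>();
    // Number of still-valid slots in each segment.
    private final List<Integer> liveSlots = new ArrayList<>();
    private long nextSlot = 0;

    // Every write appends to the end of the file, so random logical writes
    // become sequential physical writes. The old slot becomes garbage.
    long write(int logicalPage) {
        while (logicalToPhysical.size() <= logicalPage) {
            logicalToPhysical.add(-1L);
        }
        long oldSlot = logicalToPhysical.get(logicalPage);
        if (oldSlot >= 0) {
            int oldSegment = (int) (oldSlot / SLOTS_PER_SEGMENT);
            liveSlots.set(oldSegment, liveSlots.get(oldSegment) - 1);
        }
        long newSlot = nextSlot++;
        int newSegment = (int) (newSlot / SLOTS_PER_SEGMENT);
        while (liveSlots.size() <= newSegment) {
            liveSlots.add(0);
        }
        liveSlots.set(newSegment, liveSlots.get(newSegment) + 1);
        logicalToPhysical.set(logicalPage, newSlot);
        return newSlot;
    }

    // Returns the physical slot of a logical page, or -1 if it was never written.
    long physicalSlot(int logicalPage) {
        return logicalPage < logicalToPhysical.size() ? logicalToPhysical.get(logicalPage) : -1L;
    }

    // A fully written segment is a candidate for reuse once at least 20% of
    // its slots are garbage. Integer arithmetic avoids rounding surprises.
    boolean reusable(int segment) {
        boolean fullyWritten = (long) (segment + 1) * SLOTS_PER_SEGMENT <= nextSlot;
        if (!fullyWritten) {
            return false;
        }
        int dead = SLOTS_PER_SEGMENT - liveSlots.get(segment);
        return dead * 100 >= SLOTS_PER_SEGMENT * 20;
    }
}
```

For example, after writing logical pages 0 through 4 and then rewriting page 2, the first segment holds one dead slot out of five, so `reusable(0)` becomes true and a GC pass could copy its four live pages forward and recycle the segment.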
As an additional improvement to speed up graph queries, we are going to issue batches of asynchronous I/O requests to load pages that are not present in the cache in parallel.
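One possible shape for such batched loading, using the JDK's `AsynchronousFileChannel`, is sketched below. The class name, the fixed 4 KB page size, and the plain `Map` used as a cache are assumptions for illustration only; the actual cache and file layout would differ.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.AsynchronousFileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;

// Hypothetical sketch: issue all reads for missing pages first, then wait.
final class BatchPageLoaderSketch {
    static final int PAGE_SIZE = 4 * 1024; // assumed fixed page size

    static void loadMissing(Path file, List<Integer> pages, Map<Integer, ByteBuffer> cache)
            throws IOException, InterruptedException, ExecutionException {
        try (AsynchronousFileChannel channel =
                 AsynchronousFileChannel.open(file, StandardOpenOption.READ)) {
            Map<Integer, Future<Integer>> pending = new HashMap<>();
            Map<Integer, ByteBuffer> buffers = new HashMap<>();

            // Phase 1: submit one read per page that is not cached yet.
            for (int page : pages) {
                if (cache.containsKey(page) || pending.containsKey(page)) {
                    continue; // already cached or already requested in this batch
                }
                ByteBuffer buffer = ByteBuffer.allocate(PAGE_SIZE);
                buffers.put(page, buffer);
                pending.put(page, channel.read(buffer, (long) page * PAGE_SIZE));
            }

            // Phase 2: wait for completion and publish the pages to the cache.
            // A production version would also loop on short reads.
            for (Map.Entry<Integer, Future<Integer>> entry : pending.entrySet()) {
                entry.getValue().get();
                ByteBuffer buffer = buffers.get(entry.getKey());
                buffer.flip();
                cache.put(entry.getKey(), buffer);
            }
        }
    }
}
```

Because all reads are submitted before any of them is awaited, the SSD can serve them concurrently instead of one after another, which is where the gain for graph traversals over uncached records would come from.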