Open andrii0lomakin opened 9 years ago
Nice! Is this incremental backup feature going to be implemented in the automatic backup plugin, too?
@finid I am responsible for storage engine but I bet that if you will create issue it will be implemented :-).
Cool, I will, then!
I confirm it will be supported by automatic backup component. To track it I've created this issue: https://github.com/orientechnologies/orientdb/issues/5155
This means at every update/delete/insert we should update and save the bloom filter somewhere. What's this cost? Also, this could be seen as an improvement. I think we could stay with current solution, otherwise 2.2 will never reach the RC!
This means at every update/delete/insert we should update and save the bloom filter somewhere. >What's this cost?
@lvca That is not true we do not need to save bloom filter anywhere.
Also cost of update of bloom filter is no more than addition time in hashmap . So I do not see any hidden costs which I did not describe.
So do want to keep the BF in RAM only?
Sure we do not need them for long term.
removed from backlog in favor of https://github.com/orientechnologies/orientdb/issues/7686
In current implementation of incremental backup we iterate over pages to find which of them were changed, but for huge databases such process may take several hours. It is not really a problem for many cases because we do not block database on writes but sometimes backup should be done in very stiff terms. To avoid given problem we are going to introduce counting bloom filter. Just to clarify. By counting bloom filter I do not mean classic implementation of given data structure which can not be expanded in case of it reaches it's limit of capacity of registered items, but I mean family of modern algorithms with similar behaviour which allows to dynamically expand capacity of "bloom filter".
In nutshell algorithm is following:
Actually that is almost speed limit for incremental backup because in 95% of cases we will load pages which are changed and that is action which we have to do any way.
Also as drawback part of database journal which have to be "cut" and backed up will be really small because of small period of time is needed to perform backup.