speedb-io / speedb

A RocksDB compliant high performance scalable embedded key-value store
https://www.speedb.io/
Apache License 2.0
921 stars 72 forks source link

Proactive Flushes: use env max flushes as flush limit #360

Open Yuval-Ariel opened 1 year ago

Yuval-Ariel commented 1 year ago

i dont see a reason why a user should set max simultaneous flushes for the proactive WBM while theres already a setting for the max bg flushes a group of dbs can use.

seems logical to me that this configuration can be spared from the user and just take the number of max parallel flushes from the env (how many high priority threads there are).

Yuval-Ariel commented 1 year ago

what do you think @udi-speedb ?

udi-speedb commented 1 year ago

@Yuval-Ariel:

  1. Could you please point to the exact function you suggest to use?
  2. Is this a single setting for all the DB-s or is it a per DB setting / per Env (I am not familiar with it)?
  3. Is it possible to change this setting? If the answer is yes, we should also support this dynamic change in the WriteBufferManager.
  4. What if not all the DB-s / Env-s have the same setting (take the minimum of all values?)?
  5. The WriteBufferManager is currently generic. It has minimal dependencies. It is not aware of the environment. So, I am not sure it would be a good idea to add this dependency.
  6. Automating and simplifying the configuration on the part of the user is definitely desirable. We have just added a new configuration that the user is expected to set correctly, so, it definitely makes sense to save the user the trouble. Having said that, and considering the implications of the previous points, I am not sure it's worth doing that before we actually see users using this feature and providing some feedback.
Yuval-Ariel commented 1 year ago
  1. options->env->GetBackgroundThreads(Priority pri) (e.g in env/env_posix.cc)
  2. its per Env and the env applies to all dbs which have been given that env. so all the dbs using the same env will have a total limit of High priority threads.
  3. yes its possible through SetBackgroundThreads
  4. im not sure it makes sense not to use the same env for all dbs using the WBM. need to research. @hilikspdb ?
  5. the number of concurrent threads that should be run on all dbs is bound by the thread pool in any case so i dont see any harm in this.
  6. im not suggesting we should start working on it now. just opened it to discuss
mrambacher commented 1 year ago

Many of those options are part of the DBOptions (and not properties of the Env). This means that the "last DBOptions" in would win on the settings, if there are multiple DBs.

hilikspdb commented 1 year ago

At least in the case of "background flush" the db_option would only set a higher value. Regardless, we need a multiple database configuration file that will collect all the objects that may be common to multiple databases (cache , wbm, background jobs etc) and add to them (for example we need a log file or else statistics of usage etc) ...

On Sat, Jan 14, 2023 at 12:14 AM mrambacher @.***> wrote:

Many of those options are part of the DBOptions (and not properties of the Env). This means that the "last DBOptions" in would win on the settings, if there are multiple DBs.

— Reply to this email directly, view it on GitHub https://github.com/speedb-io/speedb/issues/360#issuecomment-1382453705, or unsubscribe https://github.com/notifications/unsubscribe-auth/AZKTXGFFJ4KJPRNYPX4XZQLWSHHUPANCNFSM6AAAAAATZR6VOM . You are receiving this because you were mentioned.Message ID: @.***>

udi-speedb commented 1 year ago

@Yuval-Ariel - I'll just refer to #5 and creating new dependencies. In general, an infrastructure entity such as the WBM, which is currently independent of most other entities, is a very good thing and should be kept as such. It may be possible to support your proposal in other means, without creating dependencies, however, I will leave that to when we actually decide to do anything about it.

Yuval-Ariel commented 1 year ago

what happens when a user sets WBM.max_flushes = 4, but the env thread pool for these dbs only supports max 3 flush threads?

udi-speedb commented 1 year ago

The WBM will try to initiate 4. As for what happens afterwards, I can't say for sure. I can only guestimate: I assume that a flush job would be created and wait its turn to run. @hilikspdb - Could you please answer?

hilikspdb commented 1 year ago

Additional flushes will wait in the flush queue and be executed once a flush will end

On Sun, 15 Jan 2023, 16:44 udi-speedb, @.***> wrote:

The WBM will try to initiate 4. As for what happens afterwards, I can't say for sure. I can only guestimate: I assume that a flush job would be created and wait is turn to run. @hilikspdb https://github.com/hilikspdb - Could you please answer?

— Reply to this email directly, view it on GitHub https://github.com/speedb-io/speedb/issues/360#issuecomment-1383168622, or unsubscribe https://github.com/notifications/unsubscribe-auth/AZKTXGEYBTDWH4FAB3P4UDLWSQEMPANCNFSM6AAAAAATZR6VOM . You are receiving this because you were mentioned.Message ID: @.***>

Guyme commented 1 year ago

@hilikspdb @udi-speedb @Yuval-Ariel @mrambacher - Final thoughts ?

hilikspdb commented 1 year ago

The relationships between database options and multi-database options are too complex and need work to simplify them. This should not impact the current project

On Wed, Jan 18, 2023 at 12:25 PM Guyme @.***> wrote:

@hilikspdb https://github.com/hilikspdb @udi-speedb https://github.com/udi-speedb @Yuval-Ariel https://github.com/Yuval-Ariel @mrambacher https://github.com/mrambacher - Final thoughts ?

— Reply to this email directly, view it on GitHub https://github.com/speedb-io/speedb/issues/360#issuecomment-1386821953, or unsubscribe https://github.com/notifications/unsubscribe-auth/AZKTXGEQQRAHS376WC3UQDTWS7AK5ANCNFSM6AAAAAATZR6VOM . You are receiving this because you were mentioned.Message ID: @.***>

Guyme commented 1 year ago

@Yuval-Ariel - Are you satisfied with this answer or you still think we can use the same paramater ?

Yuval-Ariel commented 1 year ago

i agree this should be done as part of a bigger project of correctly setting and sanitizing options for multi-db setup

Yuval-Ariel commented 1 year ago

and perhaps a note in the documentation of this feature explaining to the user how to set the WBM.max_flushes and its connection to the env's max flush threads.