pmwkaa / sophia

Modern transactional key-value/row storage library.
http://sophia.systems
Other
1.86k stars 154 forks source link

API question: cursors and databases #154

Open os12 opened 6 years ago

os12 commented 6 years ago

Hi Dmitry, I've started stated testing Sophia and hit an issue: a cursor-based walk returns keys across databases. Assume two databases, "db1" and "db2". One of them is opened and keys are fetched using a cursor:

    void *env = sp_env();

    sp_setstring(env, "sophia.path", ".store", 0);
    sp_setstring(env, "db", db_name.c_str(), 0);
    if (sp_open(env) == -1)
        return FromStringError(LastSophiaError(env));

    void *db = sp_getobject(env, ("db." + db_name).c_str());
    if (db == nullptr)
        return FromStringError(LastSophiaError(env));

    void *cursor = sp_cursor(env);
    void *doc = sp_document(db);
    sp_setstring(doc, "order", ">=", 0);

    while ((doc = sp_get(cursor, doc))) {
        int size;
        auto key_ptr = static_cast<const char *>(sp_getstring(doc, "key", &size));
        ...
        ...
    }

Am I missing something here? Do cursors go across databases? If so, how do I provide an additional filter to limit iteration to a single DB?

Basically, I can achieve the behavior I want by splitting the databases at a higher level this way:

    sp_setstring(env, "sophia.path", (".store/" + db_name).c_str(), 0);
    sp_setstring(env, "db", "mydb", 0); // hard-coded name

Could you clarify the design behind the aforementioned "path "and "db" settings please?

Thanks a lot! Oleg.

pmwkaa commented 6 years ago

Hi, not quite sure what the issue in here. There is no way records can overlap this way, only if you use different db pointers during sp_document() creation. Are you sure it sets correct name for sp_getobject()?

sophia.path specifies main directory, where all databases will be stored by name. For example, db.mydb will be placed in sophia.path/mydb

os12 commented 6 years ago

Right, both the cursor-based "read" path and the transaction-based "write" path start with the same code (the first fragment). I do see per-database subdirectories created inside .store as well as log (which seems to contain the last few mutations).

I know you have tests that cover multi-db mutations. Yet, do you have a multi-db cursor test?

pmwkaa commented 6 years ago

Could you please specify your work case you are trying to achieve? One cursor is designed to iterate only one database at a time. If you need to make a join between databases, you need to open a second cursor, and so on.

os12 commented 6 years ago

Could you please specify your work case you are trying to achieve? One cursor is designed to iterate only one database at a time

Yes, that's what I expected. Yet the cursor returns keys from an adjacent DB, which is very surprising.

os12 commented 6 years ago

Please see #156 for the repro - a cursor-based walk in DB2 should find no keys (yet it fetches everything from the DB1).

pmwkaa commented 6 years ago

Now i understand what you meant. This is known issue, which might be quite confusing. Sophia does not have persistent catalog of created databases, they must be predefined each time before sp_open(). This is done to support easier Sophia integration with DBMS's which have its own database catalog.

Open procedure will find and recover every specified database directory by its name. After that it will reply write-ahead log records. To identify database in write-ahead log db.name.id is used: http://sophia.systems/v2.2/conf/db.html. If database is not specified it will be set in sequential order of database definition.

os12 commented 6 years ago

Umm... I don't follow. Is there a way for me do deal with this issue?

If this does not work, I will have to create distinct directories to isolate the DBs... Are there downsides?

pmwkaa commented 6 years ago

Sophia has a single write-ahead log which is shared between databases to efficient support cross-database transactions. To match a database in the log db.name.id is used.

sp_open() will fail if it will not find a defined database during log reply. In other words, scheme must be exactly the same every time you open environment. Basically, if you define mydb0 and mydb1, next time you need to define them again in the same order (or set db.name.id to distinct them).

Keep in mind, that open operation is very time consuming. Keep a database open, if you need good timing.

os12 commented 6 years ago

OK, I've extended the code to set db.db1.id bit it's failing with this:

sophia/runtime/sr_conf.c:100 db.db1.id is read-only

Could you take a look at my second commit in #156 please? What am I missing?

os12 commented 6 years ago

Also, let me use this comment to collect misc notes/comments:

  1. [minor] The list of options at http://sophia.systems/v2.2/conf/db.html is hard to understand as it mentions things like db.name.id
    • it should really say something like db.$(db_name).id as the other string is meaningless
  2. [major] The across-db nature of the log file makes things very tricky:
    • it should be documented somewhere in BOLD
    • these DBs are not actually independent (as they share the write-ahead log) and act more like namespaces
    • the code is hard to use in environments that create databases on the fly
      • the user must maintain an ordered list of DB names and open the entire set every time
      • a new DB can only be added during Environment initialization and only as a last one
        • sp_open() is expensive and so the DBs should not be closed
        • yet the environment must be torn down and recreated in order to add a new DB
pmwkaa commented 6 years ago

My fault, sorry about that. I completely forgot that custom id generation scheme been removed in v2.2 (so recommended scheme was actual only for previous version). Right now, the only way to make this work is to always define same databases in same order. I was planning to completely remake it.

It is also possible to force compaction and wait for its completion, this will guarantee that write-ahead records will not have database references. This will also increase open time dramatically.

Could you describe your use-case in more details? i might have some thoughts how to make it in optimal way.