sirixdb / sirix

SirixDB is an embeddable, bitemporal, append-only database system and event store, storing immutable lightweight snapshots. It keeps the full history of each resource. Every commit stores a space-efficient snapshot through structural sharing. It is log-structured and never overwrites data. SirixDB uses a novel page-level versioning approach.
https://sirix.io
BSD 3-Clause "New" or "Revised" License
1.13k stars 253 forks

Performance: Many small writes within a transaction via insertSubtreeX(...) are horribly slow if committed #296

Open JohannesLichtenberger opened 4 years ago

JohannesLichtenberger commented 4 years ago
    final var resource = "smallInsertions";

    try (final var database = JsonTestHelper.getDatabase(PATHS.PATH1.getFile())) {
      database.createResource(ResourceConfiguration.newBuilder(resource)
                                                   .storeDiffs(false)
                                                   .hashKind(HashType.NONE)
                                                   .buildPathSummary(false)
                                                   .build());
      try (final var manager = database.openResourceManager(resource); final var wtx = manager.beginNodeTrx()) {
        System.out.println("Start inserting");

        final long time = System.nanoTime();

        wtx.insertArrayAsFirstChild();

        var jsonObject = """
            {"item":"this is item 0", "package":"package", "kg":5}
            """.strip();

        // Per the note below, insertSubtreeAsFirstChild(...) commits by default.
        wtx.insertSubtreeAsFirstChild(JsonShredder.createStringReader(jsonObject));

        for (int i = 0; i < 650_000; i++) {
          System.out.println(i);
          jsonObject = """
              {"item":"this is item %s", "package":"package", "kg":5}
              """.strip().formatted(i);

          wtx.insertSubtreeAsRightSibling(JsonShredder.createStringReader(jsonObject));
        }

        wtx.commit();

        System.out.println("Done inserting [" + (System.nanoTime() - time) / 1_000_000 + "ms].");
      }
    }

insertSubtreeAsFirstChild(...) commits by default.

JUnitStarter-2020-07-28.zip

JohannesLichtenberger commented 4 years ago

Some of this might be attributable to running from JUnit within IntelliJ with profiling enabled, but it's still way too slow.

Only 49488 subtree inserts after about 6 minutes.
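To put that number in perspective, here is a rough back-of-the-envelope calculation. It assumes the insert rate stays constant over the whole run (it may well degrade as the resource grows) and takes "about 6 minutes" as exactly 360 seconds. The class and method names are made up for illustration.

```java
/** Rough throughput estimate from the numbers reported above (illustrative only). */
final class InsertThroughput {

  /** Average inserts per second for a given count and elapsed time. */
  static double insertsPerSecond(long inserts, double elapsedSeconds) {
    return inserts / elapsedSeconds;
  }

  /** Projected minutes for totalInserts, assuming a constant insert rate. */
  static double projectedMinutes(long totalInserts, double insertsPerSecond) {
    return totalInserts / insertsPerSecond / 60.0;
  }

  public static void main(String[] args) {
    // 49488 inserts after "about 6 minutes"; 360 s is an approximation.
    double rate = insertsPerSecond(49_488, 6 * 60);
    // The loop in the test above runs 650,000 iterations.
    double minutes = projectedMinutes(650_000, rate);
    System.out.printf("~%.0f inserts/s, ~%.0f minutes for 650,000 inserts%n", rate, minutes);
  }
}
```

Under those assumptions this works out to roughly 137 inserts per second, or around 79 minutes for the full loop.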

JohannesLichtenberger commented 4 years ago

We could add a parameter to deliberately disable the getParentKind() check. Furthermore, I wonder whether it would be best to share one Writer and Reader in the ResourceManager, to make the reinstantiate() call, which is the last action during a commit, more performant. Another option might be to stop reading all nodes of the PathSummary into main memory during writes.

Discussions are welcome :-)

JohannesLichtenberger commented 4 years ago

It's slow on SSDs and hard drives mainly because we don't group transaction commits but immediately write them to persistent storage. Many small writes (each with an fsync) are probably also slow on byte-addressable NVM, so we should avoid them at all costs. Furthermore, write amplification occurs due to path copying, and we should amortize that cost, for instance via a transaction group commit. If we version IndirectPages as well, we could keep them in memory, since reading them from durable storage is costly; the cost of path copying would then be negligible. Thus, we could basically write all the changes into a queue and defer the sync to durable storage. The open read-write transactions (all but one of them in the preCommit stage, for instance) could then be informed, and the commit() method could return. The log would have to be moved to the ResourceManager so that it can be shared once a new read-write transaction starts (while the other one is in the preCommit state).
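To make the group-commit idea a bit more concrete, here is a minimal, generic sketch of a commit queue with a deferred sync. It is not SirixDB code: `GroupCommitter`, `DurableLog`, and `maxBatch` are hypothetical names, and the sketch ignores versioning, the preCommit stage, and crash recovery entirely. It only shows the core mechanism: `commit()` enqueues and returns, while a background thread appends a whole batch and syncs once.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

/** Hypothetical group committer: batches queued commits and syncs once per batch. */
final class GroupCommitter implements AutoCloseable {

  /** Hypothetical stand-in for the durable log; not a SirixDB interface. */
  interface DurableLog {
    void append(byte[] changes);
    void sync(); // e.g. a single fsync covering everything appended so far
  }

  /** One queued commit plus the future that signals its durability. */
  private static final class Pending {
    final byte[] changes;
    final CompletableFuture<Void> done = new CompletableFuture<>();
    Pending(byte[] changes) { this.changes = changes; }
  }

  private final BlockingQueue<Pending> queue = new LinkedBlockingQueue<>();
  private final DurableLog log;
  private final int maxBatch;
  private final Thread flusher;
  private volatile boolean running = true;

  GroupCommitter(DurableLog log, int maxBatch) {
    if (maxBatch < 1) throw new IllegalArgumentException("maxBatch must be >= 1");
    this.log = log;
    this.maxBatch = maxBatch;
    this.flusher = new Thread(this::flushLoop, "group-commit-flusher");
    this.flusher.start();
  }

  /** Enqueues the changes and returns at once; the future completes after the sync. */
  CompletableFuture<Void> commit(byte[] changes) {
    Pending pending = new Pending(changes);
    queue.add(pending);
    return pending.done;
  }

  private void flushLoop() {
    List<Pending> batch = new ArrayList<>();
    // Keep draining after close() until the queue is empty.
    while (running || !queue.isEmpty()) {
      try {
        Pending first = queue.poll(10, TimeUnit.MILLISECONDS);
        if (first == null) continue;
        batch.add(first);
        queue.drainTo(batch, maxBatch - 1); // take whatever else is already waiting
        for (Pending p : batch) log.append(p.changes);
        log.sync(); // one sync amortized over the whole batch
        for (Pending p : batch) p.done.complete(null);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return;
      } catch (RuntimeException e) {
        for (Pending p : batch) p.done.completeExceptionally(e);
      } finally {
        batch.clear();
      }
    }
  }

  /** Stops accepting work and waits for queued commits; callers must not commit() afterwards. */
  @Override
  public void close() {
    running = false;
    try {
      flusher.join();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }
}
```

A caller would hold on to the returned `CompletableFuture` and only treat the commit as durable once it completes. The batch size bounds how many commits share one sync, so with `maxBatch = 64`, 1,000 queued commits need at least 16 syncs instead of 1,000.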