replikativ / datahike

A fast, immutable, distributed & compositional Datalog engine for everyone.
https://datahike.io
Eclipse Public License 1.0

[Bug]: Transact 200 simple entities takes 30s using file store, is that normal? #479

Closed: wangchunsen closed this issue 1 year ago

wangchunsen commented 2 years ago

What version of Datahike are you using?

0.4.1480

What version of Java are you using?

1.8

What operating system are you using?

macOS

What database EDN configuration are you using?

{:store {:backend :file :path "./datahike"}}

Describe the bug

I just did a simple transact test using the file backend, with code like:

  ;; one d/transact call per entity over (range 1 200),
  ;; i.e. 199 single-entity transactions
  (time (doseq [a (range 1 200)]
          (d/transact conn [{:name (str "test-name-" a) :age a}])))

It then prints: Elapsed time: 30153.049576 msecs

What is the expected behaviour?

none

How can the behaviour be reproduced?

none

kordano commented 2 years ago

Thanks for bringing that up, @wangchunsen. Since we do not batch transactions yet, each transaction has to perform its own IO operations, which performs poorly across many small transactions. In one of the next releases, with PR #439, we will introduce manual batch processing so you can decide when to flush your data to the persistence layer, improving write performance.
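
Until that lands, a workaround is to build the entities up front and hand them to a single d/transact call, so the store is flushed once instead of once per entity. A minimal sketch based on the snippet above, assuming the same conn and d alias:

  ;; one transaction containing all 199 entities: the file store is
  ;; flushed once rather than once per entity
  (time (d/transact conn
                    (vec (for [a (range 1 200)]
                           {:name (str "test-name-" a) :age a}))))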

awb99 commented 2 years ago

Great to hear, @kordano! I noticed it too: a full import of my data (just 5000 invoices) takes 30 minutes. The MongoDB solution I had before did this in a few seconds.

whilo commented 1 year ago

This should be fixed with the latest version and the persistent-sorted-set backend. If you batch your inserts in large enough chunks, it should not be much slower than in-memory DataScript, i.e. very fast. @awb99, @wangchunsen, can you report back how it works for you?
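
For anyone retrying this, a minimal sketch of selecting the persistent-sorted-set index on a file store; the path and the :schema-flexibility :read setting (which avoids declaring a schema up front) are illustrative choices, not taken from this thread:

  (require '[datahike.api :as d])

  ;; illustrative config: file store plus the persistent-sorted-set index
  (def cfg {:store {:backend :file :path "/tmp/datahike-pss"}
            :index :datahike.index/persistent-set
            :schema-flexibility :read})

  (d/create-database cfg)
  (def conn (d/connect cfg))

Larger imports can then be sent as a few big transactions rather than one per entity, for example by chunking the data with partition-all.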

jsmassa commented 1 year ago

Transacting 5000 simple entities into an empty database takes around 500 ms with the persistent sorted set now. The time does increase with the size of the database, though.
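
As a rough way to reproduce such a measurement, reusing the illustrative cfg and conn from the sketch above against a fresh database, one could time a single 5000-entity transaction:

  ;; a single transaction carrying all 5000 entities; per-entity
  ;; transactions would flush the store 5000 times instead
  (time
    (d/transact conn
                (vec (for [a (range 5000)]
                       {:name (str "test-name-" a) :age a}))))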

I think we are well aware that we have to monitor for performance regressions and keep optimizing wherever possible, so I think we should close this issue.

whilo commented 1 year ago

I will close this for now; please reopen if the issue persists.