mafintosh / hyperlog

Merkle DAG that replicates based on scuttlebutt logs and causal linking

locks #2

Open · mcollina opened this issue 9 years ago

mcollina commented 9 years ago

Why does hyperlog lock itself so much? I'm really concerned about the performance of add (https://github.com/mafintosh/hyperlog/blob/master/index.js#L100); I fear it would cripple write performance.

Is that due to the dag.changes property?

Also there is a lock in replicate: https://github.com/mafintosh/hyperlog/blob/db470807d8cb9c5eb19299b9e122b753fba11348/lib/replicate.js#L192-L199

mafintosh commented 9 years ago

It is due to the .changes, yeah. It's easy to fix though: we'd just have to batch everything in between locks (if that makes sense), but I wanted to get basic replication working before starting to optimize it.
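
To illustrate the kind of batching meant here, a minimal self-contained sketch (illustrative names only, not hyperlog's actual implementation): instead of taking the lock once per add(), adds that arrive while a write is in flight are queued and then written as one batch under a single lock acquisition. mutexify is assumed as the lock helper.

```js
var mutexify = require('mutexify')

var lock = mutexify()
var changes = 0   // stand-in for dag.changes
var log = []      // stand-in for the leveldb-backed change log
var pending = []

function add (node, cb) {
  pending.push({ node: node, cb: cb })
  flush()
}

function flush () {
  if (!pending.length) return
  lock(function (release) {
    var batch = pending
    pending = []
    if (!batch.length) return release()

    // assign change numbers and append the whole batch under one lock
    batch.forEach(function (entry) {
      entry.node.change = ++changes
      log.push(entry.node)
    })

    // stand-in for a single async db.batch(...) covering the whole group
    setImmediate(function () {
      batch.forEach(function (entry) { entry.cb(null, entry.node) })
      release()
    })
  })
}

add({ value: 'hello' }, console.log)
add({ value: 'world' }, console.log)
```

With this shape the lock is taken once per flush rather than once per add, so the cost of locking is amortized over however many adds queued up in the meantime.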

mafintosh commented 9 years ago

The same goes for the replicate lock.

mcollina commented 9 years ago

How about not locking at all? Even batching between locks might slow things down.

mafintosh commented 9 years ago

How would you safely append to a sequential append-only log without a mutex? If you have an idea of how to do this I'm all in favor of removing the locks :)

The replicate lock when sending heads is going away in the next version. I found a way to safely get rid of that.
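
For reference, a tiny hypothetical illustration (invented names, not hyperlog code) of the race the mutex guards against: two add()s read the same sequence number, the write happens a tick later, and both end up targeting the same slot in the log.

```js
var seq = 0
var log = {}

function unsafeAdd (value, cb) {
  var current = seq // read the current sequence number ...
  // ... but write a tick later (a db lookup in the real code), so a
  // concurrent add can read the same `current` in between
  setImmediate(function () {
    seq = current + 1
    if (log[seq]) return cb(new Error('overwrote entry ' + seq))
    log[seq] = value
    cb(null, seq)
  })
}

unsafeAdd('a', console.log) // both calls see seq === 0 ...
unsafeAdd('b', console.log) // ... so both try to write entry 1
```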

mcollina commented 9 years ago

Unfortunately, the only way that comes to mind is caching. Bad news: this is going to get tricky. Good news: it's going to be fast.

Looking at the code, there are three race conditions going on:

  1. The dag.changes property is read at https://github.com/mafintosh/hyperlog/blob/master/index.js#L106 and written at https://github.com/mafintosh/hyperlog/blob/master/index.js#L143, which is at best two ticks later. If we want to be lockless we have to update it in a single tick, basically updating it before it is written. I think this has no side effects (but you might know better). Could skipping a value affect things (in case of errors)?
  2. The seq might need to be cached too: https://github.com/mafintosh/hyperlog/blob/master/index.js#L108. However, I don't get why the whole level-logs thing is needed. I see that it is written during add and used in replication. Can we collapse the changes and seq concepts?
  3. The heads need to be cached, or append will never work. We can use a dirty trick given that node is single-threaded: maintain the cache as an array that gets overwritten synchronously before appending a new value. That cache also needs to be maintained during replication (but I think it's doable); see the sketch below.

All of these still require the ready mutex, but that's ok.
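
A rough sketch of point 3 (hypothetical field and function names, not hyperlog's API): heads and changes live in memory and are updated synchronously, in the same tick a node is accepted, before any asynchronous write, so concurrent add()s always observe the latest state without taking a lock.

```js
var crypto = require('crypto')

var heads = []     // in-memory cache of current head hashes
var changes = 0    // in-memory cache of dag.changes

function add (links, value, cb) {
  // everything other add() calls depend on is updated in this same tick
  var nodeLinks = links || heads.slice()   // default: link to current heads
  var node = {
    change: ++changes,
    links: nodeLinks,
    value: value,
    key: crypto.createHash('sha1')
      .update(JSON.stringify([nodeLinks, value]))
      .digest('hex')
  }
  heads = heads.filter(function (h) { return nodeLinks.indexOf(h) === -1 })
  heads.push(node.key)

  // only the actual leveldb write needs to happen asynchronously
  setImmediate(function () { if (cb) cb(null, node) })
  return node
}

var a = add(null, 'first')
var b = add(null, 'second')  // links to a synchronously, no lock needed
console.log(heads.length === 1 && heads[0] === b.key) // true
```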

dominictarr commented 8 years ago

In secure-scuttlebutt we queue appends, so a new append is validated against the end of the queue, and then writes go in batches: write the entire queue, and when that finishes, write the next batch.
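
A minimal sketch of that queue-and-batch approach (invented names, not secure-scuttlebutt's actual code): appends are validated against the tail of the queue rather than against disk state, the whole queue is flushed as one batch, and whatever arrives during the flush becomes the next batch.

```js
var queue = []      // appends waiting for the next batch write
var writing = false
var last = null     // most recent entry, whether written or still queued

function append (entry, cb) {
  // validate against the end of the queue, not against what's on disk
  if (last && entry.seq !== last.seq + 1) {
    return cb(new Error('expected seq ' + (last.seq + 1) + ', got ' + entry.seq))
  }
  last = entry
  queue.push({ entry: entry, cb: cb })
  flush()
}

function flush () {
  if (writing || !queue.length) return
  writing = true
  var batch = queue
  queue = []
  // stand-in for one db.batch(...) call covering the whole queue
  setImmediate(function () {
    writing = false
    batch.forEach(function (item) { item.cb(null, item.entry) })
    flush() // write whatever queued up while this batch was in flight
  })
}

append({ seq: 1, value: 'a' }, console.log)
append({ seq: 2, value: 'b' }, console.log)
append({ seq: 4, value: 'oops' }, console.log) // fails validation synchronously
```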