replikativ / datahike

A fast, immutable, distributed & compositional Datalog engine for everyone.
https://datahike.io
Eclipse Public License 1.0
1.63k stars 97 forks source link

invalid blocking call from hitchhiker tree #303

Closed ysmiraak closed 2 years ago

ysmiraak commented 3 years ago

with -Dclojure.core.async.go-checking=true i sometimes get this error.

 :cause "Invalid blocking call in dispatch thread"
 :via
 [{:type java.lang.IllegalStateException
   :message "Invalid blocking call in dispatch thread"
   :at [clojure.core.async.impl.dispatch$check_blocking_in_dispatch invokeStatic "dispatch.clj" 29]}]
 :trace
 [[clojure.core.async.impl.dispatch$check_blocking_in_dispatch invokeStatic "dispatch.clj" 29]
  [clojure.core.async.impl.dispatch$check_blocking_in_dispatch invoke "dispatch.clj" 26]
  [clojure.core.async$fn__7110 invokeStatic "async.clj" 125]
  [clojure.core.async$fn__7110 invoke "async.clj" 125]
  [hitchhiker.tree.bootstrap.konserve.KonserveAddr _resolve_chan "konserve.cljc" 74]
  [hitchhiker.tree$lookup_path invokeStatic "tree.cljc" 315]
  [hitchhiker.tree$lookup_path invoke "tree.cljc" 296]
  [hitchhiker.tree.messaging$lookup_fwd_iter invokeStatic "messaging.cljc" 226]
  [hitchhiker.tree.messaging$lookup_fwd_iter invoke "messaging.cljc" 224]
  [datahike.index.hitchhiker_tree$_slice invokeStatic "hitchhiker_tree.cljc" 90]
  [datahike.index.hitchhiker_tree$_slice invoke "hitchhiker_tree.cljc" 57]
  [datahike.index$eval58729$fn__58732 invoke "index.cljc" 63]
  [datahike.index$eval58546$fn__58599$G__58519__58610 invoke "index.cljc" 8]
  [datahike.db.DB _datoms "db.cljc" 198]
  [datahike.pull_api$pull_attr invokeStatic "pull_api.cljc" 157]
  [datahike.pull_api$pull_attr invoke "pull_api.cljc" 153]
  [datahike.pull_api$pull_pattern_frame invokeStatic "pull_api.cljc" 244]
  [datahike.pull_api$pull_pattern_frame invoke "pull_api.cljc" 230]
  [datahike.pull_api$pull_pattern invokeStatic "pull_api.cljc" 256]
  [datahike.pull_api$pull_pattern invoke "pull_api.cljc" 251]
  [datahike.pull_api$pull_spec invokeStatic "pull_api.cljc" 271]
  [datahike.pull_api$pull_spec invoke "pull_api.cljc" 268]
  [datahike.query$pull$iter__70885__70889$fn__70890$fn__70898 invoke "query.cljc" 953]
  [clojure.core$map$fn__5888 invoke "core.clj" 2764]
  [clojure.lang.LazySeq sval "LazySeq.java" 42]
  [clojure.lang.LazySeq seq "LazySeq.java" 51]
  [clojure.lang.RT seq "RT.java" 535]
  [clojure.core$seq__5419 invokeStatic "core.clj" 139]
  [clojure.core.protocols$seq_reduce invokeStatic "protocols.clj" 24]
  [clojure.core.protocols$fn__8168 invokeStatic "protocols.clj" 75]
  [clojure.core.protocols$fn__8168 invoke "protocols.clj" 75]
  [clojure.core.protocols$fn__8110$G__8105__8123 invoke "protocols.clj" 13]
  [clojure.core$reduce invokeStatic "core.clj" 6830]
  [clojure.core$into invokeStatic "core.clj" 6897]
  [clojure.core$mapv invokeStatic "core.clj" 6905]
  [clojure.core$mapv invoke "core.clj" 6905]
  [datahike.query$pull$iter__70885__70889$fn__70890 invoke "query.cljc" 950]
  [clojure.lang.LazySeq sval "LazySeq.java" 42]
  [clojure.lang.LazySeq seq "LazySeq.java" 51]
  [clojure.lang.Cons next "Cons.java" 39]
  [clojure.lang.RT next "RT.java" 713]
  [clojure.core$next__5403 invokeStatic "core.clj" 64]
  [clojure.core.protocols$fn__8181 invokeStatic "protocols.clj" 169]
  [clojure.core.protocols$fn__8181 invoke "protocols.clj" 124]
  [clojure.core.protocols$fn__8136$G__8131__8145 invoke "protocols.clj" 19]
  [clojure.core.protocols$seq_reduce invokeStatic "protocols.clj" 31]
  [clojure.core.protocols$fn__8168 invokeStatic "protocols.clj" 75]
  [clojure.core.protocols$fn__8168 invoke "protocols.clj" 75]
  [clojure.core.protocols$fn__8110$G__8105__8123 invoke "protocols.clj" 13]
  [clojure.core$transduce invokeStatic "core.clj" 6886]
  [clojure.core$into invokeStatic "core.clj" 6901]
  [clojure.core$into invoke "core.clj" 6889]
  [datahike.query$eval70860$fn__70861 invoke "query.cljc" 937]
  [datahike.query$eval70837$fn__70838$G__70828__70845 invoke "query.cljc" 930]
  [datahike.query$eval70947$fn__70948 doInvoke "query.cljc" 1025]

i can't consistently reproduce this, but the stack trace points to some code in hitchhiker tree.

if anyone knows how to go around this problem, i'd be thankful!

(i'm using version 0.3.5)

whilo commented 3 years ago

Hey, thanks for reporting this. I somehow missed the introduction of this flag, it looks very useful. Do you use core.async yourself to invoke Datahike's query here?

ysmiraak commented 3 years ago

Do you use core.async yourself to invoke Datahike's query here?

i'm using pathom and fulcro, with a setup similar to this. the async query calls are made by pathom.

whilo commented 3 years ago

This is unfortunate and not so easy to resolve because on the JVM we compile away most of core.async, but bottom out on these ones in the storage layer. I have to think about it. A radical solution would be for us to provide our own core.async fork, but this is not a lightweight fix.

ysmiraak commented 3 years ago

i found this

the blocking effect is unfortunate but acceptable for me.

if there can be no lightweight solution, maybe close this issue?

kordano commented 3 years ago

@whilo what are your thoughts on this?

whilo commented 3 years ago

I think we either need to fork core.async to make sure that our code is not interfering with async application code or we need to go all in on non-blocking core.async and try to make the hitchhiker-tree async variant on the JVM as fast as the non-async one. The latter we have already tried together with @mpenet and it was not easy to achieve the same performance.

mpenet commented 3 years ago

This is a call to >!! or <!! inside a go block causing this, which effectively blocks an internal go dispatch thread, so clearly bad practice from whatever is doing that, this doesn't require forking core.async. There should never be any blocking IO inside go blocks, if this happens somewhere this should be fixed.

mpenet commented 3 years ago

these 2 are not looking good, I copied that code from the original when we moved the files around :/

https://github.com/replikativ/hitchhiker-tree/blob/master/src/hitchhiker/tree/bootstrap/konserve.cljc#L74 https://github.com/replikativ/hitchhiker-tree/blob/master/src/hitchhiker/tree/bootstrap/konserve.cljc#L114

whilo commented 3 years ago

Yes, this was done deliberately and while being aware of the problem because all in core.async was failing us performance-wise in the hitchhiker-tree, while all our storage layer exposes core.async abstractions (and is actually now written in asynchronous java.nio). We might be able to use another blocking mechanism there, but I think it would be better if we could make core.async work instead.

whilo commented 3 years ago

Ok, I (re)evaluated all the options (changing core.async to detect blocking calls and spawn threads, using JVM green thread solutions and implementing synchronous IO for our storage layer) and I think we need to implement synchronous storage IO in konserve and purge all blocking calls from the codebase (including surface API). Honestly I am fairly disappointed by the JVM ecosystem to still not have high performance native coroutine/fiber support after over 10 years of success of node.js and go-lang (not to speak of better, but less popular languages such as Haskell or Erlang). We basically have to bend core.async to a hybrid programming model/language with Clojure macros for abstractions which actually belong into the VM, because core.async has a non-trivial overhead.

whilo commented 2 years ago

This should now be fixed with the bump to the new konserve and hitchhiker-tree release in 0.6.1521. @ysmiraak if not please reopen this issue!