replikativ / datahike

A fast, immutable, distributed & compositional Datalog engine for everyone.
https://datahike.io
Eclipse Public License 1.0

Query stats #601

Closed jsmassa closed 1 year ago

jsmassa commented 1 year ago

The new function query-stats returns details such as tuple counts and elapsed time for the execution steps of the query engine.

whilo commented 1 year ago

Nice that you started working on this! I would suggest also tracking wall-clock time for the realization of individual relations, in addition to counting their input binding and output set. Another thing that might be relevant is the memory used by a relation (and, more generally, any kind of resource). That is harder to track, but the JVM has profiling functionality, for instance. I think it would be good to discuss what we need and what exactly we will use the profiling data for.
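A minimal sketch of what per-step tracking could look like, assuming a step can be treated as a function from an input collection to an output collection. The helper `measure-step` and the filter used as a stand-in for a join are hypothetical and not part of datahike.

```clojure
;; Hypothetical helper for illustration; not a datahike function.
(defn measure-step
  "Runs (f input) and returns the result together with basic statistics:
  input count, output count, and elapsed wall-clock time in milliseconds."
  [f input]
  (let [start  (System/nanoTime)
        ;; doall forces a lazy result so that realization is part of the timing
        output (doall (f input))
        end    (System/nanoTime)]
    {:result     output
     :in-count   (count input)
     :out-count  (count output)
     :elapsed-ms (/ (- end start) 1e6)}))

;; Stand-in for a join step: keep tuples whose first element is even.
(def stats
  (measure-step #(filter (comp even? first) %)
                [[1 :a] [2 :b] [4 :c]]))

(:in-count stats)  ;; => 3
(:out-count stats) ;; => 2
```

Forcing the result with `doall` matters here: without it, a lazy sequence would be returned almost immediately and the measured time would not reflect the actual work.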

Also, I have already talked to @kordano about this: I am not sure whether introducing a middleware just to wrap the query on the outside is worth it, since you can already use alter-var-root to instrument any function like this. The name "middleware" is very generic, and here it is a very specific and simple wrapper for query. I would expect a middleware to bind much more tightly into a stack, as is done, for instance, with replikativ into kabel.
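For reference, instrumenting a function from the outside with `alter-var-root` could look roughly like the sketch below. The function `run-query`, the `timings` atom, and the `with-timing` wrapper are hypothetical stand-ins, not datahike names. The approach relies on callers going through the var, which is the default unless direct linking is enabled.

```clojure
;; Stand-in for the function to instrument. `run-query` is hypothetical
;; and only sums a range so that the example is self-contained.
(defn run-query [n]
  (reduce + (range n)))

;; Collected wall-clock timings in milliseconds, one entry per call.
(def timings (atom []))

(defn with-timing
  "Returns a function that behaves like f but also records the elapsed
  wall-clock time of each call in the timings atom."
  [f]
  (fn [& args]
    (let [start  (System/nanoTime)
          result (apply f args)]
      (swap! timings conj (/ (- (System/nanoTime) start) 1e6))
      result)))

;; Replace the root binding of the var with the wrapped function.
(alter-var-root #'run-query with-timing)

(def result (run-query 10))

result           ;; => 45
(count @timings) ;; => 1
```

This only sees the wrapped function's total time from the outside; it cannot report what happens in individual steps inside the function.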

jsmassa commented 1 year ago

I didn't implement the stats as middleware anyway because I had to go deeper into the codebase. It's triggered by a simple flag in the query map at the moment.

We can add space measurements later, but I wouldn't put a lot of faith in the results of those on the JVM.

whilo commented 1 year ago

I think an easy way to do approximate space measurements would be to serialize each context after it has been created and measure its byte size. The simplest option is pr-str with count, which might be a bit misleading because it prints numbers instead of storing them efficiently; CBOR or Fressian would be better. This should happen after timing, of course, and should probably be optional because of the slowdown, but it would give a good sense of the memory usage of each join. I think this will be important to know, as maintaining memory bounds is necessary in many applications, while response time might not be the most critical factor. It can be done later, but making more of the behaviour visible now will help us make design decisions.
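The pr-str variant can be sketched in a few lines. The helper name `approx-size` is hypothetical. Note that it counts characters of the printed form, which equals the byte count only for ASCII output.

```clojure
;; Hypothetical helper for illustration; not a datahike function.
(defn approx-size
  "Approximates the size of a value as the number of characters in its
  printed representation."
  [x]
  (count (pr-str x)))

(approx-size [1 2 3])       ;; => 7, the printed form is "[1 2 3]"
(approx-size {:a 1})        ;; => 6, the printed form is "{:a 1}"

;; Illustrates the caveat about numbers: this value fits in an 8-byte long,
;; but its printed form is 13 characters.
(approx-size 1234567890123) ;; => 13
```

A binary format would avoid the number inflation shown in the last call, at the cost of an extra dependency.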