Closed luposlip closed 2 years ago
The current impl is naive, running brute force for all registered schemas, throwing lot of exceptions on invalid types. It should be easy to make it many orders of magnitude faster with few simple tricks, including:
1) type based checking, e.g. a avalue of type Long
(in JVM) should only be checked against numeric schemas
2) not throwing exceptions
There are other things that should be added to Providers: https://github.com/metosin/malli/issues/74
Most likely, won't have time to fix this any time soon, so contributions would be most welcome.
Did a quick test what happens. Here's a quick code that runs order of magntude faster (but is still dead-slow):
(require '[malli.core :as m])
(require '[malli.provider :as mp])
(require '[clojure.set :as set])
(require '[criterium.core :as cc])
(doseq [registry [;; 2700 µs
m/default-registry
;; 450 µs
m/predicate-registry
;; 230 µs
(-> m/predicate-registry
set/map-invert
set/map-invert
(dissoc 'empty?))]]
(cc/quick-bench (mp/provide [1] {:registry registry})))
e.g. don't test the values against all registry entries, most of them fail with exception ([:map 1]
), just test the predicates for values and only once (both int?
and 'int?
are registered), also, empty?
throws on all non-seqs.
-from-value
protocol method into IntoSchema
for optimised inferringI think this idea was probably brought up in the past, but generation could be significantly sped up by failing faster.
-fail
uses ex-info
which gets the stack trace and filters it. Much faster to (3-4x) to just throw a raw clojure.lang.ExceptionInfo
, like so:
(defn fast-ex-info
([msg map]
(clojure.lang.ExceptionInfo. msg map))
([msg map cause]
(clojure.lang.ExceptionInfo. msg map cause)))
Even faster is to fail without a stack trace:
(ns malli.impl.Exception)
(gen-class
:name malli.impl.Exception
:extends java.lang.Exception)
(defn -fillInStackTrace
([_])
([_ _]))
Then
(require 'malli.impl.Exception)
(clojure.core/compile 'malli.impl.Exception)
(defn very-fast-ex-info
([msg map]
(malli.impl.Exception. msg map)))
Another ~3x speedup
These gains only benefit the JVM, but they're very significant.
glaring issue:
Overriding -fail
and losing stack traces.
Maybe add some configuration (global?) which selects fail-fast
?
Thanks for giving a thought on this. I think adding a value class-based dispatch would enable a solution, that is 100-1000x faster, would not throw and would be extendable. In short:
IntoSchema
can describe which values classes it accepts and a function to extract the actual schema from a value without throwingSchema
instance or AST is checked using m/validator
mp/inferrer
would be a function of registry -> values -> schema
and would calculate 1 & 2 ahead of time.mp/infer
would be a repl-friendly version of that;; with defaults
(def infer (mp/inferrer))
(infer [1 2 3])
: => :int
That's a way better solution than my idea. My implementation just gives you a relatively easy speed up, while your suggestion requires some design and careful thought. You can still take it as an intermediary solution until a more robust and correct implementation is ready
Could you run a sample test with and without the change in exceptions? e.g. using the :perf
profile:
(def samples
[{:id "Lillan"
:tags #{:artesan :coffee :hotel}
:address {:street "Ahlmanintie 29"
:city "Tampere"
:zip 33100
:lonlat [61.4858322, 23.7854658]}}
{:id "Huber",
:description "Beefy place"
:tags #{:beef :wine :beer}
:address {:street "Aleksis Kiven katu 13"
:city "Tampere"
:zip 33200
:lonlat [61.4963599 23.7604916]}}])
(cc/quick-bench
(mp/provide samples))
reorganized the code a bit, so now there is a mp/provider
which caches most of the internals and is 5x faster with that test set. See #439
did a draft of the full (100x) fix too: https://github.com/metosin/malli/pull/440
Current status, 330x faster with a sample map than the original. Still one 10x to go :)
(def samples
[{:id "Lillan"
:tags #{:artesan :coffee :hotel}
:address {:street "Ahlmanintie 29"
:city "Tampere"
:zip 33100
:lonlat [61.4858322, 23.7854658]}}
{:id "Huber",
:description "Beefy place"
:tags #{:beef :wine :beer}
:address {:street "Aleksis Kiven katu 13"
:city "Tampere"
:zip 33200
:lonlat [61.4963599 23.7604916]}}])
;; OLD: 126ms
(cc/quick-bench
(mp/provide samples))
;; 0.6.2: 26ms (5x)
(let [provide (mp/provider)]
(cc/quick-bench
(provide samples)))
;; 0.7.0 (unreleased): 380µs (330x)
(let [provide (mp/provider)]
(cc/quick-bench
(provide samples)))
There is still a check for the inst?
in clojure.core
that uses satifies?
, which is dead slow. Without it, here's the current upcoming perf. Still not good, but a 460x faster than the original:
(defn provider-test2 []
(let [samples [{:id "Lillan"
:tags #{:artesan :coffee :hotel}
:address {:street "Ahlmanintie 29"
:city "Tampere"
:zip 33100
:lonlat [61.4858322, 23.7854658]}}
{:id "Huber",
:description "Beefy place"
:tags #{:beef :wine :beer}
:address {:street "Aleksis Kiven katu 13"
:city "Tampere"
:zip 33200
:lonlat [61.4963599 23.7604916]}}]]
;; 126ms -> 2.5ms
(p/bench (mp/provide samples))
;; 26ms
;; 380µs, no exceptions, (330x)
(let [provide (mp/provider)]
(p/bench (provide samples)))
;; 270µs, no exceptions, no inst? (460x)
(let [registry (dissoc (m/default-schemas) inst? 'inst?)
provide (mp/provider {:registry registry})]
(p/bench (provide samples)))))
[metosin/malli "0.7.0-20211031.202317-3"]
has the perf fixes in.
Hi there,
I'm using Malli
0.0.1-SNAPSHOT
.I have 15 samples of approximately 200KB each (when saved as EDN to disk). Malli is unable to provide a schema for those - it seems to run forever. It would probably have stopped at some point, but I lost my patience.
I tried to run the provider for just one of those, it actually finished, but it took several minutes on my i7 w/6 physical cores (don't know if you have any parallelization yet).
Would it be possible in some way to speed this up, or at least provide some peek into the progress (via logging, async or similar)?
Looks promising though, will use Malli for smaller stuff for now.
Best regards, Henrik