Closed guicho271828 closed 8 years ago
The very important observation about the pattern matching optimizer is that sharing a data constructor among pattern matching clauses of the same data constructor can reduce the number of checks whether or not a value being matched has the data constructor.
Introducing a primitive (in imperative sense) or more general ingredient will make pattern language more "powerfull", but at the same time will make the optimizer lose the chances of optimization. Additionally, we should keep good algebraic properties of pattern language as much as possible so that the optimizer can utilizes these properties. That is, the optimizer is not optional (in a sense), rather is based on the design of pattern language.
So, I am skeptical about this proposal. Refactoring a code and improving a completeness of pattern language are welcome, of course.
See the following paper for details.
Thank you for the comment, I read through the paper while implementing optima-clone (last year, maybe?), and understand the background well.
This proposal is NOT trying to make the optimizer optional. Although layer-1 matcher does not run any optimization, and handles the macro expansion only, Layer-1 matcher is not meant to be used by the normal user, and it is rather a utility that only the layer-2 implementers can benefit from. Layer-1 can parse canonical or/guard pattern only, that is, just like Ordered BDD, the order of checking a variable is the same for all pattern, no duplicate checking is involved, etc, and writing on layer-1 matcher is painful for the normal user. However, thanks to these constraints, the code of layer-1 is very clean, and the expansion would also be clean.
In contrast, Layer 2 optimizes and compiles layer-2 match clause into layer-1 match clause. While implementing Layer-2, it is no problem to use several specialized structures to introduce some smart optimization. Consider the following example:
(l2:match x
((cons _ (cons b _)) b)
((cons 2 :a) :a))
; --> macroexpand
(l1:match x
((guard it (consp it) (car it) #:tmp1 (cdr it) #:tmp2)
(l2:match* (#:tmp1 #:tmp2)
((_ (cons b _)) b)
((2 :a) :a))))
This is not a working example yet, but depicts the basic intention.
I already implemented layer-0 and layer-1 last night. see https://github.com/guicho271828/optima/tree/v2
From level1/package.lisp:
;; Level-1 `match' accepts or/guard patterns only.
;; syntax:
;; (or subpattens*)
;; (guard symbol test-form {generator-form subpattern}*)
;; NOTE: There are several restrictions in the input of level-1 pattern
;; match. Level-1 patterns are canonical. That is, there are no
;; forward/backward-referenced symbols, and all subpatterns of or-pattern share
;; the same set of variables.
;; Also, level-1 guard patterns do not allow subpatterns in `symbol'.
;; 1 guard pattern corresponds to exactly 1 type checking.
;; (note: the task of the optimizer is to minimize the number of checking).
;; thus, compilation of level-1 `match' is equivalent to just building a
;; form consisting of `if' and `let' binding. level-1 `match' assumes the tree is
;; already valid and optimized.
BTW, from Fabrice et al. paper, it is unclear whether their method is guaranteed to be optimal. Although they show some experimental results and the result code seems somewhat optimized, I got the feeling that the methods are ad-hoc, and the paper lacks theoretical analysis on the resulting output. From the standpoint as an researcher of combinatorial optimization, the paper lacks in particular:
c*x2
, where c*
is the optimal cost")In contrast, https://hal.inria.fr/inria-00001127/ seems, from the abstract, somewhat clearer to implement, and at least providing the more recent results, as well as theoretical analysis.
current impl of level-2 and/guard pattern compiled into level1 guard (guard1) pattern, FYI. match0 is a level0 matcher.
(defpattern and (&rest subpatterns)
(with-gensyms (it)
(match0 subpatterns
((list) `(guard1 ,it t)) ;; well, there is no _ pattern in level1
((list sp) sp)
((list* sp and-subpatterns)
(match0 (pattern-expand sp) ;; level1 patterns
((list* 'guard1 sym test guard1-subpatterns)
`(guard1 ,sym ,test ,@guard1-subpatterns
,sym (and ,@and-subpatterns)))
((list* 'or1 or-subpatterns)
(list* 'or1
(mapcar (lambda (or-sp)
`(and ,or-sp ,@and-subpatterns))
or-subpatterns))))))))
(defpattern guard (subpattern test-form)
(with-gensyms (it)
`(and ,subpattern
(guard1 ,it ,test-form))))
implemented form-based type inference. though not complete, it is a sound algorithm, as long as each inferer is sound. Adding an inferer can improve the range of forms covered.
https://github.com/guicho271828/optima/blob/v2/level2/inference2.lisp
separated "optimizer" from the core compilation system. using this notion is beneficial in that
I understand your motivation and it looks interesting, but at the same time, I don't have much time to maintain optima except for bug fixes or small improvements. I think this should be a new project.
From a technical point of view, guard1
seems to be too "primitive" or "strong", meaning it will prevent the optimizer from exploiting algebraic properties for, say, program fusion as https://hal.inria.fr/inria-00001127/ noted.
As for type inference, I am very skeptical. There are many reasons.
We must, first of all, recognize what the important problem is. I don't see any big problem in the current implemenation.
By the way, Haskell-style pattern guards (#112) will improve extensibility.
I understand your motivation and it looks interesting, but at the same time, I don't have much time to maintain optima except for bug fixes or small improvements. I think this should be a new project.
Agree, I lean toward renaming it into something like ... trivia, minima, primitiva, unitia ... something that resembles optima, but also imply simplicity.
Re: type parameters (== 型変数, right?) and ADT, they are outside the scope of pattern match macros. They are rather in the scope of optimizer. Instead, a promising way is to extending the type system with the ability that already exists in CL -- type specifiers.
I once made a toy product abusing the type expansion and compound/derived type specifier, but now I have a decent idea of using a class object as an atomic type specifier, which is valid, as defined in http://www.lispworks.com/documentation/HyperSpec/Body/04_bc.htm . This is different from assuming the input of the matcher as being slow CLOS instances: I mean I will extend structure-class
metaobject by augmenting the ability to handle those parameters. It will be named something like parametrized-structure-class
, by which I plan to provide efficient structure objects.
Re: ADT --- I am making several libraries to handle this problem, e.g., https://github.com/guicho271828/type-i , https://github.com/guicho271828/type-r . https://github.com/guicho271828/optima/blob/v2/level2/typerel.lisp will also be separated from v2 in some form.
I will not consider the invalidation of class hierarchy as of now. Unlike the languages of ML family, lisp way does not guarantee the type correctness. It just uses the information as optimization hint (in an unsafe manner), and I will stick to it. But of course, invalidation should be minimized. Makefile-like dependency resolving infrastructure for functions, macros or structures is needed.
by the way, currently v2 passed all 168 tests provided in v1. --- including the extra/ppcre patterns. Tests are minimally modified: no when
syntax sugar, first argument of assoc
is evaluated (rather than non-evaluated as in v1), etc. level0 and level1 design decision really helped in keeping the matcher as simple as possible.
;; NOTE: incompatibility --- this is not an association list, according to CLHS
;; (is-match '(1 (2 . 3)) (assoc 2 3))
(signals type-error
(match '(1 (2 . 3)) ((assoc 2 3) t)))
;; NOTE: incompatibility --- first argument to assoc should be quoted or constant
;; (is-match '((a . 1)) (assoc a 1))
(is-match '((a . 1)) (assoc 'a 1))
;; NOTE: incompatibility --- keyword arguments to assoc is evaluated
;; (is-match '(("a" . 1)) (assoc "A" 1 :test string-equal))
(is-match '(("a" . 1)) (assoc "A" 1 :test #'string-equal))
;; NOTE: incompatibility --- first argument to property should be quoted or constant
;; (is-match '(a 1) (property a 1))
(is-match '(a 1) (property 'a 1))
note that, I tried to hack the guard pattern in v1, but finally repelled by the codebase, only to decide developping another variant.
Also note that: I really want to implement optimizations like this:
(match x
((type (integer 0 100)) 0)
((type (integer 100 200)) 1)
((type (integer 200 300)) 2)
((type (integer 300 400)) 3))
-->
(match x
((type (integer 0 200))
(match x
((type (integer 0 100)) 0)
((type (integer 100 200)) 1)))
((type (integer 200 400))
(match x
((type (integer 200 300)) 2)
((type (integer 300 400)) 3))))
cool, right?
Agree, I lean toward renaming it into something like ... trivia, minima, primitiva, unitia ... something that resembles optima, but also imply simplicity.
I'm looking forward to it.
Re: type parameters (== 型変数, right?) and ADT, they are outside the scope of pattern match macros. They are rather in the scope of optimizer. Instead, a promising way is to extending the type system with the ability that already exists in CL -- type specifiers.
The problem is that type specifiers is useless for polymorphism. We need more sophisticated type system for this purpose.
I will not consider the invalidation of class hierarchy as of now. Unlike the languages of ML family, lisp way does not guarantee the type correctness. It just uses the information as optimization hint (in an unsafe manner), and I will stick to it. But of course, invalidation should be minimized. Makefile-like dependency resolving infrastructure for functions, macros or structures is needed.
But, you know, we can say that typing correctly (in ML fasion) is another freedom of Lisp.
now it's pretty much stable and fast enough. https://github.com/guicho271828/trivia/wiki/Benchmarking-Results
Since one of the great things about Optima is that it was basically a consolidation of all/most pattern matching libraries, I think you should really look into keeping this as a single library.
Would it be possible to merge the two projects back together? Might it be an idea to move maintenance of this library to a GitHub organization, so that maintenance isn't expected of one person? (It's also possible to give other people commit rights to a repository belonging to an account, but an organization would make the intention more explicit)
Trivia is a whole rewrite, so it is not possible to merge them together. I agree about how Optima have unified the rather diverging whole lotta libraries, and I truely respect that. With this in mind, I kept exactly the same API for Trivia.
One alternative for avoiding the divergence is to make a verification library that runs a testing for different matcher. The important thing is not the uniqueness of the implementation; it is the uniqueness of API and specification, just as ANSI CL allows multiple implementations for THE single specification.
I think Optima and Trivia can coexist in quicklisp so long as the user weighs much on the safety (correctness, statbility) of Optima. Trivia would be more active, and is faster, but may not be stable compared to Optima (even though Trivia passes the same exhaustive testing as Optima, using the same testing code).
By the way, Trivia's source is shorter than that of Optima while extending its ability, and the implementation is simpler and easy to maintain. Optima contains 1156 non-empty/non-comment lines of code with 7 files (+1 package definition file), Trivia contains 1013 lines with 5 files (+3 package definition files). Trivia supports much more specialized patterns like string, simple-string, simple-base-string, bit-vector etc.
re: type variables. How's this? https://github.com/guicho271828/trivialib.typevar
To avoid the dependency as much as possible, currently, optima is implemented in an imperative style. Given its goal is to ease functional programming, this is an irony. As a result, Core algorithm and optimization are intermixed into a large conglomerate, and how the optimization work is not visually intuitive (in a source-code basis). Extending optima also requires significant amount of effort.
In light of this, I am designing the next version of optima, which has the better source architecture satisfying the following qualifications.
multi-layered architechture.
One way to avoid this dependency vs utility problem is to include a primitive pattern matcher to bootstrap optima. In the past, I wrote https://github.com/guicho271828/optima-clone as "self-compiling optima", but that goal is not yet achieved because I wrote them in a monolithic manner -- failed to separate optimization part from bootstrapping part.
I'll divide the implementation of optima version 2 into 3 layers: layer-0, which supports
list, list*, constant
patterns only, layer-1, which supportsor
andguard
only, and layer-2, which supports all existing patterns in v1 using layer-1 and derived patterns. In detail, each layer handles the following functionality:minimal structure-based expression of patterns
We should avoid using
constructor-pattern
etc. in the current implementation. guard-pattern, which will support the extended syntax suggested in #109, would eliminate the needs for complex structure-based handling of structure-patterns, class-patterns etc.Pattern transformer.
currently, trivial patterns e.g.
(and)
, syntax sugar likewhen
, guard pattern lifting, are hard coded into the expansion algorithm.defpattern
s are also treated separately. Pattern transformer works in this level of abstraction.optimizer API
Optimizer should be separated from macro generation code. The matching tree can be seen as a decision tree, so a special external algorithm should handle this. This can be implemented as a postprocessing part of layer 2.
Depending on the optimization context (
(declare (optimize ...))
) or via some form of explicit statement, user should be able to specify the quality of the resulting decision tree. Optimizer can run a local search in a decision-tree search space, or a static optimization algorithm described in Fabrice et. al.