koka-lang / koka

Koka language compiler and interpreter
http://koka-lang.org
Other
3.16k stars 153 forks source link

How to handle multiple reuse patterns in a match arm? #303

Closed SchrodingerZhu closed 6 months ago

SchrodingerZhu commented 1 year ago

In the reuse analysis, if within a match arm (imaging balancing a redblack tree where we have multiple level patterns) contains a lot of reusable memory blocks with same layout, what will the koka compiler do? Will it be able to figure out the best way to assign reuse token so that the changed fields can be minimized? (I am not sure if it is optimal condition since there are side effects of dup/drop operations)

anfelor commented 1 year ago

Will it be able to figure out the best way to assign reuse token so that the changed fields can be minimized?

Ideally, that would be the case. For now, the Koka compiler just greedily picks a reuse token that has the same size and preferably the same tag as the current allocation. See: https://github.com/koka-lang/koka/blob/master/src/Backend/C/ParcReuse.hs#L280

Will it be able to figure out the best way to assign reuse token so that the changed fields can be minimized?

I think that is a very difficult thing to optimize. For example, we also want to maximize the probability that reuse can succeed in the first place. Take:

match xs
  Cons(x, _) ->
    val ys = if x > 0 then Cons(x, Nil) else Nil
    (x, ys)

Here we have two allocations: One for the Cons and one for the pair. Both could reuse the Cons cell of xs. However, the Cons allocation will only happen sometimes, while the pair allocation will always happen. As such, we should prefer to reuse at the pair allocation -- but the current Koka compiler prefers the first allocation it encounters, which is the Cons allocation. Unfortunately, to do all of this optimally seems to be NP-complete, as https://www2.cs.arizona.edu/~debray/Publications/ctgc.pdf show. I think it would be very nice to have a reasonable heuristic for this in Koka, but I am also not sure how much of a performance benefit it would bring.

SchrodingerZhu commented 1 year ago

I am considering whether we can do some super-optimization trick with egg. I cannot say it will be useful in practice, but it can be interesting to express Perceus and Drop-Guided Reuse in rewrite rules:

use egg::{Rewrite, SymbolLang, rewrite as rw, Language, CostFunction, Id, LpCostFunction, EGraph};

fn rewrite_rules() -> Vec<Rewrite<SymbolLang, ()>> {
    return vec![
        // (free ?variable ?layout ?continuation)
        // (drop ?variable ?layout ?continuation)
        // (dup ?variable ?continuation)
        // (construct ?constructor ?layout)
        // (construct-at ?variable ?constructor ?layout)
        rw!("drop-drop-exchange"; "(drop ?a ?la (drop ?b ?lb ?c))" => "(drop ?b ?lb (drop ?a ?la ?c))"),
        rw!("drop-dup-exchange"; "(drop ?a ?la (dup ?b ?c))" => "(dup ?b (drop ?a ?la ?c))"),
        rw!("drop-free-exchange"; "(drop ?a ?la (free ?b ?lb ?c))" => "(free ?b ?lb (drop ?a ?la ?c))"),
        rw!("drop-dup-cancel"; "(drop ?x ?lx (dup ?x ?y))" => "?y"),
        rw!("drop-reuse-rewrite"; "(drop ?x ?lx ?y)" => "(let ?x (drop-for-reuse ?x) (free ?x ?lx ?y))"),
        rw!("drop-let-binder-permeate"; "(drop ?a ?la (let ?b ?c ?d))" => "(let ?b (drop ?a ?la ?c) ?d)"),
        rw!("drop-let-body-permeate"; "(drop ?a ?la (let ?b ?c ?d))" => "(let ?b ?c (drop ?a ?la ?d))"),
        rw!("drop-condition-permeate"; "(drop ?a ?la (if ?c ?d ?e))" => "(if (drop ?a ?la ?c) ?d ?e)"),
        rw!("drop-branch-permeate"; "(drop ?a ?la (if ?c ?d ?e))" => "(if ?c (drop ?a ?la ?d) (drop ?a ?la ?e))"),

        rw!("dup-dup-exchange"; "(dup ?a (dup ?b ?c))" => "(dup ?b (dup ?a ?c))"),
        rw!("dup-drop-exchange"; "(dup ?a (drop ?b ?lb ?c))" => "(drop ?b ?lb (dup ?a ?c))"),
        rw!("dup-free-exchange"; "(dup ?a (free ?b ?lb ?c))" => "(free ?b ?lb (dup ?a ?c))"),
        rw!("dup-drop-cancel"; "(dup ?x (drop ?x ?lx ?y))" => "?y"),
        rw!("dup-let-binder-permeate"; "(dup ?a (let ?b ?c ?d))" => "(let ?b (dup ?a ?c) ?d)"),
        rw!("dup-let-body-permeate"; "(dup ?a (let ?b ?c ?d))" => "(let ?b ?c (dup ?a ?d))"),
        rw!("dup-condition-permeate"; "(dup ?a (if ?c ?d ?e))" => "(if (dup ?a ?c) ?d ?e)"),
        rw!("dup-branch-permeate"; "(dup ?a (if ?c ?d ?e))" => "(if ?c (dup ?a ?d) (dup ?a ?e))"),

        rw!("free-free-exchange"; "(free ?a ?la (free ?b ?lb ?c))" => "(free ?b ?lb (free ?a ?la ?c))"),
        rw!("free-dup-exchange"; "(free ?a ?la (dup ?b ?c))" => "(dup ?b (free ?a ?la ?c))"),
        rw!("free-drop-exchange"; "(free ?a ?la (drop ?b ?lb ?c))" => "(drop ?b ?lb (free ?a ?la ?c))"),
        rw!("free-let-binder-permeate"; "(free ?a ?la (let ?b ?c ?d))" => "(let ?b (free ?a ?la ?c) ?d)"),
        rw!("free-let-body-permeate"; "(free ?a ?la (let ?b ?c ?d))" => "(let ?b ?c (free ?a ?la ?d))"),
        rw!("free-condition-permeate"; "(free ?a ?la (if ?c ?d ?e))" => "(if (free ?a ?la ?c) ?d ?e)"),
        rw!("free-branch-permeate"; "(free ?a ?la (if ?c ?d ?e))" => "(if ?c (free ?a ?la ?d) (free ?a ?la ?e))"),
        rw!("free-construct-rewrite"; "(free ?var ?layout (construct ?constructor ?layout))" => "(construct-at ?var ?constructor ?layout)")
    ];
}

pub const CONSTRUCT_WEIGHT : f64 = 64.0;
pub const FREE_WEIGHT : f64 = 56.0;
pub const CONSTRUCT_AT_WEIGHT : f64 = 24.0;
pub const DROP_WEIGHT : f64 = 32.0;
pub const DROP_REUSE_WEIGHT : f64 = 8.0;
pub const DUP_WEIGHT : f64 = 4.0;

#[derive(Debug)]
pub struct Heuristic;
impl LpCostFunction<SymbolLang, ()> for Heuristic {

    fn node_cost(&mut self, egraph: &EGraph<SymbolLang, ()>, eclass: Id, enode: &SymbolLang) -> f64
    {
        match enode.op.as_str() {
            "free" => FREE_WEIGHT,
            "drop" => DROP_WEIGHT,
            "drop-for-reuse" => DROP_REUSE_WEIGHT,
            "dup" => DUP_WEIGHT,
            "construct" => CONSTRUCT_WEIGHT,
            "construct-at" => CONSTRUCT_AT_WEIGHT,
            _ => 2.0
        }
    }
}

#[cfg(test)]
mod test {
    use egg::{AstSize, Extractor, LpExtractor, Runner};

    #[test]
    fn cfg() {
        let start = r#"
        (let x (get-argument)
            (if (is-cons-cell x)
                (dup x.t
                    (dup x.h
                        (drop x.h (layout 8 8) (drop x.t (layout 16 8) (drop x (layout 16 8) (construct "Cons" (layout 16 8)))))))
                (drop x (layout 16 8) (struct (construct "Nil" (layout 16 8))))
            )
        )"#.parse().unwrap();
        let runner = Runner::default().with_expr(&start).run(&super::rewrite_rules());

        let lp_best = LpExtractor::new(&runner.egraph, super::Heuristic)
            .timeout(20.0)
            .solve(runner.roots[0]);
        println!("{}", lp_best.pretty(64));
    }
} 

This gives a program like:

(let
  x
  get-argument
  (if
    (is-cons-cell x)
    (let x (drop-for-reuse x) (construct-at x Cons (layout 16 8)))
    (drop x (layout 16 8) (struct (construct Nil (layout 16 8))))))
anfelor commented 1 year ago

Wait, in the dup-drop-exchange rule there is no constraint on a and b right? Is that rule correct?

However, I like this approach in general. I think the general idea of "pushing down" drops and "pushing up" dups is quite helpful in thinking about optimizing this. In particular, the free-construct-rewrite rule is exactly how we model reuse analysis formally (see page 16 of https://www.microsoft.com/en-us/research/uploads/prod/2021/11/flreuse-tr.pdf).

SchrodingerZhu commented 1 year ago

That may cause some problem, I am only using the default symbol rewrite rule here without considering more complicated cases.

Yeah, I get the idea of free-construct-rewrite exactly from the drop-guided reuse example.


Just to add up: To reproduce something in the original paper, one can add more rules:

// here we use layout to distinguish type; one can should use more complicated type info instead
rw!("drop-reuse-rewrite"; "(drop ?x ?lx ?y)" => "(free (drop-for-reuse ?x ?lx) ?lx ?y)"), // change the rewrite rule for drop-reuse
rw!("drop-free-merge"; "(drop ?a ?la (free ?b ?lb ?c))" => "(free (drop ?a ?la ?b) ?lb ?c)"),
rw!("drop-2-cell-expansion"; "(drop ?a (layout 16 8) ?c)" => "(if (is-unique ?a) (drop (head ?a) (layout 8 8) (drop (tail ?a) (layout 8 8) (free ?a (layout 16 8) ?c))) (decref ?a ?c))"),
rw!("reuse-2-cell-expansion"; "(drop-for-reuse ?a (layout 16 8))" => "(if (is-unique ?a) (drop (head ?a) (layout 8 8) (drop (tail ?a) (layout 8 8) ?a)) (decref ?a null))"),

This will transform sth like:

(let x (get-argument)
            (if (is-pair x)
                (dup (head x)
                    (dup (tail x)
                        (drop x (layout 16 8) (construct "Pair" (layout 16 8)))))
                (drop x (layout 16 8) (struct (construct "SomethingElse" (layout 16 8))))
            )
        )

into

(let
  x
  get-argument
  (if
    (is-pair x)
    (if
      (is-unique x)
      (construct-at x Pair (layout 16 8))
      (dup
        (tail x)
        (dup (head x) (decref x (construct Pair (layout 16 8))))))
    (drop
      x
      (layout 16 8)
      (struct (construct SomethingElse (layout 16 8))))))

Again, I am just to share the ideas and questions during my studying your great papers. Thanks very much for all these follow-ups.

TimWhiting commented 6 months ago

Should we move this to a discussion topic, or is there something actionable for Koka here? @anfelor you definitely know more about this aspect of Koka. Maybe we can make smaller issues for definite optimization opportunities.

anfelor commented 6 months ago

Yeah, this could be moved to discussions or closed since there is no performance issue here. I have since come up with a slightly better reuse matching algorithm that I hope to add to Koka in the future, but I expect the improvement would be quite minor.