Custom rules. Iteration 2

maxonfjvipon commented 3 months ago

We want to make bytecode optimizations via phi normalizer. We've developed a new format for custom rules which should be more strict than v1. Toy example of optimization: replace .java_util_Stream$map with for-each atom call.

# EO before
[list] > main
  list
  .java_util_Stream$filter
    (el.equals "").not > [el]
  .java_util_Stream$map
    java_lang_Integer.parseInt el > [el]
  .java_util_Stream$toList > result

# EO after
[list] > main
  list
  .java_util_Stream$filter > filtered
    (el.equals "").not > [el]
  QQ.opeo.for-each > mapped
    filtered
    java_lang_Integer.parseInt el > [el]
  mapped.java_util_Stream$toList > result

PHI:

# PHI before
main ↦ ⟦
  list ↦ ∅,
  result ↦ ξ.list.java_util_Stream$filter(
    α0 ↦ ⟦
      el ↦ ∅,
      φ ↦ ξ.el.equals(
        α0 ↦ Φ.org.eolang.string(...)
      ).not
    ⟧
  ).java_util_Stream$map(
    α0 ↦ ⟦
      el ↦ ∅,
      φ ↦ Φ.java_util_Integer.parseInt(
        α0 ↦ ξ.el
      )
    ⟧
  ).java_util_Stream$toList
⟧

# PHI after
main ↦ ⟦
  list ↦ ∅,
  filtered ↦ ξ.list.java_util_Stream$filter(
    α0 ↦ ⟦
      el ↦ ∅,
      φ ↦ ξ.el.equals(
        α0 ↦ Φ.org.eolang.string(...)
      ).not
    ⟧
  ),
  mapped ↦ Φ.org.eolang.opeo.for-each(
    α0 ↦ ξ.filtered,
    α1 ↦ Φ.java_util_Integer.parseInt(
      α0 ↦ ξ.el
    )
  ),
  result ↦ ξ.mapped.java_util_Stream$toList
⟧

maxonfjvipon commented 3 months ago

DSL specification:

We consider usage of variables and literals on both sides of bidning: left-side ↦ right-side.

{i} - index >= 0

RESERVED LITERALS:

α{i} - Alpha binding name (if on the left side) or attribute name (if on the right side)
' * ' - before tail sign (with two spaces, no quotes), separator between head and tail, see phi paper, section Head and Tail.
$ - current τ in conditions (see below)

VARIABLES. Start with $ sign

$τ{i} - one title, no composite ones (int, plus, eolang, etc), is used of both sides
$t{i} - tail (see phi paper, section Head and Tail), possibly empty, starts from . (.plus, .x.y.z, .plus(α ↦ x))
$B{i} - pairs/bindings
$b{i} - object, used only on right side, can be a chain of dispatches/applications, can ONLY start the sequence

FUNCTIONS or - logical OR and - logical AND not - logical NOT matches(text, regexp) - TRUE if text matches regexp equals(text, text) - TRUE if same value unique(τ, B) - TRUE if τ is unique in B ...

YAML FORMAT:

// describe all used variables in 'patterns' section with required conditions.
// example of condition: $τ5: matches($, "java_util_Stream$map")
// "For all variables ... where ... and $τ5 matches to "java_util_Stream$map" and ..."
forall:
  $τ0:
  $τ1:
  $τ5: matches($, "java_util_Stream$map")
  $b1:
  $b2:
  $B1:
  $t1:

// "For all variables ... which match the patterns..."
patterns:
  - |
    $τ0 ↦ ⟦
      $τ1 ↦ $b1.$τ5(
        α0 ↦ $b2
      ) * $t1,
      $B1
    ⟧

// describe all used variables in 'results' section with required conditions.
// conditions in such section may use variables from 'forall' section
// "For all variables ... which match the pattern ... such variables exist ..."
exists:
  $τ2: and(unique($, $B1), not(equal($, $τ1)))
  $τ3: and(unique($, $B1), not(equal($, $τ1)), not(equal($, $τ2)))

// describe result after transformation
// "For all variables ... which match the pattern ... such variables exist ... so result patterns may be applied"
results:
  - |
    $τ0 ↦ ⟦
      $τ2 ↦ $b1,
      $τ3 ↦ QQ.opeo.map-for-each(
        α0 ↦ ξ.$τ2,
        α1 ↦ $b2
      ),
      $τ1 ↦ ξ.$τ3 * $t1,
      $B1
    ⟧

maxonfjvipon commented 3 months ago

@deemp @fizruk please check

fizruk commented 3 months ago

A few comments on the suggested format:

The star notation head * $tail is meant to match with head somewhere inside a larger term, buried under some dispatches and applications. Importantly, we do not allow going inside formations. However, this notation misses an important case where head is inside of an argument in an application. For instance, consider matching head in the following term: t1.a.b(c ↦ t2.d.e(f ↦ head.g.h).i.j). For this reason, a one-hole context notation like C[head] might be preferred. We could also use head * $Context, but I think the point of * was that it was followed by a list of dispatches and applications, while context is more complex than that.
We already have a mechanism for conditions. I think we should stick to a single style for conditions.
forall is automatic in the normalizer currently. Is there a reason to make it explicit? I think it might be useful to catch typos, but I do not see it as a crucial component of the rule definition.
It is unclear how patterns and results are related. Do we replace $i$th pattern with $i$th result? Note that using the attribute name as an anchor is unreliable, since the same attribute can be used in multiple places with completely different purpose.

objectionary / normalizer

Custom rules. Iteration 2 #423