if lhs of one stage is the exact same data frame as rhs of the prior step, don't output both to json

pgbovine commented 2 years ago

Right now my trace canonicalization script removes this redundancy when possible, but ideally it would be removed earlier, in the tracer itself. My hunch is that the tracer can do an == (pointer equality) check to compare the rhs of one step against the lhs of the next step, since they should be the same data frame. If they are indeed the same data frame, then there's no point in writing it out twice to JSON, which doubles the space usage (and possibly memory usage too, because JSON encoding can be memory-intensive).
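To make the idea concrete, here is a rough sketch in Python (not the tracer's actual code). It assumes each stage is an `(lhs, rhs)` pair of in-memory table objects; `encode_table` and `encode_steps` are hypothetical names, and the encoder only emits `col_names` as a stand-in for the real table encoding. Python's `is` plays the role of the pointer-equality check.

```python
PREV_RHS = "prev_rhs"  # sentinel string proposed in this issue


def encode_table(table):
    # Hypothetical stand-in encoder: a table is a dict of column name -> values.
    # The real trace format carries more than column names.
    return {"col_names": list(table)}


def encode_steps(steps):
    """Encode a list of (lhs, rhs) table pairs, one pair per pipeline stage."""
    encoded = []
    prev_rhs = None
    for lhs, rhs in steps:
        # `is` compares object identity only; table contents are never scanned.
        same_object = prev_rhs is not None and lhs is prev_rhs
        encoded.append({
            "tables": {
                "lhs": PREV_RHS if same_object else encode_table(lhs),
                "rhs": encode_table(rhs),
            }
        })
        prev_rhs = rhs
    return encoded
```

Note that an identity check only catches the case where the very same object is passed along. A copy with equal contents would still be written out in full, which is safe but misses the saving.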

pgbovine commented 2 years ago

The way I'd encode it is with a placeholder string in place of the repeated table, something like:

  lhs: "prev_rhs"

pgbovine commented 2 years ago

      "tables": {
        "lhs": "prev_rhs",
        "rhs": {
          "col_names": [
            "carb",
            "optden"
          ],
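On the reading side, a consumer would need to swap the placeholder back for the previous stage's rhs. A minimal sketch in Python, assuming the trace is a list of stage objects shaped like the snippet above; `expand_prev_rhs` is a hypothetical helper, not something that exists in the repo:

```python
PREV_RHS = "prev_rhs"  # sentinel string proposed in this issue


def expand_prev_rhs(stages):
    """Return a copy of `stages` with every "prev_rhs" lhs resolved."""
    expanded = []
    prev_rhs = None
    for stage in stages:
        tables = dict(stage["tables"])  # shallow copy; input is left untouched
        if tables["lhs"] == PREV_RHS:
            if prev_rhs is None:
                raise ValueError("first stage cannot refer to prev_rhs")
            # Reuse the already-decoded table instead of a second copy.
            tables["lhs"] = prev_rhs
        prev_rhs = tables["rhs"]
        expanded.append({**stage, "tables": tables})
    return expanded
```

Because the resolved lhs points at the same decoded object as the prior rhs, the table is also held only once in memory after loading.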

seankross / mario #28