koka-lang / koka

Koka language compiler and interpreter
http://koka-lang.org
Other
3.16k stars 153 forks source link

Incorrect C generated for variant constructors with different "scan counts" #295

Closed derrickturk closed 1 year ago

derrickturk commented 1 year ago

First, an apology: I've tried to reduce this example to the minimum which triggers the behavior, but it could probably be shrunk further.

I've got a program:

pub type dst
  Mem(ptr: int)
  Rel(offset: int)

pub fun show(dst: dst): string
  match dst
    Mem(ptr) -> "Mem(" ++ ptr.show ++ ")"
    Rel(offset) -> "Rel(" ++ offset.show ++ ")"

pub type src
  Dst(dst: dst)
  Imm(num: int)

pub fun show(src: src): string
  match src
    Dst(dst) -> "Dst(" ++ dst.show ++ ")"
    Imm(num) -> "Imm(" ++ num.show ++ ")"

pub type instruction
  Out(src: src)
  Jz(cmp: src, tgt: src)

pub fun show(instr: instruction): string
  match instr
    Out(src) -> "Out(" ++ src.show ++ ")"
    Jz(cmp, tgt) ->
      "Jz(" ++ cmp.show ++ ", " ++ tgt.show ++ ")"

pub val busted = Out(Dst(Mem(224)))

pub fun main()
  println(busted.show)

which produces the unexpected output:

Out(Dst(Mem(-1)))

I've tracked the source of the -1 to the generated C for the Out constructor:

struct kk_broken_instruction_s {
  kk_value_tag_t _tag;
  union {
    struct kk_broken_Out Out;
    struct kk_broken_Jz Jz;
    kk_box_t _fields[4];
  } _cons;
};
typedef struct kk_broken_instruction_s kk_broken__instruction;
static inline kk_broken__instruction kk_broken__new_Out(kk_broken__src src, kk_context_t* _ctx) {
  kk_broken__instruction _con;
  _con._tag = kk_value_tag(1);
  _con._cons.Out.src = src;
  _con._cons._fields[2] = kk_box_null;
  _con._cons._fields[3] = kk_box_null;
  return _con;
}

The _fields array is too short to account for the Jz constructor (two src values, each with a tag word and the potential to store a dst value, which in turn has a tag word and a data word), if I understand the memory management scheme correctly, and the generated code for an Out overwrites the data fields of the nested dst rather than the intended un-used fields (from the additional payload of the Jz constructor).

Two workarounds avoid this particular failure - either removing the Jz constructor or padding the Out constructor with an extra () member. However, after poking around the C backend a bit, and reviewing the generated code with these workarounds, I think something is awry with the calculated "max scan count" for types like this.

derrickturk commented 1 year ago

It looks like the changes in ba9433f fixed this; the example above now produces the expected output:

Out(Dst(Mem(224)))