The model is currently poor at decoding the position of edits, as opposed to the edit type and character: e.g. edit = insert 'c' at 5, with signature type * char * pos.
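For concreteness, a minimal sketch of that representation in Python (the class and field names are mine, not the codebase's):

    from typing import NamedTuple

    class Edit(NamedTuple):
        typ: str   # edit type, e.g. "insert" or "delete"
        char: str  # the character being inserted or deleted
        pos: int   # index into the program text

    e = Edit("insert", "c", 5)  # insert 'c' at 5

This may be because: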
1. There is a bug in the Python model.
2. Absolute position encoding is bad, and the model should use something "more advanced" like rotary encoding instead (a sketch follows this list).
3. There is a bug in the OCaml batch generation.
4. Training isn't long enough, or the model is too small.
5. Programs have inherent invariances, which lead to ambiguity and training noise: e.g. the order of operands in addition doesn't matter.
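On 2, for reference: rotary encoding rotates each (even, odd) feature pair of the query/key vectors by an angle proportional to token position, so attention scores depend on relative offsets rather than absolute indices. A minimal sketch, assuming PyTorch:

    import torch

    def rope(x: torch.Tensor) -> torch.Tensor:
        # x: (seq, dim) query or key matrix, with dim even.
        seq, dim = x.shape
        pos = torch.arange(seq, dtype=torch.float32).unsqueeze(1)   # (seq, 1)
        freq = 10000 ** (-torch.arange(0, dim, 2, dtype=torch.float32) / dim)
        angle = pos * freq               # (seq, dim/2): one angle per pair
        cos, sin = angle.cos(), angle.sin()
        x1, x2 = x[:, 0::2], x[:, 1::2]
        out = torch.empty_like(x)
        out[:, 0::2] = x1 * cos - x2 * sin   # 2D rotation of each pair
        out[:, 1::2] = x1 * sin + x2 * cos
        return out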
To which I think:
1. A definite possibility; do we need a positive control dataset?
2. Also probably true, but I want to punt on this.
3. Unlikely, as inspected via plot_mmap.py.
4. Also unlikely: the model has in the past memorized the datasets. See 'positive control' above.
5. Super curious to hear others' thoughts on this. My instinct is to turn the AST (or any graph) into a list of addresses, then use a transformer to encode these into positions to be fed to a larger, orthogonal transformer (see the sketch at the end of this note).
Basically: programs are graphs (or at minimum trees), so operating on them as lists is dumb, and I think we're already running into these limits.
I imagine this has been described in the literature, but I'm not aware of anything.
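To make that instinct concrete, here is a minimal sketch, assuming PyTorch; the names, depth/fan-out caps, and pooling are all hypothetical, not existing code. Each node's address is its path of child indices from the root, and a small transformer encodes each address into a position vector for the larger model:

    import torch
    import torch.nn as nn

    MAX_DEPTH = 8      # assumed cap on AST depth
    MAX_CHILDREN = 16  # assumed cap on fan-out; index MAX_CHILDREN = padding

    def addresses(tree, prefix=()):
        # Yield (node, address) pairs. An address is the path of child
        # indices from the root: () for the root, (0, 2) for the third
        # child of the first child, etc. Trees here are plain dicts with
        # an optional "children" list, a stand-in for the real AST type.
        yield tree, prefix
        for i, child in enumerate(tree.get("children", [])):
            yield from addresses(child, prefix + (i,))

    class AddressEncoder(nn.Module):
        # Small transformer that turns a padded batch of addresses into
        # one position vector per node. d_model must be divisible by nhead.
        def __init__(self, d_model: int):
            super().__init__()
            self.embed = nn.Embedding(MAX_CHILDREN + 1, d_model,
                                      padding_idx=MAX_CHILDREN)
            layer = nn.TransformerEncoderLayer(d_model, nhead=4,
                                               batch_first=True)
            self.encoder = nn.TransformerEncoder(layer, num_layers=2)

        def forward(self, addr: torch.Tensor) -> torch.Tensor:
            # addr: (n_nodes, MAX_DEPTH) ints, padded with MAX_CHILDREN.
            h = self.encoder(self.embed(addr))
            # Mean-pool over address tokens (padding included; fine for
            # a sketch) -> (n_nodes, d_model) position vectors.
            return h.mean(dim=1)

The output vectors would stand in for the absolute position embeddings of the main transformer, so a node's position reflects its place in the tree rather than its offset in the flattened token list.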