Module Rewriting is Unsound With Atoms

Currently, module translation occurs before typechecking. That is, we stitch the whole program into a single set of clauses, and use name mangling to keep modules logically distinct. The problem with this strategy is revealed with the following example:

datadef(foo, [], [foo]).

clausedef(someAtom, [], [atom]).
someAtom(foo). % unrelated to previous definition of the `foo` data

Because module translation knows nothing about types, it will end up converting this program to something like the following:

datadef(public_1_foo, [], [public_1_foo]).

clausedef(public_1_someAtom, [], [atom]).
public_1_someAtom(public_1_foo).

That is, the atom used in someAtom gets name mangled incorrectly. This, in practice, ends up being diabolical, as it's completely silent. Note that the program still typechecks - public_1_foo is, in fact, an atom, just not the one we had originally. This has lead to programs working incorrectly at runtime without a clear reason as to why.

There is a solution to this problem which allows us to keep module translation separate from typechecking, though it's a bit painful. Currently, our typesystem nondeterministically treats functors with zero arity as atoms, which is convenient but odd. If we were to explicitly introduce a built-in constructor for atoms, not only would this allow the typesystem to be deterministic, it would fix this problem in module translation. The translation would see the special constructor, and it could immediately back off as opposed to name mangling it. Actually implementing this idea wouldn't take a whole lot of time, but it's a breaking change for existing code.

kyledewey / typed-prolog

Module Rewriting is Unsound With Atoms #9