Type "inference" not working in frontend

j-hui commented 3 years ago

Moving this discussion here:

On our call last week, @Rewbert and I found that the frontend was encountering some type errors when it came to "inferring" the type of integral instances. The code is here: https://github.com/Rewbert/ssm-edsl/blob/6e632e049a9332bb574f78f487f4d2a222f06922/edsl/Core.hs#L129 :

instance Num SSMExp where
  -- ...
  fromInteger i = if T.typeOf i == T.typeOf (1 :: Int)
                    then Lit TInt   $ LInt   $ fromInteger i
                    else Lit TInt64 $ LInt64 $ fromInteger i

This was trying to work around the lack of a type parameter in the SSMExp data type, and infer the appropriate SSM literal constructors to use when promoting Haskell integeral literals into the EDSL. However, typeOf only knows as much as Haskell's type system does, so in fromInteger, the parameter i appears as an Integer and nothing else. So this code would only ever hit the else branch, converting all integer literals to i64s.

We discovered this when we changed the implementation to this, while refactoring:

fromInteger i = Lit TInt64 $ LUInt8 $ fromInteger i
fromInteger i
  | T.typeOf i == T.typeOf (1 :: Int)   = Lit TInt   $ LInt   $ fromInteger i
  | T.typeOf i == T.typeOf (1 :: Int64) = Lit TInt64 $ LInt64 $ fromInteger i
  | T.typeOf i == T.typeOf (1 :: Word8) = Lit TUInt8 $ LUInt8 $ fromInteger i
  | otherwise                           = error $ show $ T.typeOf i

Doing the principled thing and matching explicitly on each expected type revealed that typeOf is just not the right thing to use here.

j-hui commented 3 years ago

@Rewbert wrote something here, which uses a phantom type in the Num instance for literals to inform what the type should be:

I also had a thought about that very weird Num instance. Perhaps it is possible to add a data Lit a wrapper in the frontend similarly to Exp a & Ref a? That way the phantom type in the Lit type could specify which the polymorphic type in fromInteger is.

Brainstorming (not checked in ghc)

type Lit a = SSMLit

instance Num (Lit Int) where -- need type synonym instances, flexible instances
  ...
  fromInteger i = LInt (fromInteger i)
  ...

instance Num (Lit Word8) where
  ...
  fromInteger i = LUInt8 (fromInteger i)
  ...

instance Num a => Num (Exp a) where
  ...
  fromInteger i = Lit (typeOf (Proxy @a)) $ fromInteger i -- need scoped type variables and type applications extensions
  ...

-- etc

There will be a lot of instances that look very much the same, but a user of the language is not supposed to fiddle with this code anyway, just reap the benefits!

I suppose this goes very much hand in hand with designing a nice frontend though, which we definitely should do soon. I think we can get a very nice interface with wrappers (for type safety) and classes (for nice looking functions).

j-hui commented 3 years ago

I've been thinking about this more. The typeless design of SSMExp might not be what we want. In particular, it means that incorrectly typed terms are possible (at compile time), since there is no exposed type parameter. This prevents the Haskell type checker from making sure that SSMExp terms are being applied to one another correctly (which seems to undermine part of the purpose of it being embedded in Haskell).

For instance, if we declare SSMExp an instance of Num, then it implies that boolean variables, and indeed anything bound to SSM variables (structs, arrays) can be combined by +.

Rewbert commented 3 years ago

The idea is that e.g SSMExp is not exposed to a user of the library, but that they would rather use the API specified in Frontend.hs. The Num instance for SSMExp can be scrapped and users would instead use the newypes Exp a and Ref a, and their accompanying instances in the frontend. It is indeed possible to construct illegal terms in these underlying types, but we can guide users of our EDSL to use only our API functions, which we make sure create valid terms.

It is possible to add more type information to the internal datatypes but it can make a lot of the other things much more complicated and tricky. E.g now a Fork is applied to [Either SSMExp Reference], but if there were type parameters associated with the underlying types, in which case Fork would now need to be applied to [Either (SSMExp a) (Reference b)] or something, you could not apply a Fork to e.g both a boolean expression and in integer expression. The a would need to specialize to either of them. Working around this would add a lot of bookeeping.

j-hui commented 3 years ago

Yeah, I was going to suggest some crazier schemes I had imagined (data SSMExp a or even class SSMExp a) but these would indeed cause headaches in the backend. In particular, I can't think of a type-safe way to maintain a map of variables to values in the interpreter. (I think Fork can actually be worked around by replacign the list with an enumeration of tuples, which works because the number of things we fork at a time is syntactically bound.)

That being said, the reason I'm interested in exposting more type information is to better support user-defined types, while leveraging the Haskell type checker as much as possible to report type errors at compile time. But perhaps this is just impossible.

Rewbert commented 3 years ago

I mean we could try to do this and see just how far we get before we run into some serious trouble. I already tried to do that to SSMExp (in a branch I unfortunately already deleted, it was on my local machine), but ran into very tricky errors in the interpreter. Perhaps if we try to do this we should make a smaller version of the language and solve it for that version, to speed up the experimenting.

At the end of the day there's going to be some terms with very simple types at the bottom. Adding a layer on top of all the untyped things and forcing users to use that layer seems like a nice compromise.

ssm-lang / Scoria

Type "inference" not working in frontend #2