Consider removing SEXP type parameters

mboes commented 2 years ago

Is your feature request related to a problem? Please describe. An R value is a value of type SEXP s a, for some memory region s and some "form" or "type" a. This parameterization of the SEXP datatype enables us to document in type signatures what forms of expressions are allowed as inputs to R functions and what forms to expect in the output. For example, the length function expects a vector and mkString returns a string:

length :: R.IsVector a => SEXP s a -> IO CInt
mkString :: CString -> IO (SEXP V 'R.String)

However, using parametric typing to model R's typing discipline is not ideal. Restricting the form of an argument to a function is easy enough: instantiate the a parameter or restrict it using a type class constraint (like in length above). An argument of unknown form can be written SEXP s a with no type class constraint. But the result of unknown form of any function is not SEXP s a, it's SomeSEXP s.

In other words, arguments are universally quantified whereas the results are existentially quantified. This is awkward to explain to users, who are often casual Haskellers whose expertise is say in data science rather than PL. What's also awkward is that code is now littered with a lot of SomeSEXP packing and unpacking. There are furthermore two Haskell type constructors to represent R values, SEXP and SomeSEXP, only because we want to be able to classify the form of an R value at the level of types in Haskell.

Finally, a technical subtlety is that R has a special Any form, which is placeholder for other forms. There is an implied subtyping relation. We model this currently using the <= type-level predicate, but this is just as awkward as the rest.

Describe the solution you'd like A solution would be to drop the type parameters altogether, or at least the a parameter. Form checking isn't that useful in practice - we are dealing with an untyped language after all, and it's mostly there for documentation purposes. We can reintroduce form-checking for the users who want it through refinement types, instead of parametric types and ad hoc polymorphism as is currently the case.

This design makes form-checking opt-in, where currently there isn't any convenient way to opt out.

GHC does not natively support refinement types (yet), but there is support through the LiquidHaskell plugin. With refinement types, we could write mkString, length and tag as follows:

{-@ type RVector = { s : SEXP | typeOf s `in` [Char, Logical, Int, Real, ...] } @-}
{-@ type RTag = { s : SEXP | typeOf s `in` [Char, Nil] } @-}

{-@ length :: RVector -> IO CInt @-}
{-@ mkString :: CString -> IO RString @-}
{-@ tag :: SEXP -> IO RTag @-}

Open questions

If a binary in package A depends on inline-r, does the binary link against LiquidHaskell or is LiquidHaskell purely a compile-time thing?
If package A does not use LiquidHaskell, can LiquidHaskell not be a dependency of inline-r? Presumably we'd need Cabal flags here to enable or disable the dependency on LiquidHaskell.
Would there by any way to track memory regions using refinement types as well?

facundominguez commented 2 years ago

Duplicate of https://github.com/tweag/HaskellR/issues/389?

mboes commented 2 years ago

Yup. Oops!

tweag / HaskellR

Consider removing SEXP type parameters #390