tweag / inline-java

Haskell/Java interop via inline Java code in Haskell modules.
BSD 3-Clause "New" or "Revised" License
229 stars 15 forks source link

Make the singletons of arguments to foreign calls depend only on types. #133

Closed facundominguez closed 4 years ago

facundominguez commented 4 years ago

We introduce here a type class that provides the singletons of arguments as computed from their types but not their values. This makes possible caching the MethodID for multiple invocations.

mboes commented 4 years ago

To clarify my comment about error messages above: errors that say "expected X, got Y", are easier to deal with than a mess of "couldn't deduce A from context C". Regarding overlapping instances specifically, the problem with them is that they're only usable given some closed world assumption. If overlap is allowed, code that type checks now could stop type checking tomorrow because someone added a new instance. Even if we assume a closed world of instances, the fact that a tuple of arguments larger any tuples that has a specific instance would silently be treated like the single argument case (depending on what Coercible instances are in scope), is problematic.

cc @goldfirere

goldfirere commented 4 years ago

It is often possible to avoid overlapping instances with the use of a closed type family, which would discern among the different cases and return something the type-class machinery could case on. I don't know of a well-written example of this trick, but it's in the folklore and I'm happy to expand on this if you're not familiar with it. And perhaps some cleverness with TypeError and such can improve type errors. However, I think you'll always have the problem of "too big tuples".

Unfortunately, I'm lost for context here, and so I can't apply my general comments here to your specific case.

mboes commented 4 years ago

@goldfirere #135 proposes an alternative, namely to use a simple printf-like GADT. Except that in this case the format specifier is particularly simple: just a constructor specifying the number of arguments we intend to pass in. So you can have e.g.

call obj "getFoo" With2Args True 42

Here, True and 42 are arbitrary arguments to some binary method getFoo. But perhaps a closed type family would work just as well and avoid the somewhat redundant With2Args specifier. IOW, the question is, if you had to implement today a degenerate printf where the format string always says "take any given arguments and concatenate the result of calling show on each", how would you do that? Would you still use a GADT like in Xi et al's original 2003 paper, or would you now use a closed type family?

facundominguez commented 4 years ago

Wouldn't the simplest fix here be

+ newtype Single x = Single x
+
-instance {-# OVERLAPPABLE #-} Coercible x => JNIArguments x where
+instance Coercible x => JNIArguments (Single x) where
   sings _ = [SomeSing (sing :: Sing (Ty x))]
-  jvalues x = [coerce x]
+  jvalues (Single x) = [coerce x]

?

It doesn't fix the need to put the arguments in a tuple, but I don't think that

call obj "getFoo" With2Args True 42

is easier to write and read than

call obj "getFoo" (True, 42)
mboes commented 4 years ago

Right, but what is Single and a family of tuples, if not variants of an algebraic datatype? In the latter case, the set of possible constructors is closed, which indeed is what we want here. Making this a datatype furthermore improves the error messages for the reasons given above. So we end up in pretty much the same place as #135. The only remaining choices in the design space are whether to have

call :: ... -> Arity r -> r

or

data Args where
  Args1 :: Coercible x1 => x1 -> Args
  Args2 :: (Coercible x1, Coercible x2) => x1 -> x2 -> Args

call :: ... -> Args -> IO b

i.e. whether to dissociate the constructor from the arguments or not (more on that later).

facundominguez commented 4 years ago

Making this a datatype furthermore improves the error messages for the reasons given above.

I confess that I don't follow what error messages we are considering, once overlapping instances are avoided as above.

"Could not deduce instance of JNIArguments Int"? I admit that improving that could be difficult. I didn't feel it would be so bad to need improvement, or not enough bad to sacrifice the tuple syntax at least.

This is not going to work:

data Args where
  Args1 :: Coercible x1 => x1 -> Args
  Args2 :: (Coercible x1, Coercible x2) => x1 -> x2 -> Args

The value of type Args needs to be inspected in order to build the MethodID, and one such value is constructed for every invocation. Looks like the arity specification needs to be kept separate.

facundominguez commented 4 years ago

Looks like the arity specification needs to be kept separate.

Nah, scratch that. I think it can be done. Args would be used only in the user facing interface. The internal calls can do without it. It would depend on GHC optimizations, though, whether it can separate the arguments from the arity spec itself.

mboes commented 4 years ago

Looks like the arity specification needs to be kept separate [for performance reasons].

Indeed. Leaving us with #135 as the only solution that satisfies all requirements: good error messages, good performance. As you point out there, however, #135 is not complete. We need to push further down in the call graph the separation between type singletons and arguments. I will work on that this evening.

goldfirere commented 4 years ago

It looks like we know the type will end in IO blah. Then we don't need these techniques. How about something like this:

class Coercible a
instance Coercible ()
instance Coercible Int
instance Coercible Bool
instance Coercible Char

data Ty a
type family J a
type instance J (Ty a) = a
class IsReferenceType a
instance IsReferenceType (Ty Ref)

class CallResult r where
instance Coercible b => CallResult (IO b) where
instance (Coercible a1, Coercible b) => CallResult (a1 -> IO b) where
instance (Coercible a1, Coercible a2, Coercible b) => CallResult (a1 -> a2 -> IO b) where

call :: (IsReferenceType (Ty a), Coerce.Coercible a (J (Ty a)), Coercible a, CallResult r)
     => a -> String -> r
call = undefined

data Ref
instance Coercible Ref

newRef :: IO Ref
newRef = undefined

test :: IO ()
test = do
  ref <- newRef
  () <- call ref "nullary"   -- need the () <- so that GHC knows the result is Coercible
  _ :: Int <- call ref "nullary returning int"
  () <- call ref "unary" True
  () <- call ref "binary" 'x' (5 :: Int)
  return ()

No modern features! I'm worried I'm missing a key aspect of the design here, though.

facundominguez commented 4 years ago

Thanks, Richard. That could work for the Unsafe interface, but the Safe interface is using an abstract monad m that has an instance of MonadIO.

There are also the concerns about error messages involving CallResult being less pleasant than using GADTs.

goldfirere commented 4 years ago

Yes, generalizing over the monad is troublesome. But I think the error messages will be OK. The only problem would be if the programmer uses call in a context that doesn't expect an IO something.

I will think about generalizing over the monad.

goldfirere commented 4 years ago

Here's another stab:

class Coercible a
instance Coercible ()
instance Coercible Int
instance Coercible Bool
instance Coercible Char

data Ty a
type family J a
type instance J (Ty a) = a
class IsReferenceType a
instance IsReferenceType (Ty Ref)

data Nat = Zero | Succ Nat

type family NumArgs r where
  NumArgs (a -> b) = Succ (NumArgs b)
  NumArgs other    = Zero

class CallResult_ num_args r where
instance Coercible b => CallResult_ Zero (m b) where
instance (Coercible a1, Coercible b) => CallResult_ (Succ Zero) (a1 -> m b) where
instance (Coercible a1, Coercible a2, Coercible b) => CallResult_ (Succ (Succ Zero)) (a1 -> a2 -> m b) where

type CallResult r = CallResult_ (NumArgs r) r

call :: (IsReferenceType (Ty a), Coerce.Coercible a (J (Ty a)), Coercible a, CallResult r)
     => a -> String -> r
call = undefined

data Ref
instance Coercible Ref

newRef :: IO Ref
newRef = undefined

test :: IO ()
test = do
  ref <- newRef
  () <- call ref "nullary"   -- need the () <- so that GHC knows the result is Coercible
  _ :: Int <- call ref "nullary returning int"
  () <- call ref "unary" True
  () <- call ref "binary" 'x' (5 :: Int)
  return ()

The result monad is generalized over, but I needed to use a closed type family. I'm not sure about the quality of error messages here. Provided the call appears after a <- in a do block, I think it would be OK.

facundominguez commented 4 years ago

We are going to try an approach as described in #137.