Sources of uniqueness: the daring design debate

aspiwack commented 4 years ago

[because alliterative titles are always the best titles]

Edit: I now have a better solution outlined in https://github.com/tweag/linear-base/issues/130#issuecomment-1193835221

This is something which has been at the back of my mind for a while. And I'm not exactly sure how to approach it.

Here is the problem: when creating some piece of data which needs to be unique (either for resource handling of or mutation or what have you) there are actually many ways to create such a piece of data.

In linear-base, we use two different ones:

Delimit the data in a scope withThingy :: (Thingy #-> Unrestricted a) #-> Unrestricted a
Use a specific control monad newThingy :: M Thingy
- In this latter case, there are some restriction on the run function of the monad, if the Thingy is to stay unique
- Typical examples are the IO monad, and its cousin the RIO monad.

These are not completely independent ways to create a unique thingy, since withThingy can be turned into the monadic style with the continuation monad. But they are not the same either, if only because M need not be a continuation monad.

Sometimes one of this styles is required and the other one doesn't really work. But sometimes you may want both. Maybe I can create a linear mutable array with the pure interface, but also with IO. Note how neither is an instance of the other. So I'd basically need two distinct creation functions. Meh…

But, more importantly, there are other ways to create unique thingy. For instance, if I have any guaranteed unique piece of data, I can create a new linear mutable array, and it will remain unique. And so there starts to be a bit of an n*m problem here. Where each type of unique pieces of data need many creation functions. That's unpleasant, to say the least.

So I've been thinking that data should be able to carry some kind of token establishing that they are unique. I've been calling them Source in my head (we can arrange it so that Source is 0-width data, so that they are mostly free).

The rest is a bit hazy in my head, still. But I imagine something like the following: we have a number of ways to build uniqueness sources

newSource :: IO Source
withSource :: (Source #-> Unrestricted a) #-> Unrestricted a

Moreover sources would be Dupable, I assume (which means that they don't guarantee uniqueness, but I think that it is unavoidable).

Then we would have

fromSource :: Source #-> Thingy

(and maybe withThingy and newThingy as convenient shorthands for the appropriate composition)

To be able to create a thingy out of, say, another thingy, without creating a new scope, we could use the following

produceSource :: Thing #-> (Source, Thingy)

Implementing this is what requires Source to be Dupable (this, and whatever consume or freeze function that we would have on the type, though these only require Consumable).

By convention, unique things would always have at least a fromSource and a produceSource.

Something along these lines anyway. Thoughts, further ideas, counterproposals? I want them all!

aspiwack commented 4 years ago

The sort of essential characteristic of Source is that it's a pure member of the LinHask category. In the sense that there is no (total) inhabitant of Unrestricted Source. The rest matters less, if source is to be a tool for building uniqueness abstractions.

Now that I think of it, the objection that I wrote just 5min ago above about Dupable is a bit silly. Even it it wasn't, whenever making a library which uses uniqueness for correctness, you will need unsafe things, since your low-level implementation doesn't have uniqueness anyway. And Dupable is not incompatible with uniqueness: linear mutable arrays can be made Dupable, it'll happen by copy, rather than sharing, is all. They still have the uniquess guarantee.

facundominguez commented 4 years ago

But, more importantly, there are other ways to create unique thingy. For instance, if I have any guaranteed unique piece of data, I can create a new linear mutable array, and it will remain unique. And so there starts to be a bit of an n*m problem here. Where each type of unique pieces of data need many creation functions.

I'm failing to see what the problem is. Is there some boilerplate that you want to reduce? I think I would need an example.

aspiwack commented 4 years ago

The “problem” is that there are many contexts in which I could create a unique thing.

Say that I have been diligent, and I wrote a mutable array library with both

newArray :: Int -> Linear.IO Array
withArray :: Int -> (Array #-> Unrestricted a) #-> Unrestricted a

Let's leave aside that it's a bit silly to have to write both these function. (in my proposed scheme they are trivial instances of a more general function. Or such is the hope anyway).

But then you come along, and you create a new library with a new monad, let's call it Oy. As it happens, the Oy monad would work perfectly well for defining linear mutable array. But, then, I have no knoweldge (yet), of the Oy monad. And maybe you created it for a purpose distinct from arrays. And so there is no oyArray :: Int -> Oy Array anywhere. Then Utku comes along, and needs this oyArrayfunction. There is no safe way to define this function in this situation.

On the other hand, if we had these Source things, then when creating the monad, you would just have to ask yourself: does this monad support Source. And then, you would be supporting Array-s for free, as well as any other such type.

facundominguez commented 4 years ago

Does defining newArray :: Linear.Monad m => Int -> m Array help? If the monad is only used for uniqueness, newArray could use unsafePerformIO and not impose any particular monad at all.

hm, and then what if we want to create an array where there is not a monad? :thinking: I think I'm starting to understand this issue :)

aspiwack commented 4 years ago

That wouldn't be correct anyway, since with this definition getIdentity . newArray @Identity would have type Int -> Array, and this is wrong (it gives me no way of imposing uniqueness). Only specific monads will work.

mboes commented 4 years ago

Is there an abstraction that we can model as a type class here? A type class with meaning (e.g. via laws) is better than convention.

aspiwack commented 4 years ago

Maybe? That would actually be a helpful item, since all the convenience function would work for every source bearing type like this.

Something like

data Source
instance Dupable Source

newSource :: IO Source
withSource :: (Source #-> Unrestricted a) #-> Unrestricted a

-- Not an actual suggestion for names
class SourceBearer a where
  grab :: Source #-> a
  share :: a #-> (Source, a)

new :: SourceBearer a => IO a
new = grab <$> newSource

with :: SourceBearer a => (a #-> Unrestricted a) #-> Unrestricted a
with scope = withSource (scope . grab)

grabShare :: (SourceBearer a, SourceBearer b) => a #-> (b, a)
grabShare = share . grab

Though it's hard to give some real meaning to the type class. Besides it being useful. There probably ought to be an interaction between SourceBearer and the Dupable Source instance. Such as share . (\(consume -> (), x) -> x) = id. But I'm not sure that such laws are supremely informative.

facundominguez commented 4 years ago

class SourceBearer a where
  grab :: Source #-> a
  share :: a #-> (Source, a)

It doesn't look like grab allows to create an array as we need to also provide a size to create it, which kind of invalidates the signature of the class.

  grab :: Source #-> Int -> Array
  share :: Array #-> (Source, Array)

aspiwack commented 4 years ago

That's a fair point.

So, in order to fill the blanks, we would need an extra type family attached to the type class. And have grab take extra argument of said type.

It's a tad convoluted, though. I'm not sure it's worth the type class. I don't know.

utdemir commented 4 years ago

I had a problem today that I think is relevant to this issue.

Most operations on linear mutable vectors rely on the vector being unique, so the usual way to create a vector is to take a linear callback (alloc :: ... -> (Vector a #-> Unrestricted b) #-> Unrestricted b).

Today, I was implementing the Monoid instance for linear mutable vectors. Implementing mappend went well, since we already got two vectors, we can create a new one using the fact that we have access to some unique values.

However, I had a problem when implementing mempty :: a, because it doesn't provide a way to create a unique value. With the initial suggestion on this thread, we could have something like mempty :: Source #-> a instead. Otherwise, I need something like mempty :: forall b. (a #-> Unrestricted b) #-> b which is not pretty. Or we somehow have to accept that linear mutable vectors (at least with the current representation) are not monoids.

aspiwack commented 4 years ago

@utdemir this is an interesting enough question that we ought to make a separate issue to discuss it. This comment will act as a link between the two issues.

treeowl commented 2 years ago

While everyone may already realize this, I think it's worth mentioning the prior art in GHC. We have

newtype ST s a = ST (State# s -> (# State# s, a #))
IO ~= ST RealWorld  -- I have serious qualms about this one. See below.

Most unique things can be created within ST s for any s. A few (e.g., weak pointers) can only be created in IO. We have

runST :: (forall s. ST s a) -> a

The linear type system seems to allow us (but not force us) to drop the quantification, so as I understand it, we could have

data FakeWorld
newtype ST a = ST (State# FakeWorld %1-> (# State# FakeWorld, a #))
runST :: ST (Ur a) %1 -> Ur a

Now

withThingy :: (Thingy %1-> Ur a) %1-> Ur a
withThingy f = runST (mkThingy >>= (pure $!) . f)

So as long as we have

mkThingy# :: State# s %1-> (# State# s, Thingy #)

we can get newThingyST, newThingyIO, and withThingy in one go. In (pseudo?)code,

withAnything :: ST thing %1-> (thing %1-> Ur a) %1-> Ur a
withAnything make f = runST (make >>= (pure $!) . f

Can we have source carriers? Well, yeah, a source carrier supports

share :: thingy %1-> (# State# s, thingy #)

But is that really useful? I'm not totally convinced. ~~Any time we would want it, we could have just captured the token resulting from creating the thingy.~~ The only thing you could do with it is invoke another creation function, but you'll always be in some kind of monadic context then anyway.

The qualms: I think GHC is simply wrong to make IO ~ ST RealWorld, because ST and IO are fundamentally rather different things, have different strictness semantics, and could, in theory, be implemented in totally different ways. So I see mkThingy# as solely a convenience within the world of GHC. To put it another way, I think PrimMonad should be imagined conceptually as having one method for each primop shared between ST and IO.

treeowl commented 2 years ago

Question: is the thread-indexed ST (call it STI) still useful in a linearly typed context? I would imagine so.

newtype STI s a = STI (State# s %1-> (# State# s, a #))
runST :: (forall s. ST s a) %1-> a

That is, the result doesn't need to be unrestricted.

treeowl commented 2 years ago

Ah, I guess share probably is more useful than I'd imagined in context.

class Tok a ~ s => Stately s a where
  type Tok a :: Type
  share :: a %1-> (# State# s, a #)

realizeWith :: Stately s a => a %1-> (a %1-> STI s b) %1-> b
realizeWith a (STI f)
  | (# s, a' #) <- share a
  , (# s', b #) <- unSTI (f a') s
  , () <- consumeState# s'
  = b

treeowl commented 2 years ago

Hrmm.... What I was imagining for multiple state threads won't work quite right. GHC has

withForeignPtr :: ForeignPtr a -> (Ptr a -> IO b) -> IO b

That's obviously wrong, because b could retain a reference to the pointer. Linear types can make that safe with something like

withForeignPtr :: ForeignPtr a %1-> (Ptr a -> IO (Ur b)) %1-> IO (Ur b)

But this is too rigid—b can't include anything restricted. We really only meant to exclude the pointer! With a state thread-indexed version of IO, we'd have

withForeignPtr :: ForeignPtr s a %1-> (forall t. ((IOI s %1-> IOI t) -> Ptr t a -> IOI t b) %1-> IOI s b

Suppose something in the inner action needs to create an Array s c. It can't do it "natively" in IOI t. But it can use the function it's given to "lift" actions into the outer context.

treeowl commented 2 years ago

Another option might be to make realizeWith the method instead.

 class Tok a ~ s => Stately s a where
  type Tok a :: Type
  realizeWith :: a %1-> (a %1-> STI s b) %1-> b

This maintains the balance of State# tokens. A "closed" operation must take one state token and produce it at the end. For a moment, I thought maybe this would obviate the rank-2 quantification, but it doesn't. We need to make sure not only that some state token is turned in at the end of the withForeignPtr action, but that the right one is turned in.

Jashweii commented 2 years ago

https://github.com/tweag/linear-base/issues/130#issuecomment-1095360774 The variable in ST may as well be a type level witness of the particular runST call (and hence state thread), since from the users perspective it could be instantiated to any type parameter. This is what allows you to distribute STRefs and other things tying them to the same thread, and in fact you can even hold onto them through a runST call. There is a non-ST example of the quantification trick called the key monad. The essential operations as quoted from the paper of the same name:

newKey :: KeyM s (Key s a)
testEquality :: Key s a → Key s b → Maybe (a :∼: b)
runKeyM :: (∀ s. KeyM s a) → a

s is always nominal and usually (always?) an infinite kind, really :∼: could be :~~: - this doesn't use typeable at all or fix the kind of the last key parameter. Key is probably an int and KeyM effectively State Int. You can implement ST (very inefficiently) using this. The last parameter in key is usually nominal but in the ST analogue it would be representational (you can coerce STRefs). I have actually used something like this with linear types, but keeping forall s is essential to that. If the key was linear, you could even change the type of an existing key by returning it (otherwise not safe). You don't need Ur always in these unique continuations, because you don't always need to destroy the thing at the end of the scope, but you still benefit from knowing it is a unique value of the type.

Jashweii commented 2 years ago

https://github.com/tweag/linear-base/issues/130#issuecomment-1152974825

I only just realised that not returning Ur isn't actually sound even if you don't need any clean up, since you can then duplicate the applied scope function e.g. (x = scope id), if x is evaluated once then used multiple times we lose uniqueness.

aspiwack commented 2 years ago

I want to share some further development on this issue. In that, I think that I know the solution but it requires GHC support and will take a little time to happen. But it's a convincing enough solution that I do feel comfortable calling it the solution, indeed. This solution is courtesy of the Linear Constraint paper.

Without further ado.

Let's imagine that we have linear constraints. That is I can write C %1 => T, and C is inferred, like an ordinary type-class constraint. But it is only allowed to be used linearly. For instance the following would be ill-typed because linearity isn't respected:

useC :: C %1 => Int

bad :: C  %1 => (Int, Int)
bad = (useC, useC)

This is what the paper is about (how do you formalise this, and how do you solve for linear constraints).

Now, let's imagine that we have a special Linearly constraint (also described in the paper). It comes with two built-in functions:

scoped :: (Linearly %1 => Ur a) %1 -> Ur a
io :: IO (Dict Linearly)

Now, this mirrors the Source token from earlier. So we can user Linearly instead of a token. The benefit is that Linearly is discharged automatically by GHC, I don't have to deal with this.

That is, instead of

newSource $ \token ->
  let x = Array.new token 42 in
  …

I get to write

Linearly.scoped $
  let x = Array.new 42 in
  …

It's not super spectacular with just one Array.new, though remember that the starting point, for all this, is that we want to be able to use a single scope for several arrays (or unique-pointer-like things in general).

To complete this story we need one more bit of magic. This one is a little harder, to implement. Specifically, how do we actually support writing two Array.new in the same scope? In my original post, I proposed having Source be Dupable (together with a function produceSource :: Array a %1 -> (Source, Array a), which is more manual management that we want to avoid). The solution is to make Linearly dupable as well. Though we want this to be managed by the compiler. That is, we just make it possible to write

Linearly.scoped $
  let x = Array.new 42 in
  let y = Array.new 57 in
  …

This is desugared to:

Linearly.scoped $ \token ->
  let (token1, token2) = Linearly.dup2 token in
  let x = Array.new token1 42 in
  let y = Array.new token2 57 in
  …

(implementing this in the constraint solver is a little tricky, but as a naive strategy to see that it work, we can imagine generating a Linearly.dup2 everywhere that there are two different subterms, use a corresponding Linearly.consume to remove the token when they turn out to be not needed. This is quite wasteful though, so we will have to implement this in a more economical manner, which isn't as obvious, but isn't particularly scary either)

What do we replace produceSource with? Nothing! Just add Linearly %1 => to your function if you are going to make new arrays in it. GHC will handle producing enough tokens for you.

Qqwy commented 10 months ago

EDIT: No, what I wrote here is incomplete. See bottom.

The following is already possible today, by using the uncommonly-used ImplicitParameters extension:

{-# LANGUAGE LinearTypes #-}
{-# LANGUAGE ImplicitParams #-}
{-# LANGUAGE RankNTypes #-}

module Linearly (Linearly, linearly) where

-- Linearity token type. Intentionally does not implement `Movable`. Its constructor is private.
data Linearly = Linearly

linearly :: Movable b => ((?lin :: Linearly) => () %1 -> b) %1 -> b
linearly f = let ?lin = Linearly in f ()

This allows any function which would be unsafe to use in a non-linear context, to restrict itself to contexts in which the ?lin :: Linearly implicit parameter is bound. In other words: it can only be used inside the linearly scoping function and since that restricts return types to only types implementing Movable, linear-only types cannot escape their linear context. _{(And the Linearly token itself cannot escape the context for the same reason.)}

Functions which really only are allowed to work in a linear context (AKA functions which are sound only as long as their inputs/output are not aliased) can be constrained to work only under linearly like so:

newMArray :: (?lin :: Linearly) => Int %1 -> a -> Array a
fromList :: (?lin :: Linearly) => [a] %1 -> Array a
-- etc.

Which then could be used as follows:

example list = linearly (\() -> 
      list
      & Array.fromList
      & Array.arrOMap (+40)
      & Array.toList
    )

This is not a poor man's implementation of 'linear constraints' as the implicit parameter can be happily used many times. Incidentally that is exactly why it significantly improves on the 'manual linear token producing/consuming' that was discussed in most of this thread.

I think this is sound. In other words: I believe the core problem is 'how do we prevent a function that is safe only when called in a linear context from being called in a non-linear context'. And that what I've presented here is an adequate solution to this problem (which side-steps the n*m linear constructors issue).

EDIT: On closer inspection, since the implicit parameter can be used multiple times GHC does not consider the result of e.g. fromList :: (?lin :: Linearly) => [a] %1 -> Array a to be linear. 😞

aspiwack commented 10 months ago

We could just make Linearly be a regular (Dupable but not Movable) data type. But then we'd have to manage duplication manually, which is a little bit of a pain. If we want to avoid the pain of duping ourselves, then we have to teach the compiler to dupe implicitly, which is what the dupable linear constraint story is trying to achieve. I'd be (happily!) surprised if there was a solution that doesn't involve teaching this old compiler new tricks.

tweag / linear-base

Sources of uniqueness: the daring design debate #130