tweag / linear-base

Standard library for linear types in Haskell.
MIT License
336 stars 37 forks source link

Using a mutable backing store (such as Array) for V #312

Closed ekmett closed 2 years ago

ekmett commented 3 years ago

Is your feature request related to a problem? Please describe.

dupV can produce V n a values efficiently for many types.

This privileges this over other possible linear storage solutions for bulk copied data.

I was hoping to work with an API that takes a V n a and uses reflection to package it up as a Range of values you can access that shares a common backing store. Ranges themselves are mostly type level constructs, the "real" Vector Any gets held by reflection in common, thus ensuring never shall two values of type Range s i j _ will exist at the same time, letting you work on privately owned data, and two adjacent ranges with the same s can be safely re-merged.

One advantage of this over just working with V n as is that I can split work up into sections, worksteal and reassemble my answers safely if the ranges show up in the types, and the original backing store identity is captured in a region parameter.

claim    
  :: forall n a b. V n a %1
  -> (forall s. (KnownV s, VLength s ~ n) => Range s 0 n a %1 -> Ur b) %1    
  -> Ur b

merge :: forall s i j k a. Range s i j a %1 -> Range s j k a %1 -> Range s i k a
split ::: ...
declaim :: forall s i j a. KnownV s => Range s i j a %1 -> V (j - i) a

which all works great, right up until realize you can't write Linear.Functor, or lenses to the individual members of the range, because unlike an Array, the contents of V n aren't actually mutable.

Describe the solution you'd like I'd like to look under the hood of V and see something that is either a Array# or MVector, so that lenses to modify the ith member can be written, that don't require complete copies, and linear fmap can be implemented in place.

Describe alternatives you've considered One thought was to use unsafe thaw operations but there isn't a strong guarantee that the vector you have is the only one using its backing array, and freezing it on one and having another segment of it thinking it was thawed on the other would be bad from a GC perspective as I understand things. moving things to a fresh mutable vector on claim above, also seems needless and painful.

aspiwack commented 3 years ago

I do like the Range design :slightly_smiling_face: .

I think that the trade-off, here, is whether we want V n a to be a mutable data structure under the hood. Or if we want V n a to have a Movable instance (which I've noticed you haven't raised in #310 , so I assume you've chosen your side there :slightly_smiling_face: ).

I've stayed away from this design choice, because I don't really have strong arguments either way.

As one of our first users, you get to influence this choice. How sure are you that it's the direction where you prefer V to go?

ekmett commented 3 years ago

For me, dupV being efficient really matters largely on very large arrays of mutable values. Ultimately it seems like there are two masters being served here and I think you'll eventually want both an initial mutable version of V and a frozen version of it that can only be consumed en mass.

The status quo has me forced to write a brand new type, and suffer a 2x cost all the time to load it. A world where I can build something mutable (preferably with dupV) and freeze it (mostly just to declaim it as a concern for gc) makes it is free other than the cognitive load.

Ultimately I need to be able to build arrays of linear values holding arrays of linear values to build any recursive mutable data structures. The status quo renders V useless for this case.

The reason I started exploring LinearTypes in haskell was to update my transients code to use linear types on the transient form, and to get a better sense of how they "feel" to work with in general, since I've been exploring QTT in my current programming language project.

Another gaping hole in that story right now is that we have support for Arrays but not SmallArrays so far, and you're paying for cardmarking on small arrays right now, which probably eats any advantage you have.... and vector doesn't offer a SmallVector, so it seems like in the end you won't be able to sit atop it anyways. Each of these pushes me towards a mutable V. Asymptotics beat constants, then you fill in the constants. ;)

In the code over in https://github.com/ekmett/linear-haskell/tree/main/linear-primitive, I started adding support for more array types, which might address the SmallArray concern. Feel free to repurpose SmallArrays or I'd be willing to help work with just moving the full primitive suite in once its done.

(The same vector concerns above are why I'm currently freezing to Data.Primitive.Array instead of Data.Vector. I can't do the same symmetrically for any of the other array types, so I'm choosing consistency over faithfulness to your current API.)

ekmett commented 3 years ago

For completeness: Just using an Array, without the position/index backing tweaks in MVector would prevent declaim from working on fragments of a V. I'd have to use another type parameter or path-descriptor, which ensures I control the whole backing store to repackage it.

declaim :: forall s a. KnownV s => Range s 0 (VLength s) a %1 -> V (VLength s) a

Alternately another parameter on Range, which may be required one ultimately wants to support shrinkSmallMutableArray# in ranges over the SmallArray equivalent of V.

Not entirely wedded to the i j k indexing system mentioned there, other options are that split/merge puts on/takes off an L and R tag over the s parameter, and I don't have to do much if any type-level arithmetic.

splitAt :: KnownNat i => Range s p ... -> %1 (Range s (L p) ..., Range s (R p) ...)

On the plus side, easier to typecheck, no arithmetic, no Data.Constraint.Nat machinery to prove things about length.

On the minus side the provenance of how I split is visible in the types, making it hard to split in one tree-shaped way and rejoin in another when you are doing left-to-right and right-to-left passes or work-stealing.

utdemir commented 3 years ago

Another issue is, having V backed by a mutable store would require a different API compared to what we have. The reason is that; with a mutable backing store, we will need to ensure that V itself can only be used linearly (that is, having a Ur (V n a) is not possible). This is not the case right now; for example, make is of the form :: a %1-> ... %1-> V n a; so it is possible to get an unrestricted V by passing unrestricted values.

Usually we use a linear callback for this issue (eg. :: a %1-> ... %1-> (V n a %1-> Ur b) %1-> Ur b). But there are multiple ways to ensure this, that is discussed on #130.

@ekmett, do you have a preference for the interface? Or any other suggestion that improves your use case?

ekmett commented 3 years ago

@utdemir The loss of the easy make is probably the best case for the version with a mutable backing store being a separate type, which is a real shame to me, but having both a V n a and an MV n a isn't the worst option.

aspiwack commented 2 years ago

With #360 , V no longer has this privileged status. We haven't implemented it yet, but we will now be able to fill the existing mutable array type efficiently all the same. Would you agree, @ekmett, that it would solve this issue for you?

aspiwack commented 2 years ago

See #375 for this particular change. But it is also possible to do the same at any type that you would define yourself.

ekmett commented 2 years ago

@aspiwack I think the Replicator story probably handles what I need now that I'm not stuck trying to work with dupV.

aspiwack commented 2 years ago

Fantastic.

I'll consider this issue closed when #375 is then. For the sake of completeness. The release is coming very soon. Thanks for your patience.

tbagrel1 commented 2 years ago

375 is unimplementable, and has been closed with wontfix. As a result, I will now close this issue.

Feel free @ekmett to comment here or to open a new issue if there is still something we didn't address with Replicator.