haskell / vector

An efficient implementation of Int-indexed arrays (both mutable and immutable), with a powerful loop optimisation framework .
Other
360 stars 141 forks source link

Add strict boxed vectors #483

Closed Shimuuar closed 5 months ago

Shimuuar commented 7 months ago

This is inspired by thread on discource. We don 't have any good way to ensure that elements of boxed vector are evaluated. Namely any use of map, zip, etc will not evaluate vector's elements. And if used in iterative algorithms as in thread above it will leak memory. We don't even have standard way of evaluating all elements of vector to WHNF.

From users' PoV easiest way is to add boxed vector which is strict in its elements. It would be also most reliable. And since strictness is tied to strictness of basicWrite we have to define new data type.

As much as I hate adding (and supporting) another vector data type I think this addition is valuable enough. @lehins @Bodigrim what is your opinion?

I

lehins commented 7 months ago

I've done the same in massiv, so I am definitely in favor. Having a whole new module with duplication of all of the functions is of course pretty unfortunate. I don't have the same problem in massiv.

We could potentially take the same approach here and just define the types with instances and let the users rely on the Generic interface.

Shimuuar commented 7 months ago

I think it would be better to maintain same style for all modules. I guess we're stuck with the decision.

If GHC API has stabilized somewhat it would be possible to write tool for applying documentation changes and to generate reexports automatically

konsumlamm commented 7 months ago

This could be a separate library. Then the argument for duplicating all the functions is less strong and it wouldn't make compilation slower for everyone who doesn't need strict boxed vectors.

Shimuuar commented 7 months ago

Separate library would result in rather poor usability. User would have to find another library and that's for rather basic functionality. As an example containers duplicate API for maps: there're strict and lazy version

konsumlamm commented 7 months ago

The library could be linked from the vector documentation for better discoverability. As I showed in https://github.com/haskell/vector/issues/458#issuecomment-1510425741, redefinitions slow down compilation time significantly.

lehins commented 7 months ago

I am with @Shimuuar on this one.

I would even go so far as advocating strict boxed vector as the default one that users should reach out for, instead of the current lazy one.

Also, it would be in line with other packages, like containers, transformers,mtl`, etc.

Extra 10 seconds of compile time is a small price for having this feature, IMHO.

Mikolaj commented 7 months ago

Is it the same as included in https://hackage.haskell.org/package/strict-containers? I use these with gusto.

Shimuuar commented 7 months ago

Yes. My plan was to copy definition of lazy vector, make basicWrite strict in value being written. Then there are functions for conversion from Array. Should we force all elements in Array?

It should be essentially the same.

Mikolaj commented 7 months ago

I don't know or don't remember the details, so let me ping @infinity0.

As a user, I think I'd expect the conversion to force all elements, yes.

infinity0 commented 6 months ago

I support the proposal since it means I'll have less maintenance work in strict-containers. :)

As for implementation, my patch is against vector-0.13 and I think basicWrite didn't in exist back in that version? I found it necessary to patch both basicUnsafeReplicate and basicUnsafeWrite, if that helps.

Also I provide explicit strict versions of the G.Vector API (it results in a smaller patch for me, to duplicate more of vector) and override elemseq to be strict as well, since the default definition is not.

There may have been other things I missed, but it worked well for me back when I used it - it passed a bunch of strictness tests I wrote, and it seems @Mikolaj hasn't found any bugs so far. :)

infinity0 commented 6 months ago

More details about my patch. The results are here: https://github.com/haskellari/strict-containers/tree/master/strict-containers/src/Data/Strict/Vector

We only need to duplicate 3 files from Vector:

The patch linked above is then applied onto this. Then a bunch of stuff is re-exported as Data.Strict.Vector.

I didn't think about how vector itself might provide something like this. Maybe even duplication of 3 files is too much for you and you want to think about if there's a better way to do it. Then again, this may require messy stuff involving higher-order kinds etc, the options I could think of were not pretty.

Bodigrim commented 6 months ago

The earlier discussion: https://github.com/haskell/vector/issues/380