mikeizbicki / subhask

Type safe interface for working in subcategories of Hask
BSD 3-Clause "New" or "Revised" License
418 stars 43 forks source link

Indexed update #55

Open Drezil opened 8 years ago

Drezil commented 8 years ago

I don't know if i understood the GHC-Internals right .. but i suppose you are to make a copy when you change a structure because of sharing and the GHC replaces it with an inline-replacement if you hold no pointers to the old structure afterwards, correct?

Otherwise this is just a plain malloc/memcpy of the vector with an update to the specific position.

Compiles and seems to work for me, but i havent run the test-suite yet.

mikeizbicki commented 8 years ago

Sorry I didn't realize this in our earlier discussion, but I think your (!~) operator is the same as consAt in https://github.com/mikeizbicki/subhask/blob/ghc-8.0/src/SubHask/Algebra.hs#L2856 There's currently no mutable version of consAt, however, and that needs implementing.

Drezil commented 8 years ago

I doubt it would be a good idea to make Vector an instance of IxConstructible with just (!~)as it forces an unneeded monoidal constraint onto the Type that would behave different that expected.

Normally a Constructible has cons and snoc for prepending/appending stuff via semigroup - but in the case of Vectors + is component-wise + so it should be something like

consAt i e v = v & i %~ (e+)
snocAt e i v = v & i %~ (+e)

so a monoidal operation on that element.. which differs from overwriting the contents of that element..

Just using (!~) there would also violate

consAt i e (consAt i (-e) v) == consAt i zero v

which follows from the semigroup and cancellative-property of vectors.

I think (%~) and (!~) are right in IxContainer, but Vector can be made an instance of IxConstructible with the definition for snocAt/consAt/singletonAt above via consAt i e v = v & i %~ (e+) et al.

mikeizbicki commented 8 years ago

The version in the ghc-8 branch has different laws for IxContainer that I simplify lots of stuff. In particular, consAt is intended to work the way you've defined (!~). At first, this seems like it might mess up the Monoid instance for things like Map, but it actually doesn't. It turns the Map type into a free vector over the set of keys. That is, corresponding keys are added to each other rather than overwritten. This is a better default behavior for two reasons: it's more useful in practice (at least for me); and the more data-structures-map-like-behavior of replacing elements can be recovered from this behavior via an appropriate monoid, but the reverse is not the case.

Sorry this is a super brief description of the motivation, but I don't have time for more detail right now.

Drezil commented 8 years ago

That may be the case for maps, but i currently just talk about Vectors (using a single unboxed malloced blob of memory). From https://github.com/mikeizbicki/subhask/blob/ghc-8.0/src/SubHask/Algebra/Vector.hs#L245-L257 it follows that zero + v - v == zero. The consAt-function in this case just operates over the i-th element (as vectors are fixed-size and that is the only sensible thing to do).

let z = zero :: UVector 3 Double -- == fromList [0,0,0]
consAt 0 5 z -- fromList [5,0,0]
snocAt 0 5 z -- fromList [5,0,0]
z & 0 !~ 5 -- fromList [5,0,0]
-- but
consAt 0 5 (consAt 0 (-5) z) -- fromList [0,0,0] - as expected that the fooAt just carries the monoid-operation through to the given element
(z & 0 !~ 5) & 0 !~ (-5) -- fromList [-5,0,0] - with a "setter" there is no "way back" on that operation as only the last action is commited; also no monoid constraint on the "values" is needed; i.e. vector does not have to be a monoid by itself - though it should.

So in short (!~) is an destructive update where the "last value on that index" wins, whereas consAt uses the monoidal properties to fuse all that values together while not "forgetting" about the old.

i understand that you can recover the behaviour of my proposed (!~) by forgetting about the mappend of the monoid and using something similar to the Alternative-Instance on Maybe from base. And that you cannot recover the generic monoidal usage from (!~). That why i propose to have (!~) in IxContainer instead of IxConstructible as you don't need the monoid-constraint to implement it. Furthermore i think that your implementation of consAt is rather consAt i e v = v & i %~ (e+) where you can use the monoid you like (and recover the behaviour of (!~) by choosing the right monoid) and still retain your property on Maps given a proper Monoid-Instance on Map (that should be in case of a free vector over the set of keys some kind of group-elements as values of that vector).

At least for Vectors as "fixed amount of random-access-memory" that would be a reasonable (and fast!) imlpementation.

mikeizbicki commented 8 years ago

Got it. I understand the distinction now, and I agree that something like (!~) makes sense to have. Thanks for the very clear explanation :)

Three more things to consider before merging:

  1. First a bikeshedding-ish issue. I've been trying to ensure all the operators would be as close as possible to operators familiar to matlab/python/R/Julia users as possible, and where that fails to use standard Haskell operators. These languages all uses something like array indexing arr[i] = ... to update an array, but Haskell can't support anything like that. The most similar operator I know of is the (//) operator in the vector package for bulk updates. I think having the ability to bulk update is important for efficiency reasons, and propose we use the same convention as the vector package. Then, (!~) can be defined in terms of (//). I'm not super comfortable with all these operators that will be unfamiliar to most numerical programmers, but I think it's okay to have them for now with the understanding that they'll likely get changed later when a better naming convention is established.
  2. For each of these operations, there should be specific laws added to the test suite.
  3. I'd prefer that these changes get added to the ghc-8 branch because I think that'll make for less work in the long run, but I'm okay with adding it to master if that's what you want to work with.

Thanks again for going through this work!

Drezil commented 8 years ago

Regarding the bulk-update i think it depends on how GHC generates Core for that use-case.

I have not enough knowledge about the internals and will read up on there generated core to answer the issue.

At the moment you can implement (//) by using (!~) repeatedly in this manner:

(//) idxs vals vec = vec & foldl1 (.) $ zipWith (!~) idxs vals

but that would depend on GHC fusing the composed function to a single in-memory-update with just 1 copying of the vector (in case you hold a reference to the old).

ATM (!~) consists of

  1. allocating needed bytes
  2. duplicating the old vector in the new region
  3. updating the value
  4. returning the new vector

and i THINK GHC can manage to optimize the allocation & duplication away if you hold no reference to the old vector after that point in your program. If this is the case than you can implement (//) in terms of (!~) by just fusing those "in-place-updates" together.

I will write a specialized (//)-version as well that does not depend on the GHC doing such optimizations, but i think it could be pretty common to use an update-syntax like

vec & (5 !~ 23) . (7 !~ 42)
-- opposed to
vec // [5,7] [23,42]
-- or
vec // [(5,23),(7,42)]

because it feels "more functional" and even allows for mixing "setting" with "modifying" (as setting is just a modify with "const val"):

vec & (5 !~ 23) . (7 !~ 42) . (0 %~ (+2))

Long story short: I will implement an optimized (//) for vectors and add that operator properly and also check if repeated calls to (!~) get fused away by the GHC and produce the same code as (//) in core.

Regarding GHC8.0: The change will be in master as well as the ghc8.0-branch as these commits (and the follwoing will) flawlessly merge with both :) You will get another PR for the 8.0-branch if it is ready.

mikeizbicki commented 8 years ago

I'd be very pleasantly surprised if GHC can do that automatically. I was imagining that (!~) would be defined in terms of (//) (you're suggesting the other way around), then you could add rewrite rules that will automatically convert

vec & (5 !~ 23) . (7 !~ 42)

into

vec // [(5,23),(7,42)]
Drezil commented 8 years ago

Solving that problem in a generic way gave me headaches for a week and i still have not solved it completely.

Basically with (monadic) Stream Fusion and Recycling you can implement (//) in terms of (!~) without performance-penalty.

Roman Leshchinskiy did great work in that field and i base my thing on "Recycle your Arrays" (http://dl.acm.org/citation.cfm?id=1506073 - depending on the jurisdiction of the country you live in you can enter the doi on sci-hub.cc to get it). He expanded his work and his knowledge went into https://hackage.haskell.org/package/vector-0.11.0.0 ..

But he did not abstract. He just implemented it for his "vector v a"-datatype with a fixed structure. I tried to abstract that into typeclasses, but i face challenges convincing the type-checker that those optimizations are generally valid. As a last resort i thought about macros/TH to generate the RULES for inlining for the datatype you actually have.

My current dev-branch is https://github.com/Drezil/subhask/blob/ghc8.0-indexed_update/src/SubHask/Algebra/Vector/RMStreams.hs#L142-L151 (with the missing rules highlighted).

I also included a test-case in https://github.com/Drezil/subhask/blob/ghc8.0-indexed_update/examples/ghc-test-linear-algebra.lhs which currently does nothing.

I instantiated the classes for UVector, but i have not changed the update-rules to use the new recycling-method yet. After i had time for that i want to make sure that the rules actually fire (-ddump-rule-firings).

But thats just for the current heads-up.