haskell-numerics / hmatrix

Linear algebra and numerical computation
381 stars 104 forks source link

Add binary instances for static matrix and vector #178

Closed sid-kap closed 8 years ago

mstksg commented 8 years ago

hm, I feel like the sensible design choice here would be to just parse the binary data as if it were encoding the matrix of the dimension the type suggests. This would be similar to the behavior of, say, the instances for (Int,Int,Int) vs. (Int,Int). It's not like, when deserializing an (Int,Int), there's a check to see "how many Ints there are" in the binary data. If you try to decode an (Int, Int), it'll just attempt to decode an (Int, Int) from the binary data...it doesn't care if an (Int, Int, Int) or an (Int, Int, Int, Int) was initially encoded. If the data is unparsable, then that's when the error would come up. Though perhaps not the "safest" route, it seems the most consistent with the philosophy of the rest of binary.

and just as a note, there might be some potential with an orphan instance for Complex :o

sid-kap commented 8 years ago

Right. I agree that if we try to read a length-15 vector, and we see at least 15 floats, then we should read the 15 floats and store them in a R 15.

The problem is that, with the existing code, you could expect a R 15, then call the get function for Vector Double (which could return a length-200 vector!) and then naively wrap this length-200 vector in a R 15.

The problem with this code is that it allows us to create instances of R n that don't have length exactly n.

By the way, I didn't understand what you meant to say about the orphan instance for Complex. Could you explain what you meant?

mstksg commented 8 years ago

Ah I see, we're using the Binary instance for Vector, and working from that. In that case, the naive check might be sufficient, I think? Trusting the binary might be 'normal' for other situations in the library.

And ah, I think I accidentally dropped a word, i meant that there might be potential problems/issues with an orphan instance for Complex. People or other libraries might have their own or different instances, and so importing any module that imports Internal.Static would cause ghc to complain, since you can't hide instances on import.

sid-kap commented 8 years ago

The reason I was concerned is that there should never be an instance of R n that doesn't have length exactly n. That's why create :: Vector Double -> Maybe (R n), for example, returns a Maybe, and why vector :: [Double] -> R n throws an error if the list is the wrong length.

That's a good point about the complex instance! What do you think is a good way to handle this? Drop the Binary instances for Complex? (I personally wasn't using complex matrices, but I wanted to add it for completeness.) Or maybe implement Binary for complex vectors/matrices without making an actual Binary instance for complex?

mstksg commented 8 years ago

Yeah, a part of the contract of having an R n is that it has exactly n items; it should be "impossible" to get an improper R n without using unsafe methods, at which point I guess you can say everything is off the table anyway.

The normal way to handle something like the orphan instance here I think is to just implement Binary for the complex vector/matrix itself without relying on a Binary instance for Complex. Alternatively in some cases people to to wrap Complex in a newtype wrapper and give the instance on the wrapper, but I think maybe just writing the serialization directly would be a bit simpler here?

sid-kap commented 8 years ago

Sure, I'll try to write the serialization for Complex directly instead of making an instance.

sid-kap commented 8 years ago

By the way, it seems that the binary package doesn't encode Doubles efficiently: https://www.reddit.com/r/haskell/comments/44w15q/psa_if_youre_serializing_floating_point_numbers/ I thought this might be somewhat relevant here, before we add an even greater dependence of hmatrix on the binary library.

sid-kap commented 8 years ago

Thanks for merging! I just wanted to note that this work isn't really finished yet, for a few reasons:

  1. The Binary folks agreed to add a Complex instance to the Binary package (see https://github.com/kolmodin/binary/issues/109 and https://github.com/kolmodin/binary/pull/110), so we will be able to import that and remove the orphan Binary instance for Complex that I added here
  2. We should, before the next hmatrix release, make sure that the size check is done properly, so that we don't end up with data of type R n or L m n that don't have the dimensions that they claim to have.
albertoruiz commented 8 years ago

Thanks for your work! I know it is a work in progress. I will make the release when it is finished.

sid-kap commented 8 years ago

The binary folks merged this PR (https://github.com/kolmodin/binary/pull/110) to add a Complex instance. So now we could get rid of the orphan instance. We have a few options:

1) Get rid of the orphan instance here, and restrict the binary dependency of hmatrix to 0.8.2.1 or higher, so that the complex instance is there. 2) Use ifdefs to decide, based on whether the binary version is >= 0.8.2.1, whether the complex binary instance is there. If it's not there, patch in the complex binary instance. 3) Use ifdefs to determine whether the complex binary instance is there. If not, don't make Binary instances for Complex vectors and matrices.

It seems to me that even binary-0.8.2.1 has very few dependencies, and should work with even very old versions of GHC. So I am leaning toward option (1).

albertoruiz commented 8 years ago

I like option (1).

sid-kap commented 8 years ago

Okay, I will send a pull request that removes the Complex Binary instance and adds the dependency. Thanks!

sid-kap commented 8 years ago

This is tricky --- it seems like binary is by default in GHC, and specifying a different version of binary is a bit tricky because, at least for people using stack, stackage is still on binary-0.7.5.0 (which I assume is the version of binary that shipped with ghc-7.10.3, which is version of ghc that the latest Stack snapshot uses.

This is getting tricky---I'd be happy to just remove all the complex binary instances, since I was not using those anyways. We can maybe revisit this once the Complex Binary instance gets released with ghc.

sid-kap commented 8 years ago

@albertoruiz, what are your thoughts on this?

albertoruiz commented 8 years ago

A possible temporary workaround could be to serialize a tuple with the real and imaginary parts. But if there is no urgent need we can wait until the Complex Binary instance is easily available in stack.