Closed sid-kap closed 8 years ago
Right. I agree that if we try to read a length-15 vector, and we see at least 15 floats, then we should read the 15 floats and store them in a R 15
.
The problem is that, with the existing code, you could expect a R 15
, then call the get
function for Vector Double
(which could return a length-200 vector!) and then naively wrap this length-200 vector in a R 15
.
The problem with this code is that it allows us to create instances of R n
that don't have length exactly n
.
By the way, I didn't understand what you meant to say about the orphan instance for Complex
. Could you explain what you meant?
Ah I see, we're using the Binary
instance for Vector
, and working from that. In that case, the naive check might be sufficient, I think? Trusting the binary might be 'normal' for other situations in the library.
And ah, I think I accidentally dropped a word, i meant that there might be potential problems/issues with an orphan instance for Complex
. People or other libraries might have their own or different instances, and so importing any module that imports Internal.Static would cause ghc to complain, since you can't hide instances on import.
The reason I was concerned is that there should never be an instance of R n
that doesn't have length exactly n
. That's why create :: Vector Double -> Maybe (R n)
, for example, returns a Maybe
, and why vector :: [Double] -> R n
throws an error if the list is the wrong length.
That's a good point about the complex instance! What do you think is a good way to handle this? Drop the Binary instances for Complex? (I personally wasn't using complex matrices, but I wanted to add it for completeness.) Or maybe implement Binary
for complex vectors/matrices without making an actual Binary
instance for complex?
Yeah, a part of the contract of having an R n
is that it has exactly n
items; it should be "impossible" to get an improper R n
without using unsafe methods, at which point I guess you can say everything is off the table anyway.
The normal way to handle something like the orphan instance here I think is to just implement Binary
for the complex vector/matrix itself without relying on a Binary
instance for Complex
. Alternatively in some cases people to to wrap Complex
in a newtype wrapper and give the instance on the wrapper, but I think maybe just writing the serialization directly would be a bit simpler here?
Sure, I'll try to write the serialization for Complex
directly instead of making an instance.
By the way, it seems that the binary package doesn't encode Doubles efficiently: https://www.reddit.com/r/haskell/comments/44w15q/psa_if_youre_serializing_floating_point_numbers/ I thought this might be somewhat relevant here, before we add an even greater dependence of hmatrix on the binary library.
Thanks for merging! I just wanted to note that this work isn't really finished yet, for a few reasons:
Complex
that I added hereR n
or L m n
that don't have the dimensions that they claim to have. Thanks for your work! I know it is a work in progress. I will make the release when it is finished.
The binary folks merged this PR (https://github.com/kolmodin/binary/pull/110) to add a Complex instance. So now we could get rid of the orphan instance. We have a few options:
1) Get rid of the orphan instance here, and restrict the binary dependency of hmatrix to 0.8.2.1 or higher, so that the complex instance is there. 2) Use ifdefs to decide, based on whether the binary version is >= 0.8.2.1, whether the complex binary instance is there. If it's not there, patch in the complex binary instance. 3) Use ifdefs to determine whether the complex binary instance is there. If not, don't make Binary instances for Complex vectors and matrices.
It seems to me that even binary-0.8.2.1 has very few dependencies, and should work with even very old versions of GHC. So I am leaning toward option (1).
I like option (1).
Okay, I will send a pull request that removes the Complex Binary instance and adds the dependency. Thanks!
This is tricky --- it seems like binary is by default in GHC, and specifying a different version of binary is a bit tricky because, at least for people using stack, stackage is still on binary-0.7.5.0 (which I assume is the version of binary that shipped with ghc-7.10.3, which is version of ghc that the latest Stack snapshot uses.
This is getting tricky---I'd be happy to just remove all the complex binary instances, since I was not using those anyways. We can maybe revisit this once the Complex Binary
instance gets released with ghc.
@albertoruiz, what are your thoughts on this?
A possible temporary workaround could be to serialize a tuple with the real and imaginary parts. But if there is no urgent need we can wait until the Complex Binary instance is easily available in stack.
hm, I feel like the sensible design choice here would be to just parse the binary data as if it were encoding the matrix of the dimension the type suggests. This would be similar to the behavior of, say, the instances for
(Int,Int,Int)
vs.(Int,Int)
. It's not like, when deserializing an(Int,Int)
, there's a check to see "how manyInt
s there are" in the binary data. If you try to decode an(Int, Int)
, it'll just attempt to decode an(Int, Int)
from the binary data...it doesn't care if an(Int, Int, Int)
or an(Int, Int, Int, Int)
was initially encoded. If the data is unparsable, then that's when the error would come up. Though perhaps not the "safest" route, it seems the most consistent with the philosophy of the rest of binary.and just as a note, there might be some potential with an orphan instance for
Complex
:o