lehins closed this 3 years ago
Any idea how to test it on BE arch?
Any idea how to test it on BE arch?
Besides spinning up an AMD server from some cloud provider, like Hetzner, for instance, and running tests there, I have no other ideas.
Which AMD machines are big-endian? I thought all of them are little-endian, as well as modern ARM.
@juhp I recall you raising s390-related issues on the GHC bug tracker. Do you possibly have access to a big-endian machine to test this patch?
Oh look at that. AMD is LE too. In that case I have no clue how to get hold of BE hardware :D
I can probably do a test build in the Fedora buildsystem.
You can see our latest release build here for example.
So you just want me to build this change? Getting the testsuite to run looks rather tricky.
Here is a scratch build: https://koji.fedoraproject.org/koji/taskinfo?taskID=74908079 (using ghc-8.10.5 and LTS 18 packages basically).
@juhp Thank you for your help. Unfortunately, it is the random:spec test suite that needs to be run in order to confirm that big/little-endian compatibility works as expected.
Okay, I feared as much, perhaps I can get temporary access to a Fedora s390x instance... Otherwise in the worst case we will find out later I guess ;-)
Good news: I ran the testsuite on Fedora 34 s390x and it passed.
@juhp Awesome!!! Thank you very much for verifying this PR!!!
Sorry, please wait: that was the wrong log... rechecking now.
I ran the testsuite on Fedora 34 s390x now correctly in your branch and I am afraid there was 1 test failure:
genByteString/ShortByteString consistency: FAIL
test/Spec.hs:118:
expected: [78,232,117,189]
but got: [189,117,232,78]
(Sorry for the false confirmation earlier)
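The expected and actual lists in this failure are the same four bytes in opposite order. As a worked illustration (not code from the PR), here is one 32-bit value, 0xBD75E84E, serialised two ways. Least-significant byte first gives exactly the expected list. Most-significant byte first gives exactly the list that was observed on s390x. The helper names toLE and toBE are hypothetical.

```haskell
import Data.Bits (shiftR)
import Data.Word (Word8, Word32)

-- Little-endian serialisation: least significant byte first.
-- Shifting works on the numeric value, so this is host-independent.
toLE :: Word32 -> [Word8]
toLE w = [fromIntegral (w `shiftR` (8 * i)) | i <- [0 .. 3]]

-- Big-endian serialisation: most significant byte first.
toBE :: Word32 -> [Word8]
toBE = reverse . toLE

-- toLE 0xBD75E84E == [78, 232, 117, 189]   (the "expected" list)
-- toBE 0xBD75E84E == [189, 117, 232, 78]   (the "but got" list)
```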
@juhp Dammit, that is unfortunate. Thank you for rechecking it!
@lehins actually, what is our goal here? To produce the same random numbers from the same seed on both LE and BE platforms? Why doesn't it suffice to produce some random numbers, not necessarily the same ones?
To produce the same random numbers from the same seed on both LE and BE platforms?
yes
Why doesn't it suffice to produce some random numbers, not necessarily the same ones?
I don't follow.
@Bodigrim Not random numbers, but a sequence of random bytes.
@Bodigrim I'll describe what is going on in a little more detail:
In order to generate a ByteString (or a ShortByteString) we could do something like genByteStringM n g = pack <$> replicateM n (uniformM g). However, this would generate 64 bits for every byte that will be used, which is extremely wasteful and inefficient.
What we do instead is generate one Word64 at a time and write it into a mutable buffer until we fill it up. Writing it in a BE/LE-agnostic manner ensures that the generated ByteString will be the same on all architectures for the same generator.
There is also an extra issue at the end of a ByteString. We will often have a tail that is smaller than a Word64 (when mod n 8 /= 0), so we also need to write the first few bytes into the end of the ByteString in the same manner across architectures.
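Here is a minimal sketch of the byte-order-agnostic idea. It is not the code from this PR, which writes into a mutable buffer. Each Word64 is split into bytes with shifts, and shifts act on the value rather than on its memory layout, so the output is identical on LE and BE hosts. For simplicity the generator is modelled as a plain list of Word64s. The names word64ToLEBytes and bytesFromWords are hypothetical.

```haskell
import Data.Bits (shiftR)
import Data.Word (Word8, Word64)
import qualified Data.ByteString as BS

-- Serialise a Word64 as 8 bytes, least significant first, using shifts only.
-- The result does not depend on the host's byte order.
word64ToLEBytes :: Word64 -> [Word8]
word64ToLEBytes w = [fromIntegral (w `shiftR` (8 * i)) | i <- [0 .. 7]]

-- Build an n-byte ByteString from a stream of Word64s (a stand-in for
-- successive generator outputs). Only ceiling (n / 8) words are consumed.
-- For a tail (n `mod` 8 /= 0), the low-order bytes of the last word are
-- kept, again in a host-independent order.
bytesFromWords :: Int -> [Word64] -> BS.ByteString
bytesFromWords n ws =
  BS.pack (take n (concatMap word64ToLEBytes (take nWords ws)))
  where
    nWords = (n + 7) `div` 8
```

The choice of little-endian order is arbitrary. What matters is that the order is fixed in the code rather than inherited from the host.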
So the failing test in this comment (https://github.com/haskell/random/pull/116#issuecomment-911600622) shows that there is a problem in the logic (or in my assumptions about how it works) somewhere, and that we get bytes in a different order on BE vs LE machines.
Now all I need is to figure out how I can get my hands on a BE machine so I can experiment with this; I can't keep bugging Jens to verify whether a change works or not. I suspect the problem was there prior to this PR as well, except the test was not present until now. If anyone ever ran random on a BE machine, the random bytes would still be ... random, so this problem is not very pronounced. Nevertheless, it is still there.
Now all I need is to figure out how I can get my hands on a BE machine so I can experiment with this; I can't keep bugging Jens to verify whether a change works or not.
It looks like this project lets you run an emulated s390x Ubuntu with QEMU + Docker:
$ docker run --rm --privileged multiarch/qemu-user-static --reset -p yes
$ docker run --rm -t s390x/ubuntu uname -m
s390x
https://github.com/multiarch/qemu-user-static#getting-started
That might help.
Hallelujah! I am not going crazy! @curiousleo, an enormous thank you for this suggestion with Docker. It was a bit painful to get it to work, anything complicated like cabal or stack will not work on that docker, because it uses up memory like crazy.
So it appears my sanity is fine and GHC is not reporting ByteOrder correctly: https://gitlab.haskell.org/ghc/ghc/-/issues/20338
@Bodigrim, thank you for suggesting that we avoid CPP; otherwise we would not have found that bug. But now I need to bring the CPP back.
@juhp, thank you again for helping debug this. If you don't mind, I'll ask you again in a little while to run the test suite one last time, just to be sure. For now I need to bring back the CPP approach first.
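One way to cross-check GHC.ByteOrder.targetByteOrder without CPP is to probe memory at run time. Store a known Word32, then read back the byte at the lowest address. This is a hedged sketch and not what the PR does, since the thread goes back to CPP. The names probeByteOrder and byteOrderMatches are hypothetical.

```haskell
import Data.Word (Word8, Word32)
import Foreign.Marshal.Alloc (alloca)
import Foreign.Ptr (castPtr)
import Foreign.Storable (peek, poke)
import GHC.ByteOrder (ByteOrder (..), targetByteOrder)

-- Determine the host byte order at run time: write 0x01020304 and see
-- whether its least significant byte (0x04) lands at the lowest address.
probeByteOrder :: IO ByteOrder
probeByteOrder = alloca $ \p -> do
  poke p (0x01020304 :: Word32)
  firstByte <- peek (castPtr p) :: IO Word8
  pure (if firstByte == 0x04 then LittleEndian else BigEndian)

-- True when the compile-time constant agrees with what memory actually does.
-- On a GHC build where targetByteOrder is misreported, this returns False.
byteOrderMatches :: IO Bool
byteOrderMatches = (== targetByteOrder) <$> probeByteOrder
```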
GHC is not reporting ByteOrder correctly
Oh, that's pretty big. Thanks for debugging it.
Damn, nice find @lehins.
It was a bit painful to get it to work, anything complicated like cabal or stack will not work on that docker, because it uses up memory like crazy.
I was able to use cabal -j1. It eats up to 16 GB of RAM and is slow as hell (literally hours and hours to build dependencies), but it succeeds.
:smile: As I said, painful indeed
random-1.2.0 contained a shortcut where architecture-independent generation of bytes relied on bytestring's builder functionality, which forced us to generate ShortByteString as pinned. This PR fixes that technical debt. This is a non-breaking change.
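For the ShortByteString case, here is a hedged sketch of building the result directly with Data.ByteString.Short.pack. It never goes through a Builder or an intermediate pinned ByteString. The generator is modelled as a list of Word64s, and the names word64ToLEBytes and shortBytesFromWords are assumptions for illustration. A list-based pack is much slower than the mutable-buffer approach described earlier in the thread.

```haskell
import Data.Bits (shiftR)
import Data.Word (Word8, Word64)
import qualified Data.ByteString.Short as SBS

-- Little-endian bytes of a Word64, computed with shifts so the result
-- does not depend on the host's byte order.
word64ToLEBytes :: Word64 -> [Word8]
word64ToLEBytes w = [fromIntegral (w `shiftR` (8 * i)) | i <- [0 .. 7]]

-- Build a ShortByteString of length n straight from a stream of Word64s,
-- consuming ceiling (n / 8) words and keeping only the bytes needed.
shortBytesFromWords :: Int -> [Word64] -> SBS.ShortByteString
shortBytesFromWords n ws =
  SBS.pack (take n (concatMap word64ToLEBytes (take ((n + 7) `div` 8) ws)))
```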