well-typed / cborg

Binary serialisation in the CBOR format
https://hackage.haskell.org/package/cborg
191 stars 86 forks source link

ghc-9.4 build error on M1 #311

Open gelisam opened 1 year ago

gelisam commented 1 year ago

298 only fixed the issue on some platforms. I get the following compilation error with ghc-9.4 but not ghc-9.2, and on a M1 mac but not on a non-M1 mac. I get the compilation error with both cborg-0.2.8.0, the master branch, and the #298 branch.

$ cabal build cborg
[...]
[ 5 of 14] Compiling Codec.CBOR.Magic ( src/Codec/CBOR/Magic.hs, /Users/gelisam/working/nix/cborg/dist-newstyle/build/aarch64-osx/ghc-9.4.3/cborg-0.2.7.0/build/Codec/CBOR/Magic.o, /Users/gelisam/working/nix/cborg/dist-newstyle/build/aarch64-osx/ghc-9.4.3/cborg-0.2.7.0/build/Codec/CBOR/Magic.dyn_o )

src/Codec/CBOR/Magic.hs:260:20: error:
    • Couldn't match expected type ‘Word64#’ with actual type ‘Word#’
    • In the first argument of ‘W64#’, namely ‘(toWord w#)’
      In the expression: W64# (toWord w#)
      In an equation for ‘w64’: w64 w# = W64# (toWord w#)
    |
260 |     w64 w# = W64# (toWord w#)
    |                    ^^^^^^^^^
gelisam commented 1 year ago

This file has a lot of CPP, and the problematic code is only enabled on some machines. In particular, the problematic code was not touched by @parsonsmatt's #298 fix (other than some whitespace changes), presumably because he fixed all the lines which errored on his machine, and the problematic code was not enabled on his machine.

--
-- Machine-dependent implementation
--

-- 8-bit word case is always the same...
grabWord8 (Ptr ip#) = W8# (indexWord8OffAddr# ip# 0#)

-- ... but the remaining cases arent
#if defined(HAVE_BYTESWAP_PRIMOPS) && \
    defined(MEM_UNALIGNED_OPS) && \
   !defined(WORDS_BIGENDIAN)
-- On x86 machines with GHC 7.10, we have byteswap primitives
-- available to make this conversion very fast.

[code which was not touched by Matt's fix]

#elif defined(MEM_UNALIGNED_OPS) && \
      defined(WORDS_BIGENDIAN)
-- In some theoretical future-verse where there are unaligned memory
-- accesses on the machine, but it is also big-endian, we need to be
-- able to decode these numbers efficiently, still.

[code which was not touched by Matt's fix]

#else
-- Otherwise, we fall back to the much slower, inefficient case
-- of writing out each of the 8 bits of the output word at
-- a time.

[the problematic code, which was not touched by Matt's fix]

#endif
gelisam commented 1 year ago

Let's now look at the problematic code itself.

grabWord64 :: Ptr () -> Word64
grabWord64 (Ptr ip#) =
    case indexWord8OffAddr# ip# 0# of
     w0# ->
      case indexWord8OffAddr# ip# 1# of
       w1# ->
        case indexWord8OffAddr# ip# 2# of
         w2# ->
          case indexWord8OffAddr# ip# 3# of
           w3# ->
            case indexWord8OffAddr# ip# 4# of
             w4# ->
              case indexWord8OffAddr# ip# 5# of
               w5# ->
                case indexWord8OffAddr# ip# 6# of
                 w6# ->
                  case indexWord8OffAddr# ip# 7# of
                   w7# -> w64 w0# `unsafeShiftL` 56 .|.
                          w64 w1# `unsafeShiftL` 48 .|.
                          w64 w2# `unsafeShiftL` 40 .|.
                          w64 w3# `unsafeShiftL` 32 .|.
                          w64 w4# `unsafeShiftL` 24 .|.
                          w64 w5# `unsafeShiftL` 16 .|.
                          w64 w6# `unsafeShiftL`  8 .|.
                          w64 w7#
  where
#if MIN_VERSION_ghc_prim(0,8,0)
    toWord :: Word8# -> Word#
    toWord w# = word8ToWord# w#
#else
    toWord :: Word# -> Word#
    toWord w# = w#
#endif

#if WORD_SIZE_IN_BITS == 64
    w64 w# = W64# (toWord w#)
#else
    w64 w# = W64# (wordToWord64# (toWord w#))
#endif

grabWord64 reads a big-endian Word64 at the given pointer location, by reading 8 bytes and combining them into a Word64 using bit-shifting.

The #if MIN_VERSION_ghc_prim(0,8,0) CPP is needed because Word64 and indexWord8OffAddr# have both changed over time:

-- before
indexWord8OffAddr# :: Addr# -> Int# -> Word#
data Word64 = W64# Word#
toWord :: Word# -> Word#
w64 :: Word# -> Word64

-- ghc-prim-0.8.0+, aka base-4.16.0.0+, aka ghc 9.2+
indexWord8OffAddr# :: Addr# -> Int# -> Word8#  -- changed!
data Word64 = W64# Word#
toWord :: Word8# -> Word#  -- changed!
w64 :: Word8# -> Word64  -- changed!

-- ghc-prim-0.9.0+, aka base-4.17.0.0+, aka ghc 9.4+
indexWord8OffAddr# :: Addr# -> Int# -> Word8#
data Word64 = W64# Word64#  -- changed!
toWord :: Word8# -> Word64#  -- should have changed!
w64 :: Word8# -> Word64

We can see that toWord is missing a case: it has been updated to account for the ghc-prim-0.8.0 change, but not for the ghc-prim-0.9.0 change.

gelisam commented 1 year ago

The fix is simple, I'll send a PR.

#if MIN_VERSION_ghc_prim(0,9,0)
    toWord :: Word8# -> Word64#
    toWord w# = wordToWord64# (word8ToWord# w#)
#elif MIN_VERSION_ghc_prim(0,8,0)
    toWord :: Word8# -> Word#
    toWord w# = word8ToWord# w#
#else
    toWord :: Word# -> Word#
    toWord w# = w#
#endif
gelisam commented 1 year ago

Oh, silly me, an equivalent fix already exists at #307!

gelisam commented 1 year ago

two equivalent PRs already exist, even! #304

gelisam commented 1 year ago

I don't see an existing issue for this problem, only existing pull requests, so I think I am thinking of leaving this issue open until one of the two PRs is merged.