nikita-volkov / ptr-poker

Pointer poking action construction and composition toolkit
http://hackage.haskell.org/package/ptr-poker
MIT License

Suggestion: Unbox `Poke` #15

Open raehik opened 1 year ago

raehik commented 1 year ago

Partial followup to https://github.com/haskell-perf/strict-bytestring-builders/pull/6 .

ptr-poker uses the following type for low-level pokes:

Ptr Word8 -> IO (Ptr Word8)

I believe unboxing would improve performance:

Addr# -> State# RealWorld -> (# State# RealWorld, Addr# #)

Ptr is a boxed data type (a constructor wrapping an Addr#), so this should remove some indirection. We have to unbox IO as well because it's not levity-polymorphic: IO a can't return an unlifted Addr#.
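To make the comparison concrete, here is a minimal sketch of the unboxed representation quoted above, with a Semigroup instance that threads the address and state token by hand. The names Poke, word8, runPoke and demo are hypothetical and not taken from ptr-poker or from my library; it assumes a GHC where the W8# field type matches the writeWord8OffAddr# argument type (true for the releases I know of).

```haskell
{-# LANGUAGE MagicHash #-}
{-# LANGUAGE UnboxedTuples #-}

import Foreign.Marshal.Alloc (allocaBytes)
import Foreign.Ptr (minusPtr)
import Foreign.Storable (peekByteOff)
import GHC.Exts (Addr#, Ptr (Ptr), RealWorld, State#, plusAddr#, writeWord8OffAddr#)
import GHC.IO (IO (IO))
import GHC.Word (Word8 (W8#))

-- Hypothetical unboxed poke: takes the write address and the state token,
-- returns the new state token and the address just past what was written.
newtype Poke = Poke (Addr# -> State# RealWorld -> (# State# RealWorld, Addr# #))

-- Sequencing runs the left poke, then the right poke at the returned address.
instance Semigroup Poke where
  Poke l <> Poke r = Poke (\addr s0 ->
    case l addr s0 of
      (# s1, addr' #) -> r addr' s1)

instance Monoid Poke where
  mempty = Poke (\addr s -> (# s, addr #))

-- Write a single byte and advance the address by one.
word8 :: Word8 -> Poke
word8 (W8# w) = Poke (\addr s0 ->
  case writeWord8OffAddr# addr 0# w s0 of
    s1 -> (# s1, plusAddr# addr 1# #))

-- Re-box at the boundary so callers keep the Ptr Word8 -> IO (Ptr Word8) shape.
runPoke :: Poke -> Ptr Word8 -> IO (Ptr Word8)
runPoke (Poke f) (Ptr addr) = IO (\s0 ->
  case f addr s0 of
    (# s1, addr' #) -> (# s1, Ptr addr' #))

-- Poke two bytes into a scratch buffer and read them back.
demo :: IO (Word8, Word8, Int)
demo = allocaBytes 2 $ \buf -> do
  end <- runPoke (word8 1 <> word8 2) buf
  b0 <- peekByteOff buf 0
  b1 <- peekByteOff buf 1
  pure (b0, b1, end `minusPtr` buf)

main :: IO ()
main = demo >>= print -- prints (1,2,2)
```

Only runPoke boxes and unboxes. Whether GHC already removes the Ptr and IO wrappers in the boxed version after inlining is exactly what the benchmarks would have to show.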

This representation gave me consistently better performance, both over in https://github.com/haskell-perf/strict-bytestring-builders and in a less synthetic benchmark here, serializing lots of Word8s and ByteStrings. Main code is currently here. (Some tests in that repo also assert basic soundness.)

What do you think? If you find it appealing, I would gladly merge my code in here. Otherwise I'd like to publish my lib on Hackage with attribution for the idea.

nikita-volkov commented 1 year ago

That's a great idea! I'll be happy to merge it. Do include the notes about your contribution in the PR as well. Also please provide some test coverage.

raehik commented 1 year ago

Awesome :) It will be a fairly big changeset that touches lots of the poking code. I might need some help confirming that certain bits are safe/sensible. I'll start a PR later.

raehik commented 1 year ago

After a discussion with merijn on #haskell IRC, I want to confirm the performance changes I saw: I might have been using an older version of ptr-poker, and I use a potentially faster bytestring serializer (which could be used here too, if safe). Apparently it's a little surprising that unboxing Ptr and IO made such a difference.

raehik commented 1 year ago

I need more benchmarks to figure out what's going on. By replacing withForeignPtr with unsafeWithForeignPtr for serializing bytestrings, all the https://github.com/haskell-perf/strict-bytestring-builders benchmarks improve tremendously. In context:

poke :: ByteString -> Ptr Word8 -> IO (Ptr Word8)
poke (BS fptr length) ptr =
  {-# SCC "poke" #-}
  unsafeWithForeignPtr fptr $ \ bytesPtr ->
    memcpy ptr bytesPtr length $>
    plusPtr ptr length

fumieval's mason uses it here (implementation copied from GHC for compat) in the same way, exclusively for bytestring serialization.
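For comparison, here is a self-contained sketch of the two variants side by side. It assumes bytestring >= 0.11 (for the BS pattern) and base >= 4.15 (where GHC.ForeignPtr exports unsafeWithForeignPtr), and it uses copyBytes in place of memcpy. pokeSafe, pokeUnsafe and demo are hypothetical names. As far as I understand it, unsafeWithForeignPtr is only sound when the continuation cannot diverge, which holds for a finite copy like this one.

```haskell
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as BC
import Data.ByteString.Internal (ByteString (BS))
import Data.Word (Word8)
import Foreign.ForeignPtr (withForeignPtr)
import Foreign.Marshal.Alloc (allocaBytes)
import Foreign.Marshal.Utils (copyBytes)
import Foreign.Ptr (Ptr, castPtr, minusPtr, plusPtr)
import GHC.ForeignPtr (unsafeWithForeignPtr)

-- Copy the bytestring's payload to dst using the always-safe wrapper.
pokeSafe :: ByteString -> Ptr Word8 -> IO (Ptr Word8)
pokeSafe (BS fptr len) dst =
  withForeignPtr fptr $ \src -> do
    copyBytes dst src len
    pure (dst `plusPtr` len)

-- Same copy using the cheaper wrapper. Assumed sound here because the
-- continuation is a finite copy that always returns.
pokeUnsafe :: ByteString -> Ptr Word8 -> IO (Ptr Word8)
pokeUnsafe (BS fptr len) dst =
  unsafeWithForeignPtr fptr $ \src -> do
    copyBytes dst src len
    pure (dst `plusPtr` len)

-- Poke "ab" with one variant and "cd" with the other, then read the buffer back.
demo :: IO (ByteString, Int)
demo = allocaBytes 4 $ \buf -> do
  mid <- pokeSafe (BC.pack "ab") buf
  end <- pokeUnsafe (BC.pack "cd") mid
  let written = end `minusPtr` buf
  out <- B.packCStringLen (castPtr buf, written)
  pure (out, written)

main :: IO ()
main = demo >>= print -- prints ("abcd",4)
```

Both functions produce the same bytes. Any difference is in how much GHC can optimize around the wrapper, so it only shows up in benchmarks.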

However, swapping to withForeignPtr in my library and running a generics-based benchmark changes absolutely nothing. So I'm currently unsure where the performance improvement is being introduced.

raehik commented 1 year ago

Benchmarking the serialization of ~5 kB of bytestrings with my lib vs. the latest commit on this repo still says my lib is faster. The only things that should be happening there are bytestring serialization and generics, and the code for both is identical, so the difference would appear to come from the Semigroup instance.

So it would appear this is worthwhile. It's a shame the benchmarks are so all over the place.

Edit: Wait, no! This time unsafeWithForeignPtr did give a massive improvement! OK, so it depends on GHC's mood (maybe it can optimize better when unboxed). Now the improvement from unboxing is only 10%. Sorry for the spam. I'll make a PR for unsafeWithForeignPtr and go from there.