k0001 / hs-blake3

Haskell bindings for the official BLAKE3 hashing algorithm implementation
https://hackage.haskell.org/package/blake3

Don't (only) use unsafe FFI, it blocks all Haskell threads during a big computation #5

Open nh2 opened 1 month ago

nh2 commented 1 month ago

The library uses unsafe FFI for the update function:

https://github.com/k0001/hs-blake3/blob/12884c59367a288e4952e7806fa08dee640a8653/blake3/lib/BLAKE3/IO.hs#L317-L319
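
For context, a minimal sketch of the two binding flavours; the names and exact types below are illustrative, not copied from BLAKE3/IO.hs:

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}
-- Illustrative only; the real binding in BLAKE3/IO.hs may differ.
module UnsafeVsSafe where

import Data.Word (Word8)
import Foreign.C.Types (CSize (..))
import Foreign.Ptr (Ptr)

data Blake3Hasher  -- opaque C-side hasher state (placeholder)

-- "unsafe": near-zero call overhead, but the RTS cannot start a GC (and
-- therefore cannot run any Haskell thread waiting on one) until it returns.
foreign import ccall unsafe "blake3_hasher_update"
  c_update_unsafe :: Ptr Blake3Hasher -> Ptr Word8 -> CSize -> IO ()

-- "safe": higher fixed per-call overhead, but GC and other Haskell threads
-- keep running while the C code executes.
foreign import ccall safe "blake3_hasher_update"
  c_update_safe :: Ptr Blake3Hasher -> Ptr Word8 -> CSize -> IO ()
```

(Note that safe calls only let other Haskell threads keep running when the program is linked with -threaded.)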

This is rather wrong/unfortunate, because it can make the whole Haskell process freeze and halt all other Haskell threads when it is given a large array (e.g. a few GB) to run on:

The GHC manual explains:

GHC, since version 8.4, guarantees that garbage collection will never occur during an unsafe call.

This means that if you give the (even pure!) hash function a multi-GB ByteString and it runs for e.g. 2 seconds, and a GC is requested meanwhile, all other Haskell threads will stop doing anything until the call returns, including e.g. updating progress bars or answering unrelated requests in a web server.

It means: Do a big hash, and your multi-threaded program fully pauses.

(Yes, BLAKE3 is really fast, so the input must be big to reach 2 seconds of hash time, but the same fundamental issue applies to all "pure" unsafe FFI calls; I just found it in this package first, so I'm reporting it here. And even if the pause is only 20 ms, that already destroys a web server's response times.)
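
A small, self-contained demo of the effect (it does not use this library; POSIX-only, with sleep(3) standing in for any long-running unsafe call). Build with ghc -threaded Demo.hs and run with ./Demo +RTS -N2: the ticker prints every 100 ms, freezes as soon as a GC is requested while the unsafe call is in flight, and resumes only when it returns.

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}
module Main (main) where

import Control.Concurrent (forkOn, threadDelay)
import Control.Exception (evaluate)
import Control.Monad (forM_, forever)
import Foreign.C.Types (CUInt (..))

-- sleep(3) stands in for a long unsafe c_update call over a huge buffer.
foreign import ccall unsafe "unistd.h sleep"
  c_sleep :: CUInt -> IO CUInt

main :: IO ()
main = do
  -- Ticker on capability 1, away from the main thread on capability 0.
  _ <- forkOn 1 . forever $ do
    putStrLn "tick"
    threadDelay 100000                  -- 100 ms
  -- An allocating thread so that a GC is requested soon after the unsafe
  -- call starts; the GC sync then has to wait for the blocked capability.
  _ <- forkOn 1 $ forM_ [1 :: Integer ..] $ \i -> evaluate (i * i)
  threadDelay 300000                    -- let a few ticks through first
  _ <- c_sleep 5                        -- 5 s unsafe call: ticks stop here
  putStrLn "unsafe call finished; ticks resume"
```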

Production downtime story

I found this issue by accidentally exaggerating the problem: the data I hashed was obtained via mmap, which made the call extra slow because it also involved disk I/O. (Of course it is not this library's fault that I smuggled kernel-side I/O into pure code with mmap, but the problem stands for ordinary large in-memory arrays too.)

Hashing a 100 GB file this way "disabled" my web server, causing a half-hour production downtime/outage.

What this library should probably do

  • unsafe is the correct solution for very short strings.
  • If an array given to hash/update is longer than some threshold of N bytes, the high-level API (hash and update) should probably make multiple smaller calls to c_update automatically; a sketch follows this list.
  • hash already has a convenient interface to allow this, consuming [bin]. If hash isn't changed and c_update is kept unsafe, then hash, update, and c_update should at least be given Haddocks that point out this pitfall.
  • In any case, c_update should document this.
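
A minimal sketch of that chunking idea, assuming a c_update-style binding with the usual pointer/length signature (the names and the 16 KiB bound are illustrative, not the library's actual code):

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}
module ChunkedUpdate where

import qualified Data.ByteString as BS
import qualified Data.ByteString.Unsafe as BSU
import Data.Word (Word8)
import Foreign.C.Types (CSize (..))
import Foreign.Ptr (Ptr, castPtr)

data HasherState  -- opaque C-side hasher state (placeholder)

-- Assumed binding; the package's real c_update may differ in name/type.
foreign import ccall unsafe "blake3_hasher_update"
  c_update :: Ptr HasherState -> Ptr Word8 -> CSize -> IO ()

-- Upper bound per unsafe call; the right value needs benchmarking
-- (see the chunk-size discussion further down in this thread).
chunkSize :: Int
chunkSize = 16 * 1024

-- Feed a possibly huge ByteString to the hasher in bounded pieces so that
-- every individual unsafe call stays short and the RTS can run a GC sync
-- (and all other Haskell threads) between pieces.
updateChunked :: Ptr HasherState -> BS.ByteString -> IO ()
updateChunked st = go
  where
    go bs
      | BS.null bs = pure ()
      | otherwise = do
          let (piece, rest) = BS.splitAt chunkSize bs
          BSU.unsafeUseAsCStringLen piece $ \(p, len) ->
            c_update st (castPtr p) (fromIntegral len)
          go rest
```

BS.splitAt and unsafeUseAsCStringLen only slice the existing buffer, so this adds no copying; the only extra cost is one FFI call per chunk.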

Takeaway independent of this library

unsafe isn't only forbidden for blocking I/O -- it should also be forbidden for long-running CPU-bound work, because otherwise it will halt all other Haskell threads during the GC sync.

k0001 commented 1 month ago

Thank you Niklas for the details. I agree. I'll make the changes.

nh2 commented 1 month ago

Thanks!

I guess the question of the day is, what size should be given to c_update?

I assume that to get SIMD (e.g. AVX) benefits, we can't make it too small. 4 KiB? 32 KiB?

Maybe needs a benchmark (or re-use one from blake3 upstream).
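
If a benchmark is written from scratch, a criterion skeleton along these lines could compare chunk sizes; hashChunked here is a hypothetical wrapper (it would feed the input to the hasher in pieces of the given size), not something the package exports:

```haskell
module Main (main) where

import Criterion.Main
import qualified Data.ByteString as BS

-- Hypothetical stand-in: hash the whole input using unsafe calls of at
-- most the given chunk size (e.g. built on a chunked update loop).
hashChunked :: Int -> BS.ByteString -> BS.ByteString
hashChunked = error "fill in with the real chunked hashing code"

main :: IO ()
main =
  defaultMain
    [ env (pure (BS.replicate (64 * 1024 * 1024) 0x5a)) $ \input ->
        bgroup "chunk size"
          [ bench (show kib ++ " KiB") $ whnf (hashChunked (kib * 1024)) input
          | kib <- [4, 16, 64, 256]
          ]
    ]
```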

nh2 commented 1 month ago

what size should be given to c_update?

The BLAKE3 paper shows single-threaded throughput already reaching its maximum at 16 KiB inputs (on Intel Cascade Lake-SP); see Figure 3 of

https://github.com/BLAKE3-team/BLAKE3-specs/blob/ea51a3ac997288bf690ee82ac9cfc8b3e0e60f2a/blake3.pdf


nh2 commented 1 month ago

I started an initiative for GHC to facilitate finding long-running unsafe calls:

GHC #25333: Add RTS stats and alerts for long-running unsafe foreign function (FFI) calls