rmculpepper / crypto

17 stars 13 forks source link

Resolve ffi gc race conditions #18

Closed m4burns closed 2 years ago

m4burns commented 2 years ago

Hi Ryan,

The following program usually crashes on my machine with one of several different errors:

#lang racket

(require crypto
         crypto/libcrypto)

(crypto-factories (list libcrypto-factory))

(define k (generate-private-key 'ec '((curve secp256r1))))

(for ([i 100000])
  (datum->pk-key (pk-key->datum k 'rkt-public) 'rkt-public))

For example:

% racket repro.rkt
bytes->asn1/DER: tag mismatch
  expected: universal 16 (SEQUENCE)
  decoded: universal 0

I eventually tracked down the root cause. Buffers passed to openssl i2d_* and d2i_* are double-indirected with the _fun type (_ptr i _pointer). This type allocates a temporary pointer that is filled in with the address of the generated wrapper's argument before the FFI call. The temporary pointer is allocated 'atomic by default, which means GC tracing is disabled. If a garbage collection occurs after the temporary pointer is written but before the FFI call, the buffer can be moved, which causes these failures (and general memory corruption).

I tried to resolve this by allocating the temporary pointer 'nonatomic. Since openssl increments the pointer, this seems to cause other GC issues (test programs eventually crash with errors like "misplaced pair").

This patch resolves the issue by allocating all such buffers 'raw and doing a bit of manual memory management. With these changes, the above program exits successfully. I'm running a bit of a longer test with better coverage as well; no issues so far.

The crypto-test suite passes on these changes.

m4burns commented 2 years ago

Pushed a change to remove a stray add1 left from previous iterations.

FYI, this runs for several hours without crashing now.

rmculpepper commented 2 years ago

Thanks for tracking this down, and thanks for including a test!

Based on your explanation, it seems like it should be sufficient for _dptr_to_bytes to make an atomic-interior copy and then to allocate the pointer in nonatomic mode. The following version passes your test for me; does it also pass your other tests?

(define-fun-syntax _dptr_to_bytes
  (syntax-id-rules (_dptr_to_bytes)
    [_dptr_to_bytes
     (type: _pointer
      pre: (x => (begin
                   (unless (bytes? x)
                     (error '_dptr_to_bytes "expected bytes?"))
                   (let ([p (malloc _pointer 1 'nonatomic)]
                         [b (malloc _byte (bytes-length x) 'atomic-interior)])
                     (memcpy b x (bytes-length x))
                     (ptr-set! p _pointer b)
                     p))))]))
m4burns commented 2 years ago

So, that was the approach I tried first, but a longer stress test eventually causes a crash with the message "misplaced pair". This comes from racket/src/ChezScheme/s/mkgc.ss and I'm guessing has something to do with the GC interpreting unallocated memory as managed. I think this happens because openssl increments the value *p, causing *p to point to something invalid when we return from openssl.

I tried mitigating by allocating an extra byte at the end of the buffer so *p would still be in the interior of the object when we return from openssl, but got the same crash.

The docs for malloc in ffi/unsafe, along with this comment in racket/src/cs/rumble/foreign.ss:

;; This is not quite the same as Racket BC, because interior
;; pointers are not allowed as GCable pointers. So, "interior"
;; just means "doesn't move".

seem to imply that 'nonatomic pointers cannot point to the interior of objects in CS.

Right now, though, I'm having trouble reproducing the same crash with the definition of _dptr_to_bytes from your comment. But it does seem like *p pointing past the end of b after the call would be a problem for the GC - maybe there's extra padding or accounting information at that address which is making this work?

I'll just leave the test above running for a few hours and report back.

m4burns commented 2 years ago

I just found a test that quickly crashes with your suggested _dptr_to_bytes but not mine:

#lang racket

(require crypto
         crypto/libcrypto)

(crypto-factories (list libcrypto-factory))

(map sync
  (for/list ([t 2])
    (thread
      (thunk
        (for ([i 1000000])
          (define xs (for/list ([i 10]) i))
          (define k (generate-private-key 'ec '((curve secp256r1))))
          (define p (map identity (pk-key->datum k 'rkt-public)))
          (define q (map identity (pk-key->datum k 'rkt-private)))
          (unless
            (pk-verify
              (datum->pk-key p 'rkt-public)
              #"asdfasdf"
              (pk-sign
                (datum->pk-key q 'rkt-private)
                #"asdfasdf"))
            (error 'bad))
          (for/sum ([x xs]) x))))))

I had to start this test a couple times before I got a crash in the first few seconds. The for/list, for/sum and maps seem to be necessary to get the GC going at the right moments.

For yours:

marc@builder ~/crypto/crypto-lib/private/libcrypto % time ~/racket/racket/bin/racket repro2.rkt
misplaced pair
zsh: abort (core dumped)  ~/racket/racket/bin/racket repro2.rkt
~/racket/racket/bin/racket repro2.rkt  4.54s user 0.34s system 81% cpu 5.993 total

For mine, it doesn't crash in the first 15 seconds across 10 runs, or after 10 minutes running continuously.

So, overall, I think there is some issue with using the 'nonatomic p but I have only a sketchy understanding of it.