Closed m4burns closed 2 years ago
Pushed a change to remove a stray add1
left from previous iterations.
FYI, this runs for several hours without crashing now.
Thanks for tracking this down, and thanks for including a test!
Based on your explanation, it seems like it should be sufficient for _dptr_to_bytes
to make an atomic-interior copy and then to allocate the pointer in nonatomic mode. The following version passes your test for me; does it also pass your other tests?
(define-fun-syntax _dptr_to_bytes
(syntax-id-rules (_dptr_to_bytes)
[_dptr_to_bytes
(type: _pointer
pre: (x => (begin
(unless (bytes? x)
(error '_dptr_to_bytes "expected bytes?"))
(let ([p (malloc _pointer 1 'nonatomic)]
[b (malloc _byte (bytes-length x) 'atomic-interior)])
(memcpy b x (bytes-length x))
(ptr-set! p _pointer b)
p))))]))
So, that was the approach I tried first, but a longer stress test eventually causes a crash with the message "misplaced pair". This comes from racket/src/ChezScheme/s/mkgc.ss
and I'm guessing has something to do with the GC interpreting unallocated memory as managed. I think this happens because openssl increments the value *p
, causing *p
to point to something invalid when we return from openssl.
I tried mitigating by allocating an extra byte at the end of the buffer so *p
would still be in the interior of the object when we return from openssl, but got the same crash.
The docs for malloc
in ffi/unsafe
, along with this comment in racket/src/cs/rumble/foreign.ss
:
;; This is not quite the same as Racket BC, because interior
;; pointers are not allowed as GCable pointers. So, "interior"
;; just means "doesn't move".
seem to imply that 'nonatomic
pointers cannot point to the interior of objects in CS.
Right now, though, I'm having trouble reproducing the same crash with the definition of _dptr_to_bytes
from your comment. But it does seem like *p
pointing past the end of b
after the call would be a problem for the GC - maybe there's extra padding or accounting information at that address which is making this work?
I'll just leave the test above running for a few hours and report back.
I just found a test that quickly crashes with your suggested _dptr_to_bytes
but not mine:
#lang racket
(require crypto
crypto/libcrypto)
(crypto-factories (list libcrypto-factory))
(map sync
(for/list ([t 2])
(thread
(thunk
(for ([i 1000000])
(define xs (for/list ([i 10]) i))
(define k (generate-private-key 'ec '((curve secp256r1))))
(define p (map identity (pk-key->datum k 'rkt-public)))
(define q (map identity (pk-key->datum k 'rkt-private)))
(unless
(pk-verify
(datum->pk-key p 'rkt-public)
#"asdfasdf"
(pk-sign
(datum->pk-key q 'rkt-private)
#"asdfasdf"))
(error 'bad))
(for/sum ([x xs]) x))))))
I had to start this test a couple times before I got a crash in the first few seconds. The for/list
, for/sum
and map
s seem to be necessary to get the GC going at the right moments.
For yours:
marc@builder ~/crypto/crypto-lib/private/libcrypto % time ~/racket/racket/bin/racket repro2.rkt
misplaced pair
zsh: abort (core dumped) ~/racket/racket/bin/racket repro2.rkt
~/racket/racket/bin/racket repro2.rkt 4.54s user 0.34s system 81% cpu 5.993 total
For mine, it doesn't crash in the first 15 seconds across 10 runs, or after 10 minutes running continuously.
So, overall, I think there is some issue with using the 'nonatomic
p
but I have only a sketchy understanding of it.
Hi Ryan,
The following program usually crashes on my machine with one of several different errors:
For example:
I eventually tracked down the root cause. Buffers passed to openssl
i2d_*
andd2i_*
are double-indirected with the_fun
type(_ptr i _pointer)
. This type allocates a temporary pointer that is filled in with the address of the generated wrapper's argument before the FFI call. The temporary pointer is allocated'atomic
by default, which means GC tracing is disabled. If a garbage collection occurs after the temporary pointer is written but before the FFI call, the buffer can be moved, which causes these failures (and general memory corruption).I tried to resolve this by allocating the temporary pointer
'nonatomic
. Since openssl increments the pointer, this seems to cause other GC issues (test programs eventually crash with errors like "misplaced pair").This patch resolves the issue by allocating all such buffers
'raw
and doing a bit of manual memory management. With these changes, the above program exits successfully. I'm running a bit of a longer test with better coverage as well; no issues so far.The
crypto-test
suite passes on these changes.