quil-lang / qvm

The high-performance and featureful Quil simulator.

Work around out of memory issue with foreign allocator. #199

Closed jmbr closed 4 years ago

jmbr commented 4 years ago

Workaround for #198.

stylewarning commented 4 years ago

Maybe we should explicitly deallocate instead of relying on the GC? It's scary that every request calls a GC which locks all threads.

jmbr commented 4 years ago

> Maybe we should explicitly deallocate instead of relying on the GC? It's scary that every request calls a GC which locks all threads.

Right now this is only triggered when using foreign memory allocation and most users will be resorting to lisp allocation for the foreseeable future. I agree with you but I think the current patch is OK as a transient fix and I will let someone else (perhaps my future, not-on-vacation self?) produce a more definitive fix.

stylewarning commented 4 years ago

I'll accept it if you write somewhere that it's a transient fix in the source.

stylewarning commented 4 years ago

@jmbr

jmbr commented 4 years ago

> @jmbr

ROFL. Deal!

P.S.: See also #200

jmbr commented 4 years ago

Let me give this another go later before merging.

appleby commented 4 years ago

Happened upon this paper today which touches on some related GC themes:

Dynamic Optimizations for SBCL Garbage Collection

The paper is short and light on details, but all three of the optimizations mentioned seem like things qvm-app might potentially benefit from (with the huge caveat that the methods hinted at in the paper rely on running inside a container--though it isn't immediately clear that they wouldn't/couldn't work in a non-containerized environment).

Relevant snippet from the paper:

> 3.1 Optimization #1: Trigger GC based on host memory constraints
>
> The first optimization is to use memory utilization metrics to indicate to the lisp runtime when collecting garbage is actually required to avoid an out of memory condition. This entails configuring up to two memory thresholds. For an application running within a container, these thresholds are most logically expressed as a fraction of the container’s total memory size. When the container’s memory in use crosses a specified threshold, the lisp process is notified of the impending need to collect garbage. These triggers may replace the standard behavior of beginning garbage collection after allocating a fixed number of bytes on the lisp heap.
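The threshold idea in the quoted passage can be sketched in a few lines. This is an illustration in Python, not SBCL or qvm-app code. The class name `ThresholdGCTrigger`, the `soft`/`hard` split of the "up to two" thresholds, their default values, and the choice to run a young-generation collection at the lower one are all assumptions made for the example. How the memory fraction is obtained is left to a caller-supplied function, since that part is platform-specific.

```python
import gc
from typing import Callable


class ThresholdGCTrigger:
    """Hypothetical helper: collect only when memory use crosses a threshold."""

    def __init__(self, read_fraction_used: Callable[[], float],
                 soft: float = 0.70, hard: float = 0.90) -> None:
        if not 0.0 < soft <= hard <= 1.0:
            raise ValueError("thresholds must satisfy 0 < soft <= hard <= 1")
        # read_fraction_used returns memory in use as a fraction of the limit
        # (e.g. of a container's total memory); supplying it is up to the caller.
        self.read_fraction_used = read_fraction_used
        self.soft = soft
        self.hard = hard

    def check(self) -> str:
        """Poll memory use once and collect if a threshold has been crossed."""
        used = self.read_fraction_used()
        if used >= self.hard:
            gc.collect()       # full collection
            return "full"
        if used >= self.soft:
            gc.collect(0)      # youngest generation only
            return "young"
        return "none"
```

A caller would invoke `check()` periodically or between requests, instead of relying on a collection being triggered after a fixed number of allocated bytes.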

In the paper, this is paired with a couple of other optimizations, one of which is to delay promoting objects out of the nursery if GC happens while an application request is in progress. In the case of qvm-app, this would presumably mean that the QVM object remains in gen0 until after the request is finished, at which point it would no longer be live and would get collected.

> When the GC runs while one or more work items are in progress, only generation 0 is scavenged, and no surviving objects are promoted to an older generation. Heap objects which survive a gen 0 GC are presumed to be logically associated with in-progress work items and are expected to be garbage when those work items complete.

For the record, I'm not proposing we implement any of these for qvm-app. I just happened on the paper and the theme was topical/seemed worth sharing. Maybe stylewarning even remembers the talk and/or got a t-shirt 👕.

jmbr commented 4 years ago

> Happened upon this paper today which touches on some related GC themes:
>
> Dynamic Optimizations for SBCL Garbage Collection

Thanks for referencing that paper. I have read it and agree that it could be useful.

The problem here is indeed that the GC is unaware of the true extent of a QVM object (in particular, its foreign-allocated vector of amplitudes) and that the QVM object survives for many generations. Triggering the GC based on the current process's resident set size, as suggested in the paper, is a good solution. It is, however, not portable across Lisp implementations and operating systems. For instance, SBCL can use GENCGC on some architectures and CheneyGC on others, whereas ECL uses the Boehm garbage collector. Also, the typical way to query available memory on Linux is via the proc filesystem, whereas the mechanism differs on macOS (and Windows). One could create a thread that periodically monitors memory usage and manually triggers a full GC when necessary, but I think a simpler solution to fix this properly is to manually free the amplitudes vector as soon as possible after finishing the request.
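For illustration, the "free it as soon as the request finishes" approach can be sketched as a scoped resource. This is Python rather than the project's Common Lisp, where the analogous construct would be a cleanup form such as `unwind-protect`. The names `foreign_amplitudes` and `handle_request` are hypothetical. An anonymous `mmap` stands in for foreign-allocated memory, and 16 bytes per amplitude (one complex double) is an assumption of the sketch.

```python
import mmap
import struct
from contextlib import contextmanager

BYTES_PER_AMPLITUDE = 16  # assumed: one complex number stored as two doubles


@contextmanager
def foreign_amplitudes(num_qubits: int):
    """Allocate an amplitude buffer outside the GC heap and free it on exit."""
    size = (2 ** num_qubits) * BYTES_PER_AMPLITUDE
    buf = mmap.mmap(-1, size)  # anonymous mapping, not tracked by the GC
    try:
        yield buf
    finally:
        buf.close()  # released deterministically, even if the request fails


def handle_request(num_qubits: int) -> int:
    """Hypothetical request handler: the buffer lives only for this call."""
    with foreign_amplitudes(num_qubits) as amps:
        amps[0:8] = struct.pack("d", 1.0)  # real part of the first amplitude
        return len(amps)
```

With this shape, the memory is returned when the request ends rather than whenever a later collection notices the wrapper object is dead, so no per-request GC is needed.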

appleby commented 4 years ago

> I think a simpler solution to fix this properly is to manually free the amplitudes vector as soon as possible after finishing the request.

Yep, I agree.

notmgsk commented 4 years ago

> > I think a simpler solution to fix this properly is to manually free the amplitudes vector as soon as possible after finishing the request.
>
> Yep, I agree.

I concur.

appleby commented 4 years ago

> > > I think a simpler solution to fix this properly is to manually free the amplitudes vector as soon as possible after finishing the request.
> >
> > Yep, I agree.
>
> I concur.

🤝