WrathfulSpatula closed 2 years ago
While this would, potentially, systematically prevent resource temporarily unavailable from ever happening, it turns out not to be practically necessary (and, by skipping it, we also avoid a singleton pattern).
QEngineCPU previously dispatched an "async" task for all QUnit subsystem method calls below the threshold of efficient parallelism, down to 1 or 2 qubit subsystems. However, typical consumer CPU single-thread speed is fast enough that we actually get slightly better performance by handling very small subsystem method calls on the "main" or "UI" thread, which also avoids many thread dispatches. On my system, the "sweet spot" for switching between the main thread and "async" dispatch seems to be very roughly 2 qubits below the PSTRIDEPOW parameter, which controls how many work items are dispatched to a single thread at a time. That parameter can be tuned at build time and by environment variable, and doing so now correspondingly raises or lowers this "async" threshold as well. By default, this puts the minimum "async" subsystem size at about 12 qubits, so it would be exceedingly rare for a single QUnit to ever dispatch even 3 asynchronous method calls at a time, and that would require a simulation of 36+ qubits in the first place.
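The threshold rule described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not Qrack's actual code: the function names, the environment variable name EXAMPLE_PSTRIDEPOW, and the default value of 14 (picked only so that the result matches the "about 12 qubits" figure) are all hypothetical.

```cpp
#include <cassert>
#include <cstdlib>
#include <functional>
#include <future>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical default, chosen so that (default - offset) is about 12 qubits.
constexpr int kDefaultPStridePow = 14;
// "Very roughly" 2 qubits below PSTRIDEPOW, per the comment above.
constexpr int kAsyncOffset = 2;

// Read the stride power from a hypothetically named environment variable,
// falling back to the build-time default if it is unset or malformed.
int GetPStridePow(const char* envName = "EXAMPLE_PSTRIDEPOW")
{
    const char* value = std::getenv(envName);
    if (!value) {
        return kDefaultPStridePow;
    }
    try {
        return std::stoi(value);
    } catch (const std::logic_error&) {
        // std::invalid_argument and std::out_of_range both derive from this.
        return kDefaultPStridePow;
    }
}

// Smallest subsystem size (in qubits) that gets its own asynchronous task.
int MinAsyncQubits(int pStridePow)
{
    assert(pStridePow >= kAsyncOffset);
    return pStridePow - kAsyncOffset;
}

// Run small subsystem calls on the calling thread; dispatch larger ones.
std::future<void> RunSubsystemCall(int qubitCount, int pStridePow, std::function<void()> fn)
{
    if (qubitCount < MinAsyncQubits(pStridePow)) {
        // Below the threshold: no thread dispatch, return a ready future.
        std::promise<void> done;
        fn();
        done.set_value();
        return done.get_future();
    }
    // At or above the threshold: run on a separate thread.
    return std::async(std::launch::async, std::move(fn));
}
```

Returning a std::future in both branches keeps the caller's code identical regardless of which path was taken; raising or lowering the stride power moves the switch-over point with it.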
At the extreme, where hybrid simulation method user code only ever dispatches about 2-4 threads at once for a simulator, this all simply becomes moot. For anything that fails the resource requirements to support even this case, we have long had the -DENABLE_QUNIT_CPU_PARALLEL=OFF CMake build option to completely disable this asynchronous behavior, which would likely be preferable or required for a processor that limited. Hence, we can table the central dispatch singleton for now.
QUnit attempts to dispatch many small qubit subsystems as asynchronous calls, with std::future. Due to gradual and recent improvements in the CPU domain of our "CPU/GPU hybridization" techniques, QUnit and other simulator layers run many parallelizable asynchronous tasks, which is great, but we run too many for a processor with <=16 hyperthreads. If it helps, QUnit asynchronous parallelization can be disabled by building with -DENABLE_QUNIT_CPU_PARALLEL=OFF (which actually works by turning off QEngineCPU asynchronous parallelism). However, the asynchronous gains are rather significant, so it would be worth maximizing utilization of CPU parallelism while avoiding the (dreaded) resource temporarily unavailable exception, which comes from too many thread dispatches for the OS.
Native Windows, or other operating systems, might already mostly not suffer from this problem, since their threading models can differ. Running on Ubuntu (Linux), user code POSIX threads are probably limited to about or exactly the CPU hyperthread count.
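For context, when std::async is called with std::launch::async and the system cannot start another thread, the C++ standard library reports it as a std::system_error carrying std::errc::resource_unavailable_try_again, which is presumably where the "resource temporarily unavailable" message comes from. A minimal sketch of one possible mitigation, falling back to the calling thread, is below. The helper name AsyncOrInline is hypothetical, and this is not a description of what Qrack does.

```cpp
#include <cassert>
#include <future>
#include <system_error>
#include <utility>

// Try to run `fn` on a new thread. If the OS refuses to create one,
// run it on the calling thread instead and return an already-ready future.
template <typename Fn>
auto AsyncOrInline(Fn fn) -> std::future<decltype(fn())>
{
    using Result = decltype(fn());
    try {
        return std::async(std::launch::async, fn);
    } catch (const std::system_error& e) {
        // Only swallow the "too many threads" case; rethrow anything else.
        if (e.code() != std::errc::resource_unavailable_try_again) {
            throw;
        }
    }
    // Fallback path: execute synchronously through a packaged_task so the
    // caller still receives a std::future (including any thrown exception).
    std::packaged_task<Result()> task(std::move(fn));
    std::future<Result> result = task.get_future();
    assert(result.valid());
    task();
    return result;
}
```

This keeps the program running when threads are exhausted, but it does not address the underlying oversubscription; it only degrades to serial execution for the calls that could not be dispatched.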
An obvious solution is a centralized CPU thread dispatch (such as a singleton DispatchQueue, which QEngineCPU already uses slightly differently). Trading thread availability contention in the dispatch for maximal "async" parallelization of QUnit subsystems likely pays off very handsomely on net.
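To make the proposal concrete, here is a rough sketch of what a process-wide dispatcher with a bounded worker pool could look like. It is a generic illustration, not Qrack's DispatchQueue: the class name CentralDispatch and its interface are hypothetical, and sizing the pool from std::thread::hardware_concurrency() is an assumption.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <future>
#include <memory>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical process-wide dispatcher: all callers share one fixed pool,
// so the total thread count never exceeds the pool size.
class CentralDispatch {
public:
    static CentralDispatch& Instance()
    {
        static CentralDispatch instance(WorkerCount());
        return instance;
    }

    CentralDispatch(const CentralDispatch&) = delete;
    CentralDispatch& operator=(const CentralDispatch&) = delete;

    // Queue a task; the returned future becomes ready when it has run.
    std::future<void> Dispatch(std::function<void()> fn)
    {
        auto task = std::make_shared<std::packaged_task<void()>>(std::move(fn));
        std::future<void> done = task->get_future();
        {
            std::lock_guard<std::mutex> lock(mutex_);
            tasks_.push([task] { (*task)(); });
        }
        ready_.notify_one();
        return done;
    }

    ~CentralDispatch()
    {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        ready_.notify_all();
        for (auto& worker : workers_) {
            worker.join();
        }
    }

private:
    static unsigned WorkerCount()
    {
        // hardware_concurrency() may return 0 if the count is unknown.
        const unsigned n = std::thread::hardware_concurrency();
        return n ? n : 1U;
    }

    explicit CentralDispatch(unsigned workers)
    {
        assert(workers > 0);
        for (unsigned i = 0; i < workers; ++i) {
            workers_.emplace_back([this] { Run(); });
        }
    }

    // Worker loop: pull tasks until asked to stop and the queue is drained.
    void Run()
    {
        for (;;) {
            std::function<void()> job;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                ready_.wait(lock, [this] { return stopping_ || !tasks_.empty(); });
                if (tasks_.empty()) {
                    return;
                }
                job = std::move(tasks_.front());
                tasks_.pop();
            }
            job();
        }
    }

    std::mutex mutex_;
    std::condition_variable ready_;
    std::queue<std::function<void()>> tasks_;
    std::vector<std::thread> workers_;
    bool stopping_ = false;
};
```

A caller would write something like CentralDispatch::Instance().Dispatch(...) and hold on to the returned future. The cost is the contention mentioned above: every subsystem call now goes through one shared mutex and queue.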