Each Context now constructs its own stream instead of competing for the default stream
This lets us use threading to run multiple Contexts in parallel
Modifies multiple_steps, multiple_steps_local and multiple_steps_local_selection to take in a host buffer to reduce memory copying, which provides a 5% speedup on Vacuum simulations (suggesting we are somewhat CPU-limited there)
The performance gain is smaller than with MPS, but this gets within 10% of the maximum performance and also makes it possible to parallelize in notebooks.
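As a rough illustration of the threading pattern this enables, here is a minimal sketch. `FakeContext` is a hypothetical stand-in and not the real Context API: its `multiple_steps` takes only a step count, and `time.sleep` stands in for GPU work that is assumed to release the GIL while it runs.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class FakeContext:
    """Hypothetical stand-in for a Context; not the real API."""

    def __init__(self, name):
        self.name = name
        self.steps_run = 0

    def multiple_steps(self, n_steps):
        # time.sleep stands in for GPU work on this context's own stream
        # (assumption: the real call releases the GIL while it runs).
        time.sleep(0.01)
        self.steps_run += n_steps
        return self.name, threading.get_ident()


def run_in_parallel(contexts, n_steps):
    # One worker thread per context, so each context issues its work independently.
    with ThreadPoolExecutor(max_workers=len(contexts)) as pool:
        futures = [pool.submit(ctx.multiple_steps, n_steps) for ctx in contexts]
        # Collect results in submission order.
        return [f.result() for f in futures]


contexts = [FakeContext(f"ctx{i}") for i in range(4)]
results = run_in_parallel(contexts, n_steps=100)
```

Each context is only ever touched by one thread here, which keeps the sketch free of shared mutable state.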
Benchmarks
A10, CUDA arch 8.6
Vacuum is 5% faster; everything else appears to be within run-to-run variation.
Master
Changes