Refactor benchmark runner to reduce memory overhead

The current benchmark runner script adds memory overhead in both the client and worker processes compared with running an individual query as a standalone script. The pattern is consistent across many runs.

On a DGX-2, while running Q02 in the runner, the client process on GPU 0 uses about 890 MiB of memory. On the workers, the "baseline" memory pool is 28543 MiB, although the worker on GPU 1 shows 32001 MiB in this snapshot.

|=============================================================================|
|    0     41449      C   .../envs/rapids-gpubdb-20210331/bin/python 28543MiB |
|    0     43486      C   python                                       891MiB |
|    1     41453      C   .../envs/rapids-gpubdb-20210331/bin/python 32001MiB |
|    2     41456      C   .../envs/rapids-gpubdb-20210331/bin/python 28543MiB |
...

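Running the same query as a standalone script, the client process uses about 630 MiB of memory, and the "baseline" memory pool on the workers is 28289 MiB. Compared with the runner, that is roughly 260 MiB less on the client and about 250 MiB less per worker.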

|=============================================================================|
|    0     44016      C   .../envs/rapids-gpubdb-20210331/bin/python 28289MiB |
|    0     46114      C   python                                       633MiB |
|    1     44019      C   .../envs/rapids-gpubdb-20210331/bin/python 28289MiB |
|    2     44023      C   .../envs/rapids-gpubdb-20210331/bin/python 28289MiB |
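
The overhead can be read off the two snapshots above by subtracting the standalone figures from the runner figures. The sketch below does that arithmetic on the process rows copied from this issue. The row regex and the helper names (`parse_rows`, `split_client_workers`, `overhead_mib`) are illustrative assumptions, not part of gpu-bdb, and it assumes the client is the process listed as plain `python` while every other row is a worker.

```python
import re

# Process rows copied verbatim from the two nvidia-smi snapshots in this issue.
RUNNER = """
|    0     41449      C   .../envs/rapids-gpubdb-20210331/bin/python 28543MiB |
|    0     43486      C   python                                       891MiB |
|    1     41453      C   .../envs/rapids-gpubdb-20210331/bin/python 32001MiB |
|    2     41456      C   .../envs/rapids-gpubdb-20210331/bin/python 28543MiB |
"""
STANDALONE = """
|    0     44016      C   .../envs/rapids-gpubdb-20210331/bin/python 28289MiB |
|    0     46114      C   python                                       633MiB |
|    1     44019      C   .../envs/rapids-gpubdb-20210331/bin/python 28289MiB |
|    2     44023      C   .../envs/rapids-gpubdb-20210331/bin/python 28289MiB |
"""

# Assumed row layout: | GPU  PID  Type  Process name  <N>MiB |
ROW = re.compile(r"\|\s+(\d+)\s+(\d+)\s+\w\s+(\S+)\s+(\d+)MiB\s+\|")


def parse_rows(text):
    """Return (gpu, pid, process_name, used_mib) for each process row."""
    rows = []
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if match:
            gpu, pid, name, mib = match.groups()
            rows.append((int(gpu), int(pid), name, int(mib)))
    return rows


def split_client_workers(rows):
    """Assumption: the client is the row named plain `python`; the rest are workers."""
    client_mib = None
    workers_mib = {}
    for gpu, _pid, name, mib in rows:
        if name == "python":
            client_mib = mib
        else:
            workers_mib[gpu] = mib
    return client_mib, workers_mib


def overhead_mib(runner_text, standalone_text):
    """Return (client delta, {gpu: worker delta}) in MiB, runner minus standalone."""
    r_client, r_workers = split_client_workers(parse_rows(runner_text))
    s_client, s_workers = split_client_workers(parse_rows(standalone_text))
    worker_delta = {
        gpu: r_workers[gpu] - s_workers[gpu]
        for gpu in sorted(r_workers)
        if gpu in s_workers
    }
    return r_client - s_client, worker_delta


client_delta, worker_delta = overhead_mib(RUNNER, STANDALONE)
print(f"client overhead: {client_delta} MiB")
for gpu, delta in worker_delta.items():
    print(f"GPU {gpu} worker overhead: {delta} MiB")
```

On these rows the client delta is 258 MiB and the worker delta is 254 MiB on GPUs 0 and 2. GPU 1 stands out at 3712 MiB because its runner snapshot shows 32001 MiB rather than the 28543 MiB baseline.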

rapidsai/gpu-bdb#203