martindurant / shm-distributed

Benchmark and run scripts for shared memory on dask
MIT License

Add vineyard to the shm benchmark #3

Closed. sighingnow closed this 1 year ago

sighingnow commented 1 year ago

The numbers do not look good, but it does work, and I will look into the numbers further.

Edit: I made a mistake in the "cache" part of the vineyard implementation. With that bug fixed, vineyard looks faster than plasma, and I think there is still a lot of room for optimization inside vineyard itself.

The difference comes from the performance gap between _put_vineyard_buffer and _put_plasma_buffer. I have added a timing decorator in https://github.com/martindurant/distributed/pull/2.
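For readers who want to reproduce this kind of measurement, here is a minimal sketch of a timing decorator. It is not the decorator from the linked pull request; the names `timed` and `put_buffer_stub` are hypothetical, and the stub only stands in for a function such as _put_vineyard_buffer or _put_plasma_buffer.

```python
import functools
import time


def timed(func):
    """Print the wall-clock duration of every call to ``func``."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            # Runs even if func raises, so failed calls are timed too.
            elapsed = time.perf_counter() - start
            print(f"{func.__name__}: {elapsed:.6f}s")

    return wrapper


# Hypothetical stand-in for a "put buffer" function; it just copies the data.
@timed
def put_buffer_stub(buf):
    return bytes(buf)


put_buffer_stub(bytearray(10_000_000))
```

Wrapping both put functions with the same decorator would make their per-call costs directly comparable, assuming both are called with the same buffers.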

I hope the above information is helpful.

shm_distributed/test_runs.py::test_workflow[client_scatter_workflow-pickle] 7.540348s
Start: 742 resident, 516 unique
Start: 4844 resident, 4618 unique
Δmem: 4127
PASSED
shm_distributed/test_runs.py::test_workflow[client_scatter_workflow-plasma] 1.174548s
Start: 766 resident, 526 unique
Start: 1793 resident, 1552 unique
Δmem: 6
PASSED
shm_distributed/test_runs.py::test_workflow[client_scatter_workflow-vineyard] 0.915509s
Start: 1792 resident, 1551 unique
Start: 2818 resident, 2577 unique
Δmem: 5
PASSED
shm_distributed/test_runs.py::test_workflow[worker_scatter_workflow-pickle] 10.160269s
Start: 2795 resident, 2568 unique
Start: 6906 resident, 6679 unique
Δmem: 4134
PASSED
shm_distributed/test_runs.py::test_workflow[worker_scatter_workflow-plasma] 3.202961s
Start: 2826 resident, 2585 unique
Start: 4875 resident, 2586 unique
Δmem: 1033
PASSED
shm_distributed/test_runs.py::test_workflow[worker_scatter_workflow-vineyard] 3.288973s
Start: 2827 resident, 2586 unique
Start: 10000 resident, 3615 unique
Δmem: 1045

Addresses #2

martindurant commented 1 year ago

The numbers do not look good, but it does work, and I will look into the numbers further.

I disagree! The numbers for vineyard and plasma are about the same in both delta-memory (system memory) and time. I am not sure why the "unique" figure appears higher; perhaps memory is reported differently between the systems. Either way, both clearly beat pickle on your system. At a guess, I would say that your memory is more constrained than mine.
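To put the comparison with pickle into numbers, the following sketch computes speedup factors from the timings posted above. The dictionary layout and the helper name `speedups` are only illustrative; the timing values are copied from the test output in this thread.

```python
# Wall-clock timings in seconds, copied from the test output in this thread.
timings = {
    "client_scatter_workflow": {
        "pickle": 7.540348,
        "plasma": 1.174548,
        "vineyard": 0.915509,
    },
    "worker_scatter_workflow": {
        "pickle": 10.160269,
        "plasma": 3.202961,
        "vineyard": 3.288973,
    },
}


def speedups(results, baseline="pickle"):
    """Return how many times faster each backend is than the baseline."""
    return {
        backend: results[baseline] / seconds
        for backend, seconds in results.items()
        if backend != baseline
    }


for workflow, results in timings.items():
    for backend, factor in speedups(results).items():
        print(f"{workflow} {backend}: {factor:.1f}x faster than {'pickle'}")
```

On these numbers both shared-memory backends come out roughly 6x to 8x faster than pickle for the client scatter workflow and about 3x faster for the worker scatter workflow, with plasma and vineyard within a few percent of each other in the latter.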

Note that a lot of things will impact what we see (config, systems, choice of workflow), and the benchmarks here should be seen as a starting point from which we can work out something more robust.

martindurant commented 1 year ago

I updated the README with the latest timings on my system. Note that even the order of the tests might be material.

sighingnow commented 1 year ago

and the benchmarks here should be seen as a starting point from which we can work out something more robust.

Totally agree.