modin-project / modin

Modin: Scale your Pandas workflows by changing a single line of code
http://modin.readthedocs.io
Apache License 2.0

PERF: Deal with Ray's poor performance on data larger than 2 GiB on macOS #4714

Open mvashishtha opened 2 years ago

mvashishtha commented 2 years ago

In https://github.com/ray-project/ray/issues/20388, it turned out that the Ray object store performs very poorly on macOS when it's storing more than about 2 GiB. Ray's solution, https://github.com/ray-project/ray/pull/21224, was to limit the size of the object store to 2 GiB on macOS. The result is that Ray spills to disk for data in the range of 2 GiB to 10 GiB, where Modin is supposed to perform much better than pandas.
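For scale, a rough footprint check shows when a frame crosses that cap. This is an illustrative helper (the function name and the int64-only estimate are mine, not Ray's API; real frames carry extra pandas/Arrow overhead):

```python
# 2 GiB, the macOS object-store cap introduced in ray-project/ray#21224.
MACOS_OBJECT_STORE_CAP = 2 * 1024**3

def exceeds_macos_cap(n_rows: int, n_cols: int, itemsize: int = 8) -> bool:
    """Rough footprint check for a table of fixed-width (int64) values."""
    return n_rows * n_cols * itemsize > MACOS_OBJECT_STORE_CAP

# The benchmark frames below are 2**20 x 2**8 int64 values = exactly 2 GiB
# (right at the cap), and the read_csv frame (2**21 x 2**8) is 4 GiB.
print(exceeds_macos_cap(2**20, 2**8))  # False: 2 GiB is not strictly greater
print(exceeds_macos_cap(2**21, 2**8))  # True: 4 GiB exceeds the cap
```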

In #4335, we overrode the 2 GiB limit to start Ray with Modin's usual object store size, but it seems that the slow object store is even worse than spilling to disk (see #4713).
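For anyone reproducing this, the override amounts to asking Ray for a larger store at startup. One way to experiment from the outside is Modin's memory knob (I'm assuming the `MODIN_MEMORY` environment variable here, which takes a size in bytes; treat the exact name as an assumption):

```shell
# Hypothetical override: ask Modin to start Ray with a 10 GiB object store.
export MODIN_MEMORY=$((10 * 1024 * 1024 * 1024))
python benchmark.py
```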

For now I think we should do the following:

cc @modin-project/modin-ray

rkooo567 commented 2 years ago

When disk spilling happens (due to the 2 GiB limit), how slow is it compared to Dask, and to Ray on Windows when spilling is invoked due to high memory usage? Do you have any numbers?

mvashishtha commented 2 years ago

how slow is it compared to Dask, and to Ray on Windows when spilling is invoked due to high memory usage?

@rkooo567 to answer your question, I did some performance testing of 3 common kinds of embarrassingly parallel operations in pandas and Modin. I don't have access to a Windows machine, so I tested on my Mac and on Linux (specs for both machines are in the appendix below). Both machines had eight cores, though the Linux one had twice as much RAM. Still, I think the difference in RAM didn't matter, because these benchmarks weren't maxing out memory usage on the Mac (and there was no spilling on Ray even when using a default object store size of 10 GB).

In each case, spilling to disk on the Mac is either much better than or roughly the same as not spilling, so we should accept Ray's limit on the object store size (#4713). However, Ray with the smaller object store is still slower than Dask: ~1.3x as long on applymap, ~1.4x as long on apply, and ~1.5x as long on read_csv. Linux is normally fastest, sometimes by a lot (e.g. ~50% on read_csv).
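As a sanity check on those multipliers, here are the Mac numbers from the detailed results below, averaging the repeated runs (the pairing of the spilling Ray runs against Dask is my reading of the data):

```python
def ratio(ray_times, dask_times):
    """Average Ray-on-Mac runtime divided by average Dask-on-Mac runtime."""
    return (sum(ray_times) / len(ray_times)) / (sum(dask_times) / len(dask_times))

# applymap: default Ray store (spilled to disk) vs. Dask
print(round(ratio([12.92], [10.26, 10.22]), 1))                        # 1.3
# apply: default Ray store (spilled to disk) vs. Dask
print(round(ratio([3.354, 3.167, 3.484], [2.247, 2.174, 2.642]), 1))   # 1.4
# read_csv: default Ray store vs. Dask
print(round(ratio([21.93, 20.64, 20.42], [14.90, 13.11]), 1))          # 1.5
```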

It would be really nice if Ray could work around the macOS mmap bug and keep a fast object store instead of limiting the object store size and spilling to disk. Otherwise, depending on how Dask does on a wider variety of benchmarks, Modin might need to make Dask the default engine on Macs.
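If Dask does become the preferred engine on Macs, switching is a one-line config change via Modin's documented engine selector:

```shell
# Select Modin's Dask engine instead of Ray for this session.
export MODIN_ENGINE=dask
python benchmark.py
```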

Please let me know whether that assessment makes sense. Detailed results are below.
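The repeated timings reported below (tuples of seconds) come from re-running the scripts; a generic harness for collecting such repeats looks like this (a sketch, not the exact script I used):

```python
import time

def time_repeats(fn, repeats=3):
    """Run fn() `repeats` times and return wall-clock durations in seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        durations.append(time.perf_counter() - start)
    return durations

# Cheap stand-in workload; swap in the applymap/apply/read_csv call.
times = time_repeats(lambda: sum(range(10**5)))
print([f"{t:.4f}" for t in times])
```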

Element-wise apply

Benchmark script

```python
import modin.pandas as pd
import numpy as np
from modin.config import BenchmarkMode
import time

BenchmarkMode.put(True)

# Warm ray up
test_df = pd.DataFrame(np.random.randint(0, 100, size=(2**9, 2**9)))
test_df.applymap(lambda x: x)
del test_df

df = pd.DataFrame(np.random.randint(0, 100, size=(2**20, 2**8)))
start = time.time()
df.applymap(lambda x: x)
end = time.time()
print(f'apply took: {end - start}')
```

pandas

Mac: 87.25 sec; Linux: 98.91 sec

Ray on Linux

11.98 sec

Mac

- Dask: (10.26, 10.22) sec
- Modin's current Ray object store: 11.98 sec
- default Ray object store (the benchmark script just added `import ray` and `ray.init()` before creating the dataframes): spilled to disk, and the applymap took 12.92 sec

column-wise apply

Benchmark script

```python
import modin.pandas as pd
import numpy as np
from modin.config import BenchmarkMode
import time

BenchmarkMode.put(True)

# Warm ray up
test_df = pd.DataFrame(np.random.randint(0, 100, size=(2**9, 2**9)))
test_df.applymap(lambda x: x)
del test_df

df = pd.DataFrame(np.random.randint(0, 100, size=(2**20, 2**8)))
start = time.time()
df.apply(lambda col: col)
end = time.time()
print(f'apply took: {end - start}')
```

pandas

Mac: (5.957, 5.706) sec; Linux: 3.488 sec

Ray on Linux

1.655 sec

Mac

- Dask: (2.247, 2.174, 2.642) sec
- Modin's current Ray object store: 17.97 sec
- default Ray object store: spilled to disk; (3.354, 3.167, 3.484) sec

read_csv

Generate CSV script

```python
import modin.pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randint(0, 100, size=(2**21, 2**8)))
df.to_csv("2mebi_by_256.csv", index=False)
```

Benchmark script

```python
import modin.pandas as pd
import numpy as np
from modin.config import BenchmarkMode
import time

BenchmarkMode.put(True)

# Warm ray up
test_df = pd.DataFrame(np.random.randint(0, 100, size=(2**9, 2**9)))
test_df.applymap(lambda x: x)
del test_df

start = time.time()
df = pd.read_csv("2mebi_by_256.csv")
end = time.time()
print(f'read_csv took: {end - start}')
```
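For scale, this file comfortably stresses the 2 GiB object-store cap. A back-of-the-envelope estimate (uniform ints in [0, 100) are one or two characters each) puts the CSV around 1.45 GiB on disk and the parsed int64 frame at 4 GiB in memory:

```python
rows, cols = 2**21, 2**8
avg_value_chars = (10 * 1 + 90 * 2) / 100        # digits per uniform int in [0, 100)
csv_bytes = rows * cols * (avg_value_chars + 1)  # +1 per value for comma/newline
parsed_bytes = rows * cols * 8                   # int64 once parsed
print(f"CSV: ~{csv_bytes / 1024**3:.2f} GiB; parsed: {parsed_bytes / 1024**3:.0f} GiB")
```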

pandas

Mac: (31.37, 33.36) sec; Linux: 40.57 sec

Ray on Linux

9.371 sec (with spilling to disk)

Mac

- Dask: (14.90, 13.11) sec
- Modin's current Ray object store: 29.35 sec
- default Ray object store: (21.93, 20.64, 20.42) sec

Appendix: specs

Mac specs:

The Ubuntu EC2 instance: