modin-project / modin

Modin: Scale your Pandas workflows by changing a single line of code
http://modin.readthedocs.io
Apache License 2.0

Which one: Ray or Dask #3508

Closed: transreductionist closed this issue 1 year ago

transreductionist commented 2 years ago

I'm interested in which combination would be best for a group that runs pre-prod and prod in Linux Docker containers but develops and runs on Windows:

  1. Modin uses Ray, but Ray is experimental on Windows.
  2. Modin is experimental with Dask, but Dask supports Windows.

Would 1 or 2 be the better choice for stability?

devin-petersohn commented 2 years ago

For this setup I would recommend you use Ray. I have been actively using Modin and Ray on Windows (not WSL) for about a year. There are occasional issues, but restarting the interpreter typically clears the issue. Sometimes I have had to manually kill Redis from the Task Manager, but only around once or twice a month does it get stuck in that way.

Ray and Dask each have their quirks. I am partial to Ray because it came from the same research lab as Modin. I have had more memory-related issues with Dask than with Ray. Ray's architecture is more intuitive from a distributed-computing standpoint and is generally faster, but Dask is pure Python, which is why it doesn't have these cross-platform compatibility issues.

Since development and testing are all you're using Windows for, Ray is my recommendation. Happy to chat more about your use case if you want to email me: devin.petersohn@gmail.com
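For anyone following the recommendation above, here is a minimal sketch of pinning the engine in one place, so the same code runs on Windows dev machines and in the Linux containers. It relies on Modin's `MODIN_ENGINE` environment variable, which should be set before Modin is first imported. The helper names (`choose_modin_engine`, `configure_modin_engine`, `demo`) and the `overrides` mapping are hypothetical, made up for illustration and not part of Modin.

```python
import os
import platform


def choose_modin_engine(system=None):
    """Return the engine name to use on the given operating system.

    Hypothetical helper: it encodes the advice in this thread (Ray
    everywhere) and leaves one place to change if that stops working.
    """
    system = system or platform.system()
    # Assumption: no per-OS exceptions by default. Add an entry such as
    # {"Windows": "dask"} if Ray turns out to be unstable on your machines.
    overrides = {}
    return overrides.get(system, "ray")


def configure_modin_engine(system=None):
    """Set MODIN_ENGINE unless it is already set, and return its value."""
    # setdefault keeps a value that was exported in the shell or Dockerfile.
    return os.environ.setdefault("MODIN_ENGINE", choose_modin_engine(system))


def demo():
    """Small smoke test; needs Modin and the chosen engine installed."""
    configure_modin_engine()
    # Import Modin only after the environment variable is in place.
    import modin.pandas as pd

    df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
    return df.sum()
```

Call `demo()` from under an `if __name__ == "__main__":` guard, which matters on Windows because worker processes are spawned rather than forked. Because `setdefault` is used, a `MODIN_ENGINE` value already exported in the container image takes precedence over the helper's choice.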

transreductionist commented 2 years ago

Hi Devin,

Your dissertation was very interesting, and the theoretical DataFrame algebra is so cool. I am reminded of why Dijkstra chose programming over physics.

I work with a financial modelling team at Freddie Mac. The division is Mercury (around 30 developers), and I work on the FAS team within the division. The division has visibility at the top levels of management. Our data is on the order of 10-40 GB and is managed with DataFrames across the codebase.

I came across Modin while looking at new technologies one weekend. I presented it to my development team, and it was received with enthusiasm. This Thursday I am presenting your ideas and some preliminary performance results to the division.

I was wondering if you might like to get together at some point with the developers and managers of the division for a brown bag presentation and Q&A.

Regards,

Aaron Peters

On Sat, Oct 2, 2021 at 10:23 PM Devin Petersohn wrote: (quoted text of the comment above)

RehanSD commented 1 year ago

Hi @transreductionist! I'm marking this issue as resolved since it hasn't been updated recently! If you have any further questions, please feel free to reopen this issue, or open a new one!