ray-project / ray

Ray is a unified framework for scaling AI and Python applications. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
https://ray.io
Apache License 2.0

GPU is too expensive. SIMD is better to provide high performance. Can ray provide some simd utility? #13593

Closed ggservice007 closed 3 years ago

ggservice007 commented 3 years ago

Many high-performance projects and libraries use SIMD for acceleration, such as Apache Arrow and pysimdjson.

Apache Arrow: https://arrow.apache.org/overview/


pysimdjson

https://github.com/TkTech/pysimdjson

Python bindings for the simdjson project, a SIMD-accelerated JSON parser. If SIMD instructions are unavailable a fallback parser is used, making pysimdjson safe to use anywhere.

Data-Parallel Programming via SIMD Vector Types

Can we use SIMD for parallel programming in Ray?

ggservice007 commented 3 years ago

year_2015_data_parallel_via_simd_vector_type.pdf


Ray's base layer is implemented in C++, so we could write some SIMD utilities there and expose them to Python users.

wuisawesome commented 3 years ago

Hey, Ray shouldn't interfere with the native execution of other libraries that rely on SIMD instructions (so you should already get the full power of SIMD from PyTorch/TensorFlow/Arrow). Could you provide an example where you think Ray isn't behaving or performing the way you expect it to?

ggservice007 commented 3 years ago

I am developing a project that is a SIMD version of cuGraph.

The first step is to implement a SIMD version of cuDF.

I am implementing the SIMD DataFrames based on Vaex or its ideas.

Vaex uses Apache Arrow data structures and C++ to speed up string operations.

Vaex is a high performance Python library for lazy Out-of-Core DataFrames (similar to Pandas). It calculates statistics such as mean, sum, count, standard deviation etc, on an N-dimensional grid for more than a billion (10^9) samples/rows per second.

https://github.com/vaexio/vaex

The next step is to use the parallel functionality provided by Ray.

wuisawesome commented 3 years ago

I see. You can run any code that you can put in a Python module (including any compiled Cython), and Ray will just faithfully execute it.
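To make that concrete, here is a minimal sketch of wrapping an ordinary NumPy function as Ray tasks. The names `sum_of_squares` and `run_with_ray` are hypothetical and only for illustration; the sketch assumes `ray` and `numpy` are installed. Whether `np.dot` actually runs SIMD kernels depends on the NumPy/BLAS build and the CPU, not on Ray.

```python
import numpy as np


def sum_of_squares(chunk):
    # Plain NumPy code. np.dot typically dispatches to a native
    # (often SIMD-optimized) kernel, depending on the build and CPU.
    return float(np.dot(chunk, chunk))


def run_with_ray(chunks):
    # Hypothetical helper: run sum_of_squares on each chunk as a Ray task.
    import ray  # imported here so the function above works without Ray

    ray.init(ignore_reinit_error=True)
    try:
        remote_fn = ray.remote(sum_of_squares)  # wrap the unchanged function
        futures = [remote_fn.remote(chunk) for chunk in chunks]
        return sum(ray.get(futures))  # wait for all tasks, then combine
    finally:
        ray.shutdown()
```

For example, `run_with_ray(np.array_split(np.arange(1_000_000, dtype=np.float64), 4))` splits the array into four chunks and runs one task per chunk. Ray only schedules the function; the native code inside each task runs as it would in a normal Python process.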

ggservice007 commented 3 years ago

Nice, more and more libraries support SIMD. NumPy also supports SIMD since 1.20.0.


https://github.com/numpy/numpy/releases/tag/v1.20.0rc2
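As a rough illustration of how that support reaches users, the sketch below computes the same result with a Python-level loop and with a whole-array NumPy expression. The function names are hypothetical. The assumption is that a NumPy build with SIMD kernels enabled can vectorize the array expression internally, so no separate SIMD utility is needed on the Python side; whether it actually does depends on the build and the CPU.

```python
import numpy as np


def scaled_sum_loop(a, b, scale):
    # Element-by-element Python loop: each iteration goes through the interpreter.
    out = [0.0] * len(a)
    for i in range(len(a)):
        out[i] = a[i] * scale + b[i]
    return out


def scaled_sum_vectorized(a, b, scale):
    # Whole-array expression: the loops run inside NumPy's compiled kernels,
    # which may use SIMD instructions when the build and CPU support them.
    return a * scale + b


a = np.arange(8, dtype=np.float32)
b = np.ones(8, dtype=np.float32)

vectorized = scaled_sum_vectorized(a, b, 2.0)
looped = scaled_sum_loop(a, b, 2.0)

# Both approaches produce the same values.
assert np.allclose(vectorized, looped)
```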

stale[bot] commented 3 years ago

Hi, I'm a bot from the Ray team :)

To help human contributors to focus on more relevant issues, I will automatically add the stale label to issues that have had no activity for more than 4 months.

If there is no further activity in the next 14 days, the issue will be closed!

You can always ask for help on our discussion forum or Ray's public slack channel.

stale[bot] commented 3 years ago

Hi again! The issue will be closed because there has been no more activity in the 14 days since the last message.

Please feel free to reopen or open a new issue if you'd still like it to be addressed.

Again, you can always ask for help on our discussion forum or Ray's public slack channel.

Thanks again for opening the issue!