ray-project / ray_beam_runner

Ray-based Apache Beam runner
Apache License 2.0

OOM while running with a large number of HDFS files #62

Open wilsonwang371 opened 1 year ago

wilsonwang371 commented 1 year ago

In our environment, we tried to run a pipeline over a large number of HDFS dataset files and hit an out-of-memory (OOM) error.

According to https://beam.apache.org/documentation/runners/direct/, the direct runner appears to load the entire dataset into memory, which is probably why we are failing.
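
To illustrate the suspected failure mode, here is a minimal sketch in plain Python. It is not code from ray_beam_runner or Beam, and the helper names (`write_sample_files`, `count_lines_eager`, `iter_lines_lazy`, `count_lines_lazy`) are hypothetical. It uses local temporary files as a stand-in for HDFS and only contrasts holding every record in memory with streaming one record at a time.

```python
import os
import tempfile
from typing import Iterator, List


def write_sample_files(directory: str, num_files: int, lines_per_file: int) -> List[str]:
    """Create small text files standing in for HDFS dataset files (assumption)."""
    paths = []
    for i in range(num_files):
        path = os.path.join(directory, f"part-{i:05d}.txt")
        with open(path, "w") as f:
            for j in range(lines_per_file):
                f.write(f"file{i}-line{j}\n")
        paths.append(path)
    return paths


def count_lines_eager(paths: List[str]) -> int:
    """Eager pattern: every line of every file is held in one list.

    Peak memory grows with the total dataset size, which is the kind of
    behavior that can end in an OOM when there are many files.
    """
    all_lines: List[str] = []
    for path in paths:
        with open(path) as f:
            all_lines.extend(f.readlines())
    return len(all_lines)


def iter_lines_lazy(paths: List[str]) -> Iterator[str]:
    """Lazy pattern: yield one line at a time, one open file at a time."""
    for path in paths:
        with open(path) as f:
            for line in f:
                yield line


def count_lines_lazy(paths: List[str]) -> int:
    """Consume the lazy iterator without keeping the lines around."""
    return sum(1 for _ in iter_lines_lazy(paths))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        sample_paths = write_sample_files(tmp, num_files=20, lines_per_file=100)
        print("eager:", count_lines_eager(sample_paths))
        print("lazy: ", count_lines_lazy(sample_paths))
```

Both functions return the same count. The difference is that the eager version keeps the whole dataset resident at once, while the lazy version holds one line at a time. Whether the runner actually behaves like the eager version is an assumption on our side and still needs to be confirmed.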

Fixing this is a high priority for us, because we need the Ray Beam runner to work in our production environment.