Open wilsonwang371 opened 1 year ago
In our environment, we tried to run with a large number of HDFS dataset files and hit an out-of-memory (OOM) error.
According to https://beam.apache.org/documentation/runners/direct/, it seems the direct runner loads the entire dataset into memory, which is probably why we are failing.
Getting the Ray Beam runner to work in our production environment is a high priority for us.