oap-project / sql-ds-cache

Spark* plug-in for accelerating Spark* SQL performance by using cache and index at SQL data source layer.
Apache License 2.0

[SQL-DS-CACHE-184][POAE7-1224] numa binding in Flink task managers #187

Closed iyupeng closed 3 years ago

iyupeng commented 3 years ago

What changes were proposed in this pull request?

How was this patch tested?

integration tests

github-actions[bot] commented 3 years ago

https://github.com/oap-project/sql-ds-cache/issues/184

iyupeng commented 3 years ago

@winningsix @jikunshang @yma11 Please review, thanks. This pull request adds a NUMA binding feature for Flink task managers.

Feature details:

  1. Users can enable this feature through configuration.

  2. Different paths can be assigned to different NUMA nodes.

  3. The order in which paths are used when writing shuffle data is also configurable: either fill the paths one at a time until each is fully utilized, or treat them equally, like RAID 0.

  4. External sort data, however, is always written across all paths, like RAID 0, because this part is difficult to customize without changing the Flink framework.

  5. Available disk space is checked when writing shuffle data.

  6. No changes to Flink or Flink applications are required. Just put the ape-flink jar file on Flink's classpath, e.g. '$FLINK_DIR/lib'.
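
To make points 2, 3 and 5 concrete, here is a minimal Python sketch of the path selection idea. It is not the code in this patch. The class `ShufflePathSelector`, the helper `selector_for_numa_node`, the strategy names `fill_first` and `round_robin`, and the example directories are all hypothetical names chosen for illustration. The only real library call is `shutil.disk_usage`.

```python
import shutil
from typing import Callable, Dict, List


def disk_free_bytes(path: str) -> int:
    """Return the free bytes on the filesystem that holds `path`."""
    return shutil.disk_usage(path).free


class ShufflePathSelector:
    """Hypothetical selector: picks a directory for the next shuffle file."""

    STRATEGIES = ("fill_first", "round_robin")

    def __init__(self, paths: List[str], strategy: str = "fill_first",
                 free_bytes: Callable[[str], int] = disk_free_bytes) -> None:
        if not paths:
            raise ValueError("at least one path is required")
        if strategy not in self.STRATEGIES:
            raise ValueError(f"unknown strategy: {strategy}")
        self.paths = list(paths)
        self.strategy = strategy
        self.free_bytes = free_bytes  # injectable so the check can be faked
        self._cursor = 0              # next start index for round_robin

    def select(self, required_bytes: int) -> str:
        """Return a path with enough free space, or raise OSError."""
        n = len(self.paths)
        # fill_first always scans from the first path, so earlier paths are
        # used until they run out of space. round_robin resumes after the
        # last path handed out, spreading writes evenly (RAID 0 style).
        start = self._cursor if self.strategy == "round_robin" else 0
        for offset in range(n):
            index = (start + offset) % n
            path = self.paths[index]
            if self.free_bytes(path) >= required_bytes:
                if self.strategy == "round_robin":
                    self._cursor = (index + 1) % n
                return path
        raise OSError(f"no configured path has {required_bytes} bytes free")


def selector_for_numa_node(node_paths: Dict[int, List[str]], node: int,
                           strategy: str = "fill_first",
                           free_bytes: Callable[[str], int] = disk_free_bytes
                           ) -> ShufflePathSelector:
    """Build a selector restricted to the paths assigned to one NUMA node."""
    return ShufflePathSelector(node_paths[node], strategy, free_bytes)
```

A task manager bound to NUMA node 0 could then call `selector_for_numa_node({0: ["/mnt/n0a", "/mnt/n0b"], 1: ["/mnt/n1a"]}, 0, "round_robin").select(size)` before each shuffle write. In both strategies a path without enough free space is skipped, and an error is raised only when no path qualifies.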