[Onyx-773] Revise data skew policy

snuspl / nemo

Nemo: A flexible data processing system

https://snuspl.github.io/nemo/

Apache License 2.0


[Onyx-773] Revise data skew policy #774

Closed jeongyooneo closed 6 years ago

jeongyooneo commented 6 years ago

This PR:

Sets the hash range to the nearest prime number greater than dstParallelism * HashRangeMultiplier.
Uses an error range for idealSizePerTaskGroup, whose default value is set to 0.
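
For readers unfamiliar with the first item, here is a minimal sketch of picking the smallest prime above dstParallelism * HashRangeMultiplier. The class and method names (HashRangeSketch, isPrime, hashRange) are hypothetical and are not taken from the Nemo code base. The trial-division primality check is an assumption chosen for brevity, not necessarily what the PR uses.

```java
// Hypothetical sketch; not Nemo's actual implementation.
final class HashRangeSketch {
    private HashRangeSketch() { }

    /** Trial-division primality test; adequate for the small values a hash range takes. */
    static boolean isPrime(final int n) {
        if (n < 2) {
            return false;
        }
        if (n % 2 == 0) {
            return n == 2;
        }
        for (long i = 3; i * i <= n; i += 2) {
            if (n % i == 0) {
                return false;
            }
        }
        return true;
    }

    /** Returns the smallest prime strictly greater than dstParallelism * hashRangeMultiplier. */
    static int hashRange(final int dstParallelism, final int hashRangeMultiplier) {
        // multiplyExact/addExact throw ArithmeticException on int overflow instead of wrapping.
        int candidate = Math.addExact(Math.multiplyExact(dstParallelism, hashRangeMultiplier), 1);
        while (!isPrime(candidate)) {
            candidate = Math.addExact(candidate, 1);
        }
        return candidate;
    }
}
```

With dstParallelism = 4 and a multiplier of 10, the product is 40 and the sketch returns 41. A prime range is a common choice because keys whose hash codes share a factor with the range size would otherwise cluster in a few buckets.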

jeongyooneo commented 6 years ago

@sanha Thanks! I've addressed the comment.

jeongyooneo commented 6 years ago

Thanks @wonook! As we discussed offline, I've reverted to the good old style of calculating hash ranges. Distributing per-TaskGroup data evenly should be done elsewhere (for example, in the Partitioner), since calculatingHashRange simply divides the already-partitioned hash ranges on a best-effort basis.
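
To illustrate the point about best-effort division, here is a rough sketch that groups already-partitioned hash values into contiguous ranges, one per TaskGroup. It is not the code in this PR: the class BestEffortRangeSketch, the method divide, and the per-hash-value size array are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; not Nemo's actual implementation.
final class BestEffortRangeSketch {
    private BestEffortRangeSketch() { }

    /**
     * Splits hash values [0, sizes.length) into at most dstParallelism contiguous ranges.
     * Each returned element is {startInclusive, endExclusive}.
     */
    static List<int[]> divide(final long[] sizes, final int dstParallelism) {
        if (dstParallelism <= 0) {
            throw new IllegalArgumentException("dstParallelism must be positive");
        }
        long total = 0;
        for (final long size : sizes) {
            total += size;
        }
        final long idealSizePerTaskGroup = total / dstParallelism;

        final List<int[]> ranges = new ArrayList<>();
        int start = 0;
        long accumulated = 0;
        for (int i = 0; i < sizes.length; i++) {
            accumulated += sizes[i];
            // The last range absorbs whatever remains, so never close it early.
            final boolean isLastRange = ranges.size() == dstParallelism - 1;
            if (!isLastRange && accumulated >= idealSizePerTaskGroup) {
                ranges.add(new int[] {start, i + 1});
                start = i + 1;
                accumulated = 0;
            }
        }
        if (start < sizes.length) {
            ranges.add(new int[] {start, sizes.length});
        }
        return ranges;
    }
}
```

For sizes {100, 1, 1, 1, 1} and two TaskGroups, the sketch yields the ranges [0, 1) and [1, 5), carrying 100 and 4 units respectively. A single hash value cannot be split at this stage, so the imbalance can only be fixed by changing how keys are partitioned in the first place.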

wonook commented 6 years ago

Looks good. I'll merge once the tests pass.