haojinIntel opened this issue 3 years ago
@zhouyuan @zhixingheyi-tian Please help track this issue. Thanks!
cc @weiting-chen
On another cluster with 1 master and 3 workers, where each worker has 384GB DRAM, q23a.sql fails in the first round. This is the tpcds-kit version I used: https://github.com/davies/tpcds-kit.git
@haojinIntel Thanks for reporting. The log shows that some containers were killed during the tests, probably due to a memory issue.
This is caused by the large memory footprint and the lack of spill support. We have two pending PRs that should fix this: https://github.com/oap-project/gazelle_plugin/pull/387 and https://github.com/oap-project/gazelle_plugin/pull/369
We ran the TPC-DS power test for 20 rounds to verify the stability of gazelle_plugin. The cluster contains 3 workers, each with 512GB DRAM, and the data scale is 1.5TB. Q23a.sql fails within a few rounds, which causes the thrift-server to fail. The error message is shown below:
Our test passes when the latest commit is "e4782f0aaa6cee899bed5a3cb1d9ba2b2c219461".