rapidsai / gpu-bdb

Apache License 2.0

BlazingSQL q03 SF1K sometimes runs forever on a DGX-2 #208

Open beckernick opened 3 years ago

beckernick commented 3 years ago

BSQL q03 sometimes runs forever at SF1K. When this happens, GPU activity drops to 0%, but the status-polling thread (getQueryIsComplete) keeps running.

wmalpica commented 3 years ago

@beckernick Could you please report what version of BSQL you are using?

```
>>> import blazingsql
>>> blazingsql.__info__()
```

beckernick commented 3 years ago

```
BlazingSQL version (git hash): c0a952fc9e5cf3cb7e33a98b7d4d26a818be5875
BlazingSQL branch name: HEAD
BlazingSQL branch tag: v0.20.0a
BlazingSQL build id: 0
BlazingSQL compiler version: GNU /usr/bin/c++ 7.5.0
BlazingSQL cuda flags: -Xcompiler -Wno-parentheses -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_75,code=sm_75 -gencode=arch=compute_75,code=compute_75 --expt-extended-lambda --expt-relaxed-constexpr -Werror=cross-execution-space-call -Xcompiler -Wall,-Wno-error=deprecated-declarations --default-stream=per-thread -DHT_DEFAULT_ALLOCATOR
BlazingSQL Operating system kernel: Linux-5.4.0-1038-aws
BlazingSQL Operating system architecture: x86_64
BlazingSQL Linux Operating system release: NAME=Ubuntu|VERSION=16.04.7 LTS (Xenial Xerus)|ID=ubuntu|ID_LIKE=debian|PRETTY_NAME=Ubuntu 16.04.7 LTS|VERSION_ID=16.04|HOME_URL=http://www.ubuntu.com/|SUPPORT_URL=http://help.ubuntu.com/|BUG_REPORT_URL=http://bugs.launchpad.net/ubuntu/|VERSION_CODENAME=xenial|UBUNTU_CODENAME=xenial
```
wmalpica commented 3 years ago

Additionally, to help figure this out, it would be great to get all the logs, which requires enabling them. For this, set the following config_options: ENABLE_COMMS_LOGS=True, ENABLE_TASK_LOGS=True, ENABLE_OTHER_ENGINE_LOGS=True.
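
For anyone reproducing this, here is a minimal sketch of how those options could be passed. Enabling a log category means setting its flag to True. The three option names are the ones listed above; the helper names `build_logging_config` and `make_context` are hypothetical, and passing the dict through the `config_options` argument of `BlazingContext` is an assumption about the installed version.

```python
# Sketch only: option names are taken from the comment above; the helper
# functions are hypothetical and not part of BlazingSQL or gpu-bdb.

def build_logging_config(extra=None):
    """Return a config_options dict with the three log categories enabled."""
    config = {
        "ENABLE_COMMS_LOGS": True,
        "ENABLE_TASK_LOGS": True,
        "ENABLE_OTHER_ENGINE_LOGS": True,
    }
    if extra:
        # Let the caller add or override options without editing this helper.
        config.update(extra)
    return config


def make_context(dask_client=None):
    """Create a BlazingContext with logging enabled (needs BlazingSQL and a GPU)."""
    # Imported lazily so the config dict can be built on machines without BlazingSQL.
    from blazingsql import BlazingContext

    # Assumption: config_options is accepted as a keyword argument here.
    return BlazingContext(
        dask_client=dask_client,
        config_options=build_logging_config(),
    )


config_options = build_logging_config()
```

Calling `make_context(client)` with the Dask client used for the benchmark run would then create a context that writes the extra logs; `build_logging_config()` on its own only builds the dictionary.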

Another question: is this with UCX or TCP communications?

beckernick commented 3 years ago

TCP. We'll send some logs over.