heavyai / heavydb

HeavyDB (formerly OmniSciDB)
https://heavy.ai
Apache License 2.0

SQL is not running on GPU but on CPU #510

Open zrlhk opened 4 years ago

zrlhk commented 4 years ago

Hi, I installed this GPU database today. I want to use the GPU to speed up a SQL statement like this: "insert into txtrs10 select 5000021,pid,count(*) as rid from relationP where tid in(894,1701,1132,2334,16555,491) GROUP by pid order by rid desc limit 50;"

But it seems to be running on the CPU. While the query runs, the CPU is busy but the GPU is not being used:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 430.09       Driver Version: 430.09       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce RTX 2070    Off  | 00000000:03:00.0 Off |                  N/A |
| 39%   23C    P8    10W / 175W |   4303MiB /  7981MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce RTX 2070    Off  | 00000000:04:00.0 Off |                  N/A |
| 39%   21C    P8     9W / 175W |   2255MiB /  7982MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0     24276      C   /opt/omnisci/bin/omnisci_server             4293MiB |
|    1     24276      C   /opt/omnisci/bin/omnisci_server             2245MiB |
+-----------------------------------------------------------------------------+

So how can I speed this up with my GPU?

The INFO log:

2020-02-08T16:56:00.721572 I 24276 MapDHandler.cpp:5058 passing query to legacy processor
2020-02-08T16:56:00.723493 I 24276 Calcite.cpp:454 User calcite catalog omnisci sql 'select 5000021,pid,count(*) as rid from relationP where tid in(894,1701,1132,2334,16555,491) GROUP by pid order by rid desc limit 50'
2020-02-08T16:56:00.727682 I 24276 Calcite.cpp:481 Time in Thrift 1 (ms), Time in Java Calcite server 3 (ms)
2020-02-08T16:56:00.728869 I 24276 Calcite.cpp:454 User calcite catalog omnisci sql 'select 5000021,pid,count(*) as rid from relationP where tid in(894,1701,1132,2334,16555,491) GROUP by pid order by rid desc limit 50'
2020-02-08T16:56:00.732007 I 24276 Calcite.cpp:481 Time in Thrift 1 (ms), Time in Java Calcite server 2 (ms)
2020-02-08T16:56:01.397421 I 24276 Calcite.cpp:454 User calcite catalog omnisci sql 'select 5000021,pid,count(*) as rid from relationP where tid in(894,1701,1132,2334,16555,491) GROUP by pid order by rid desc limit 50'
2020-02-08T16:56:01.403435 I 24276 Calcite.cpp:481 Time in Thrift 0 (ms), Time in Java Calcite server 5 (ms)
2020-02-08T16:56:01.403949 I 24276 Calcite.cpp:454 User calcite catalog omnisci sql 'select 5000021,pid,count(*) as rid from relationP where tid in(894,1701,1132,2334,16555,491) GROUP by pid order by rid desc limit 50'
2020-02-08T16:56:01.408754 I 24276 Calcite.cpp:481 Time in Thrift 0 (ms), Time in Java Calcite server 4 (ms)
2020-02-08T16:56:02.037087 I 24276 ParserNode.cpp:2762 stdlog execute 1269 1314 omnisci admin 551-sLWV {"query_str"} {"select 5000021,pid,count(*) as rid from relationP where tid in(894,1701,1132,2334,16555,491) GROUP by pid order by rid desc limit 50"}
2020-02-08T16:56:02.037387 I 24276 MapDHandler.cpp:876 stdlog sql_execute 1268 1315 omnisci admin 551-sLWV {"query_str","execution_time_ms","total_time_ms"} {"insert into txtrs10 select 5000021,pid,count(*) as rid from relationP where tid in(894,1701,1132,2334,16555,491) GROUP by pid order by rid desc limit 50;","1314","1315"}

cdessanti commented 4 years ago

Hi @zrlhk,

The statement you are running has parts that would run on GPU and others that would run on CPU. Looking at the nvidia-smi output, it looks like something ran on GPU, but I'm not sure whether it was the IAS (INSERT ... SELECT) query or something you ran before.

To be sure your query is running on GPU, set the enable-debug-timer parameter to true and look in the logs for something like this:

2020-02-17T19:44:57.378419 I 88892 measure.h:79 Timer end                          lauchGpuCode                       launchGpuCode:  195 elapsed 708 ms
2020-02-17T19:44:57.378522 I 88892 measure.h:79 Timer end                executePlanWithGroupBy              executePlanWithGroupBy: 2416 elapsed 708 ms
2020-02-17T19:44:57.378539 I 88892 measure.h:79 Timer end                execution_dispatch_run                          operator(): 1222 elapsed 708 ms
2020-02-17T19:44:57.632530 I 88892 measure.h:79 Timer end                          lauchGpuCode                       launchGpuCode:  195 elapsed 962 ms
2020-02-17T19:44:57.632639 I 88892 measure.h:79 Timer end                executePlanWithGroupBy              executePlanWithGroupBy: 2416 elapsed 962 ms
2020-02-17T19:44:57.632658 I 88892 measure.h:79 Timer end                execution_dispatch_run                          operator(): 1222 elapsed 962 ms
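If it helps, here is a minimal Python sketch for checking a log for these entries automatically. It assumes the "Timer end" line layout shown above (label, function name, source line, elapsed ms); the names `parse_timers` and `gpu_kernel_times` are my own, not part of the server, and the layout may differ between versions.

```python
import re

# Matches: "Timer end <label> <function>: <source line> elapsed <N> ms"
TIMER_RE = re.compile(
    r"Timer end\s+(\S+)\s+(\S+):\s*(\d+)\s+elapsed\s+(\d+)\s+ms"
)

def parse_timers(lines):
    """Return (label, function, elapsed_ms) for every 'Timer end' log line."""
    timers = []
    for line in lines:
        m = TIMER_RE.search(line)
        if m:
            label, func, _src_line, ms = m.groups()
            timers.append((label, func, int(ms)))
    return timers

def gpu_kernel_times(timers):
    """Elapsed ms of each launchGpuCode timer (one per GPU in the sample)."""
    return [ms for _label, func, ms in timers if func == "launchGpuCode"]

# Sample lines copied from the log excerpt above (whitespace collapsed).
SAMPLE = [
    "2020-02-17T19:44:57.378419 I 88892 measure.h:79 Timer end lauchGpuCode launchGpuCode:  195 elapsed 708 ms",
    "2020-02-17T19:44:57.378539 I 88892 measure.h:79 Timer end execution_dispatch_run operator(): 1222 elapsed 708 ms",
    "2020-02-17T19:44:57.632530 I 88892 measure.h:79 Timer end lauchGpuCode launchGpuCode:  195 elapsed 962 ms",
    "2020-02-17T19:44:57.632639 I 88892 measure.h:79 Timer end executePlanWithGroupBy executePlanWithGroupBy: 2416 elapsed 962 ms",
]

gpu_times = gpu_kernel_times(parse_timers(SAMPLE))
if gpu_times:
    print(f"GPU kernels launched: {len(gpu_times)}, elapsed ms: {gpu_times}")
else:
    print("No launchGpuCode timers found; the query likely ran on CPU.")
```

If no launchGpuCode timer shows up for your statement, that would suggest the query step did not reach the GPU.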

As you can see in this example, I have a query that ran on 2 GPUs for 708 ms and 962 ms (the table has an unbalanced number of fragments) and then completed the insert/projection steps on CPU, for a total of 2183 ms (so 962 ms on GPU and roughly 1200 ms on CPU).

2020-02-17T19:44:58.060899 I 88892 measure.h:79 Timer end                  Exec_executeWorkUnit                 executeWorkUnitImpl: 1167 elapsed 1392 ms
2020-02-17T19:44:58.060988 I 88892 measure.h:79 Timer end                       executeWorkUnit                     executeWorkUnit: 1937 elapsed 1467 ms
2020-02-17T19:44:58.777582 I 88892 measure.h:79 Timer end                     executeRelAlgStep                   executeRelAlgStep:  310 elapsed 2183 ms
2020-02-17T19:44:58.777629 I 88892 measure.h:79 Timer end                      executeRelAlgSeq                    executeRelAlgSeq:  252 elapsed 2183 ms
2020-02-17T19:44:58.777676 I 88892 measure.h:79 Timer end             executeRelAlgQueryNoRetry           executeRelAlgQueryNoRetry:   95 elapsed 2184 ms
2020-02-17T19:44:58.777683 I 88892 measure.h:79 Timer end                    executeRelAlgQuery                  executeRelAlgQuery:   70 elapsed 2184 ms
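The GPU/CPU split quoted above can be reproduced with a little arithmetic. This is only a sketch: it assumes the two GPU launches overlap, so the GPU wall time is the slower of the two, and that everything else inside executeRelAlgStep is CPU work. The helper name `split_gpu_cpu` is hypothetical.

```python
def split_gpu_cpu(gpu_ms_per_device, total_ms):
    """Estimate (gpu_ms, cpu_ms) from per-GPU kernel times and the step total.

    Assumption: the devices run concurrently, so the slowest one
    determines the GPU wall time; the remainder is attributed to CPU.
    """
    gpu_ms = max(gpu_ms_per_device)
    return gpu_ms, total_ms - gpu_ms

# 708 ms and 962 ms come from the launchGpuCode timers,
# 2183 ms from the executeRelAlgStep timer.
gpu_ms, cpu_ms = split_gpu_cpu([708, 962], 2183)
print(f"GPU ~{gpu_ms} ms, CPU ~{cpu_ms} ms")  # GPU ~962 ms, CPU ~1221 ms
```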