cudppPlan execution time

mohdshamilshafi / cudpp

Automatically exported from code.google.com/p/cudpp


What steps will reproduce the problem?
1. Call cudppPlan in any function.

What is the expected output? What do you see instead?

The output is correct (all tests in cudpp_testrig passed), but the first call 
of the function cudppPlan is slow (about a second).

What version of the product are you using? On what operating system?

Motherboard Rampage III GENE
Intel(R) Core(TM) i7 CPU X 980 @ 3.33GHz
DIMM 1333 MHz 6GiB
GeForce GTX680

Ubuntu 12.04
Cudpp 2.0
Cuda 5.0
Nvidia Driver version 310.19

Please provide any additional information below.

I mentioned an "infinite loop" issue in cudppPlan here:
https://groups.google.com/forum/?fromgroups=#!topic/cudpp/J4AEijEFzW4

The two problems seem to be related somehow: if I call exit(2) just before 
the cudppPlan call (in simpleCUDPP.cu for example), the execution time is 
normal. If I call it on the first line of cudppPlan, it takes about 0.8 second 
for the program to exit on my GTX680 card.

I also have a GT240 card. The same problem happens but it takes about 0.5 
second.

I just recompiled the cudpp library and the execution time of 
simpleCUDPP was about 15 seconds. On a second run it is back to about one 
second.

Original issue reported on code.google.com by nicolas....@gmail.com on 10 Dec 2012 at 6:46

I just ran this on my Ubuntu 11.10 system. It reminded me of what you probably thought was your infinite loop: on Linux, there is a significant startup time for the driver kernel module, which gets loaded the first time you run some GPU code and unloaded automatically after a certain delay (not sure how long). There are a couple of ways to eliminate most of this startup delay:

1. Open a new terminal (or tab) and run "nvidia-smi -l 15" to re-run nvidia-smi every 15 seconds, which forces the driver kernel module to stay loaded.
2. Enable persistence mode with nvidia-smi. You need to run as root, so "sudo nvidia-smi -pm 1". This will keep it loaded.

When I do this, simpleCUDPP runs immediately rather than waiting several seconds (on my K20) to over a minute (on my GTX 680; that sounds excessive, possibly a driver bug?). But simpleCUDPP still shows over 700 ms of overhead (the kernels and memcopies take < 100 us). So I will investigate a bit more.
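One way to narrow down where the remaining overhead comes from is to time a warm-up call separately from the first and second cudppPlan calls. The sketch below shows only the measurement pattern: ensure_initialized and make_plan are hypothetical stand-ins (a 50 ms sleep simulates the one-time startup cost), not CUDPP or CUDA code. In a real test I would assume the warm-up is a cheap CUDA runtime call such as cudaFree(0), and make_plan is the actual cudppPlan call.

```cpp
#include <cassert>
#include <chrono>
#include <thread>
#include <utility>

// Time one invocation of fn and return the elapsed wall-clock time in ms.
template <typename F>
double time_ms(F&& fn) {
    auto start = std::chrono::steady_clock::now();
    std::forward<F>(fn)();
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

// Hypothetical stand-in for lazy one-time initialization.
// Assumption: in a real test this would be a CUDA runtime call such as
// cudaFree(0), which triggers driver and context startup.
static bool g_initialized = false;

void ensure_initialized() {
    if (!g_initialized) {
        // Simulated startup cost; the real figure is what we want to measure.
        std::this_thread::sleep_for(std::chrono::milliseconds(50));
        g_initialized = true;
    }
}

// Hypothetical stand-in for cudppPlan: it initializes on demand, then does
// only cheap work.
void make_plan() {
    ensure_initialized();
    assert(g_initialized);
}

struct Timings {
    double warmup_ms;       // one-time startup cost
    double first_plan_ms;   // first plan call after the warm-up
    double second_plan_ms;  // steady-state plan call
};

// Measure the warm-up separately so it is not charged to the plan call.
Timings measure_startup() {
    Timings t;
    t.warmup_ms = time_ms(ensure_initialized);
    t.first_plan_ms = time_ms(make_plan);
    t.second_plan_ms = time_ms(make_plan);
    return t;
}
```

Calling measure_startup() from main and printing the three fields shows how the time splits. If the first plan call is still slow after the warm-up, the overhead is inside the plan call itself rather than in driver or context startup.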


cudppPlan execution time #125