Hi Dave,
The error you describe is very generic; I would need more information to be
able to understand what is going on. I can tell you that this same code has
worked well on many different systems.
This "unspecified launch failure" simply means that the GPU kernel failed.
Excluding coding errors, the possible causes might be an error in the memory
allocation, launching too many CUDA threads, a problem in the installation of
CUDA, &c.
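As a generic illustration (a minimal sketch, not the actual MC-GPU source), the message comes from the error code that the CUDA runtime returns after a kernel launch; the "(4)" that usually accompanies it is most likely just the numeric value of that code:

// Minimal sketch (not MC-GPU code): detecting an "unspecified launch failure"
// right after a kernel launch.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void dummy_kernel(float *data)
{
  int i = blockIdx.x*blockDim.x + threadIdx.x;
  data[i] = 2.0f*data[i];          // an out-of-bounds access here would trigger the error
}

int main()
{
  float *d_data = NULL;
  cudaMalloc((void **)&d_data, 1024*sizeof(float));

  dummy_kernel<<<4, 256>>>(d_data);             // 4 blocks x 256 threads = 1024 elements
  cudaError_t err = cudaDeviceSynchronize();    // a failed launch is reported here
  if (err != cudaSuccess)
    printf("CUDA error: (%d) %s\n", (int)err, cudaGetErrorString(err));

  cudaFree(d_data);
  return 0;
}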
I would recommend first trying to simulate a smaller number of histories
(reduce the number of threads), and then reducing the amount of shared and
global memory used (reduce the values of MAX_NUM_PROJECTIONS and MAX_MATERIALS
in MC-GPU_v1.3.h).
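The two parameters are plain #define constants in MC-GPU_v1.3.h; the reduced values below are only an illustration (not recommended settings), just to show the kind of edit I mean:

// In MC-GPU_v1.3.h -- example reduced values, adjust them to your own simulation:
#define MAX_MATERIALS        5     // fewer material tables   -> less shared memory per block
#define MAX_NUM_PROJECTIONS  30    // fewer projections kept  -> less memory reserved at compile time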
Make sure that the allocated shared and constant memory (reported during
compilation) is not larger than the resources in your GPU.
Did the program work with these modifications?
Original comment by andre...@gmail.com
on 29 Apr 2013 at 7:55
I did get it to work by modifying the number of threads and particles
simulated. I think I was limited by the memory available because it is a laptop
GPU. Thank you very much for the advice!
Original comment by DAPDunke...@gmail.com
on 30 Apr 2013 at 3:24
Good to know it works!
You can simulate as many particles as you want, but increase the number of
particles simulated by each thread so that the total number of CUDA blocks and
threads can be handled by your GPU.
The code automatically increases the number of particles per thread to keep the
number of blocks below 65000, but even this may be too much for a laptop GPU.
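A simplified sketch of that idea (not the actual MC-GPU source): keep increasing the number of histories simulated per thread until the resulting grid fits below the block limit.

#include <cstdio>

int main()
{
  // Simplified sketch (not the actual MC-GPU source): double the number of
  // histories per thread until the grid size drops below ~65000 blocks.
  unsigned long long total_histories      = 2000000000ULL;   // example value
  unsigned long long threads_per_block    = 128ULL;          // example value
  unsigned long long histories_per_thread = 1ULL;
  unsigned long long num_blocks;

  do
  {
    unsigned long long histories_per_block = histories_per_thread * threads_per_block;
    num_blocks = (total_histories + histories_per_block - 1ULL) / histories_per_block;
    if (num_blocks > 65000ULL)
      histories_per_thread *= 2ULL;    // more work per thread -> fewer blocks
  }
  while (num_blocks > 65000ULL);

  printf("blocks = %llu, histories per thread = %llu\n",
         num_blocks, histories_per_thread);
  return 0;
}

With the example numbers above this ends at 256 histories per thread and roughly 61000 blocks.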
Original comment by andre...@gmail.com
on 30 Apr 2013 at 8:55
Hi,
I am finding myself in the situation that Dave described here in Issue 15 a
while ago. When running the sample simulation with 6 voxels I get the same
error, ‘(4) unspecified launch failure’. I did get it to work when decreasing
the number of particles simulated to 10^7 and the number of threads to 96, but
I would like to be able to simulate a larger number of particles.
CUDA 5.0 is installed and seems to be working when running the CUDA samples.
The computer has 64 GB RAM and an NVIDIA PNY Quadro 4000 with 2 GB GDDR5 (256
parallel processing cores).
I would be grateful if you could help me understand whether the error is due to
a memory limitation or whether something else may be the problem.
Thanks!
Original comment by hannie.p...@gmail.com
on 14 Oct 2013 at 1:06
This is a generic error; I cannot know what is failing.
Probably you are requesting more resources than the GPU has.
Reducing the number of threads per block usually solves this (reducing the
number of histories is not necessary).
You may not have enough shared or constant memory (you can reduce this by
editing the .h files). Look at how much memory the program uses in the output
of the compilation script.
To understand the resources you are using, open the "CUDA Occupancy Calculator"
Excel spreadsheet provided with CUDA and enter the amount of memory and the
number of threads you use. Then compare with the resources available in your GPU.
Another important thing is that you MUST compile the executable for the
appropriate Compute Capability available in your GPU. Edit the makefile script
to change the compute capability (more info in the CUDA user guide; run the
./deviceQuery sample code to get all the information about your GPU).
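If you only want the handful of numbers that matter here, a few standard CUDA runtime calls report them directly (nothing specific to MC-GPU, just a generic sketch):

#include <cstdio>
#include <cuda_runtime.h>

int main()
{
  // Print the GPU limits relevant to this discussion.
  cudaDeviceProp prop;
  cudaGetDeviceProperties(&prop, 0);   // device 0

  printf("Device                : %s\n",  prop.name);
  printf("Compute capability    : %d.%d\n", prop.major, prop.minor);
  printf("Shared memory / block : %lu bytes\n", (unsigned long)prop.sharedMemPerBlock);
  printf("Constant memory       : %lu bytes\n", (unsigned long)prop.totalConstMem);
  printf("Registers / block     : %d\n",  prop.regsPerBlock);
  printf("Max threads / block   : %d\n",  prop.maxThreadsPerBlock);
  printf("Max grid size (x)     : %d\n",  prop.maxGridSize[0]);
  return 0;
}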
Finally, make sure you don't use graphics on the GPU, by switching to
command-line mode or (better) plugging the monitor into another GPU.
Good luck!
Original comment by andre...@gmail.com
on 14 Oct 2013 at 5:22
First, you write that I must edit the makefile script for the appropriate
compute capability; how do I do that?
And I am afraid I would need more help to understand the Occupancy Calculator
and use it accurately. I believe my compute capability is 2.0 and my shared
memory size is 49152 bytes, but I am not sure how to interpret my resource
usage from the compilation script output
(https://www.dropbox.com/s/9wjioonne2dygfc/output.pdf). It says that two entry
functions are compiled for 'sm_13', 'sm_20' and 'sm_30'. Should the registers
and shared memory used in the different steps be added together, and should I
also account for the 'external shared memory array' mentioned (as 2048 bytes)
in the Help sheet of the Occupancy Calculator?
If I enter a number of Registers Per Thread larger than 63 (which is the last
value mentioned under 'Function properties' in the compilation output), the
occupancy is calculated as 0. Does this mean I have to reduce the number of
registers used in some way?
When reducing MAX_MATERIALS in MC-GPU_v1.3.h I get lower shared memory usage,
but I see no effect from reducing MAX_NUM_PROJECTIONS.
If it is correct to use the last compilation output, 63 registers and 3824
bytes of shared memory (MAX_MATERIALS = 10), and Threads Per Block is set to
(for example) 32, then I get an occupancy of 17%. Even if it is 'Limited by Max
Warps or Max Blocks per Multiprocessor', that means it would work if this were
the case, does it not?
I apologize if my questions make no sense; I am not really sure of what I am
asking. Maybe you can think of something that might steer me in the right
direction. Thank you!
Original comment by hannie.p...@gmail.com
on 17 Oct 2013 at 11:32
The compute capability to be compiled for is specified in the Makefile.
Currently it compiles for 1.3, 2.0 and 3.0 (you can change this by editing the
CFLAG parameters ending in _13, _20 or _30). This should work well on your GPU.
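For reference, the compute capability is selected with nvcc flags of this general form (the exact variable names and syntax in the MC-GPU Makefile may differ from this sketch):

# Typical nvcc architecture flags, one per compute capability:
-gencode=arch=compute_13,code=sm_13
-gencode=arch=compute_20,code=sm_20
-gencode=arch=compute_30,code=sm_30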
Shared memory use and occupancy are important for getting maximum performance,
but they are not the issue if the code does not work at all.
Since I know that the code is able to work (I have used it many times on many
different systems and GPUs), the problem must be in the configuration of your
workstation and CUDA installation.
Try to compile and run a sample CUDA application first, and then try to run
MC-GPU.
Use a Linux system without X windows, just the command line.
The CUDA documentation is very helpful. You can find the documentation in the
CUDA installation folder (e.g., /usr/local/cuda/doc/pdf/).
Read especially:
- CUDA_C_Programming_Guide.pdf
- CUDA_Getting_Started_Guide_For_Microsoft_Windows.pdf (or _For_Linux.pdf)
Original comment by andre...@gmail.com
on 17 Oct 2013 at 4:02
Original issue reported on code.google.com by
DAPDunke...@gmail.com
on 22 Apr 2013 at 7:30