gwastro / pycbc

Core package to analyze gravitational-wave data, find signals, and study their parameters. This package was used in the first direct detection of gravitational waves (GW150914), and is used in the ongoing analysis of LIGO/Virgo data.
http://pycbc.org
GNU General Public License v3.0

Kernel name collision for GPU normalization #797

Closed: duncan-brown closed this issue 8 years ago

duncan-brown commented 8 years ago
RUNNING ON  GeForce GTX 750 Ti
         SIZE     20
Traceback (most recent call last):
  File "./match_perf.py", line 64, in <module>
    psd2 = ntilde2.squared_norm()    
  File "<decorator-gen-62>", line 2, in squared_norm
  File "/home/dbrown/src/pycbc-dev/lib/python2.6/site-packages/PyCBC-22fbf5-py2.6.egg/pycbc/types/array.py", line 210, in _returntype
    ary = fn(self,*args)
  File "<decorator-gen-61>", line 2, in squared_norm
  File "/home/dbrown/src/pycbc-dev/lib/python2.6/site-packages/PyCBC-22fbf5-py2.6.egg/pycbc/types/array.py", line 62, in _convert
    return fn(self, *args)
  File "<decorator-gen-60>", line 2, in squared_norm
  File "/home/dbrown/src/pycbc-dev/lib/python2.6/site-packages/PyCBC-22fbf5-py2.6.egg/pycbc/scheme.py", line 184, in scheming_function
    return schemed_fn(*args, **kwds)
  File "/home/dbrown/src/pycbc-dev/lib/python2.6/site-packages/PyCBC-22fbf5-py2.6.egg/pycbc/types/array_cuda.py", line 131, in squared_norm
    krnl(a, out)
  File "/home/dbrown/src/pycbc-dev/lib/python2.6/site-packages/pycuda-2016.1-py2.6-linux-x86_64.egg/pycuda/elementwise.py", line 199, in __call__
    range_ is not None or slice_ is not None)
  File "/home/dbrown/src/pycbc-dev/lib/python2.6/site-packages/pytools/__init__.py", line 469, in wrapper
    result = method(self, *args, **kwargs)
  File "/home/dbrown/src/pycbc-dev/lib/python2.6/site-packages/pycuda-2016.1-py2.6-linux-x86_64.egg/pycuda/elementwise.py", line 177, in generate_stride_kernel_and_types
    **self.gen_kwargs)
  File "/home/dbrown/src/pycbc-dev/lib/python2.6/site-packages/pycuda-2016.1-py2.6-linux-x86_64.egg/pycuda/elementwise.py", line 147, in get_elwise_kernel_and_types
    keep, options, **kwargs)
  File "/home/dbrown/src/pycbc-dev/lib/python2.6/site-packages/pycuda-2016.1-py2.6-linux-x86_64.egg/pycuda/elementwise.py", line 75, in get_elwise_module
    options=options, keep=keep)
  File "/home/dbrown/src/pycbc-dev/lib/python2.6/site-packages/pycuda-2016.1-py2.6-linux-x86_64.egg/pycuda/compiler.py", line 265, in __init__
    arch, code, cache_dir, include_dirs)
  File "/home/dbrown/src/pycbc-dev/lib/python2.6/site-packages/pycuda-2016.1-py2.6-linux-x86_64.egg/pycuda/compiler.py", line 255, in compile
    return compile_plain(source, options, keep, nvcc, cache_dir, target)
  File "/home/dbrown/src/pycbc-dev/lib/python2.6/site-packages/pycuda-2016.1-py2.6-linux-x86_64.egg/pycuda/compiler.py", line 137, in compile_plain
    stderr=stderr.decode("utf-8", "replace"))
pycuda.driver.CompileError: nvcc compilation of /usr1/dbrown/tmphJV9mP/kernel.cu failed
[command: nvcc --cubin -arch sm_50 -I/home/dbrown/src/pycbc-dev/lib/python2.6/site-packages/pycuda-2016.1-py2.6-linux-x86_64.egg/pycuda/cuda kernel.cu]
[stderr:
kernel.cu(7): error: more than one instance of overloaded function "norm" has "C" linkage

1 error detected in the compilation of "/usr1/dbrown/tmpxft_00004166_00000000-7_kernel.cpp1.ii".
]
duncan-brown commented 8 years ago

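The problem was a name collision. The elementwise kernel was named "norm", which is the same name as the "norm" function called in its body ("z[i] = norm(x[i])"). As a result, nvcc found more than one overload of "norm" with C linkage. The fix is to give the kernel a different name: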

diff --git a/pycbc/types/array_cuda.py b/pycbc/types/array_cuda.py
index 079e54c..2ec6358 100644
--- a/pycbc/types/array_cuda.py
+++ b/pycbc/types/array_cuda.py
@@ -121,7 +121,7 @@ def get_norm_kernel(dtype_x, dtype_out):
                 "tp_z": dtype_to_ctype(dtype_out),
                 },
             "z[i] = norm(x[i])",
-            "norm")
+            "normalize")

 def squared_norm(self):
     a = self.data
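This class of collision can be caught before nvcc ever runs by checking whether a kernel's name also appears as a call in its operation string. The sketch below is a hypothetical helper: kernel_name_collides and called_functions are not part of PyCBC or PyCUDA. It uses a simple regular expression, so it only catches names called directly in the operation. A clash with a function that is only declared in an included header would slip through.

```python
import re


def called_functions(operation):
    """Return identifiers that appear in call position, e.g. 'norm' in 'z[i] = norm(x[i])'."""
    return set(re.findall(r"\b([A-Za-z_]\w*)\s*\(", operation))


def kernel_name_collides(name, operation):
    """Hypothetical check: True if the kernel's own name is also called inside its body.

    This is a purely textual test on the operation string; it does not inspect
    headers or anything else nvcc would see.
    """
    return name in called_functions(operation)


operation = "z[i] = norm(x[i])"
print(kernel_name_collides("norm", operation))       # original kernel name
print(kernel_name_collides("normalize", operation))  # name used in the fix
```

The first call flags the original "norm" kernel name, and the second accepts "normalize".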
duncan-brown commented 8 years ago

With the kernel renamed, the benchmark runs on the GPU:

(pycbc-dev)[dbrown@spice-dev5 timing]$ ./match_perf.py --scheme=cuda --device-num=0 --size=20 --iterations=1000
RUNNING ON  GeForce GTX 750 Ti
         SIZE     20
Foverlap 0.26 msec  233611.3 op/min 
MATCH 3.47 msec  17296.7 op/min 
MATCH FAST 2.98 msec  20114.1 op/min 
FILTER FAST 2.02 msec  29685.5 op/min 

For comparison, the same benchmark on the CPU scheme (this run uses 500 iterations rather than 1000):

(pycbc-dev)[dbrown@spice-dev5 timing]$ ./match_perf.py --scheme=cpu --size=20 --iterations=500
RUNNING ON CPU
         SIZE     20
/home/dbrown/src/pycbc-dev/lib/python2.6/site-packages/numpy/lib/utils.py:99: DeprecationWarning: `scipy.weave` is deprecated, use `weave` instead!
  warnings.warn(depdoc, DeprecationWarning)
Foverlap 2.47 msec  24327.8 op/min 
MATCH 73.08 msec  821.1 op/min 
MATCH FAST 54.64 msec  1098.2 op/min 
FILTER FAST 31.79 msec  1887.6 op/min 
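Both runs report per-operation throughput, so the GPU speedup can be read off as a ratio of the op/min columns even though the iteration counts differ. Below is a minimal sketch that does this arithmetic. It uses the numbers copied from the two outputs above, and the speedups helper is illustrative only.

```python
# Throughput (op/min) copied from the two match_perf.py runs above.
gpu_ops_per_min = {
    "Foverlap": 233611.3,
    "MATCH": 17296.7,
    "MATCH FAST": 20114.1,
    "FILTER FAST": 29685.5,
}
cpu_ops_per_min = {
    "Foverlap": 24327.8,
    "MATCH": 821.1,
    "MATCH FAST": 1098.2,
    "FILTER FAST": 1887.6,
}


def speedups(gpu, cpu):
    """Ratio of GPU to CPU throughput for each benchmark present in both runs."""
    return {name: gpu[name] / cpu[name] for name in gpu if name in cpu}


for name, ratio in speedups(gpu_ops_per_min, cpu_ops_per_min).items():
    print(f"{name:12s} {ratio:5.1f}x")
```

This gives roughly 9.6x for Foverlap, 21.1x for MATCH, 18.3x for MATCH FAST and 15.7x for FILTER FAST. These ratios hold only for this problem size (SIZE 20), this GTX 750 Ti and this host CPU. The op/min column is used instead of msec because the msec values are rounded to two decimals. For example, 60000 / 0.26 is about 230769, not the reported 233611.3, so op/min appears to be computed from the unrounded time.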
duncan-brown commented 8 years ago

Fixed by https://github.com/ligo-cbc/pycbc/pull/805