duncan-brown closed this issue 8 years ago
The problem was a name collision: the kernel was named norm, the same as the norm() function called in its operation string. The solution is to give the kernel a different name:
diff --git a/pycbc/types/array_cuda.py b/pycbc/types/array_cuda.py
index 079e54c..2ec6358 100644
--- a/pycbc/types/array_cuda.py
+++ b/pycbc/types/array_cuda.py
@@ -121,7 +121,7 @@ def get_norm_kernel(dtype_x, dtype_out):
"tp_z": dtype_to_ctype(dtype_out),
},
"z[i] = norm(x[i])",
- "norm")
+ "normalize")
def squared_norm(self):
a = self.data
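To illustrate why the rename in the diff above helps, here is a minimal sketch. It assumes the elementwise kernel generator pastes the kernel name into the generated CUDA source as the name of the __global__ function. The template, render_kernel, calls_own_name and the complex_t placeholder type are all hypothetical stand-ins written for this illustration, not PyCUDA's or PyCBC's actual code.

```python
import re

# Simplified, hypothetical stand-in for an elementwise-kernel source template.
# The assumption is that the user-supplied name becomes the kernel's function name.
TEMPLATE = """\
__global__ void {name}({arguments}, long n)
{{
    long i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {{
        {operation};
    }}
}}
"""


def render_kernel(arguments, operation, name):
    """Return the CUDA source text that would be generated for this kernel."""
    return TEMPLATE.format(arguments=arguments, operation=operation, name=name)


def calls_own_name(operation, name):
    """Return True if the operation calls a function spelled like the kernel name."""
    pattern = r"\b" + re.escape(name) + r"\s*\("
    return re.search(pattern, operation) is not None


# "complex_t" is a placeholder type name used only for this sketch.
arguments = "complex_t *x, float *z"
operation = "z[i] = norm(x[i])"

for kernel_name in ("norm", "normalize"):
    clash = calls_own_name(operation, kernel_name)
    print("kernel name %-10s collides with a call in the body: %s" % (kernel_name, clash))

print(render_kernel(arguments, operation, "normalize"))
```

With the name "norm", the generated function and the norm(x[i]) call in its body share one identifier, which is the collision described above. With "normalize" the two no longer clash.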
Then it works:
(pycbc-dev)[dbrown@spice-dev5 timing]$ ./match_perf.py --scheme=cuda --device-num=0 --size=20 --iterations=1000
RUNNING ON GeForce GTX 750 Ti
SIZE 20
Foverlap 0.26 msec 233611.3 op/min
MATCH 3.47 msec 17296.7 op/min
MATCH FAST 2.98 msec 20114.1 op/min
FILTER FAST 2.02 msec 29685.5 op/min
For comparison, the same benchmark on the CPU:
(pycbc-dev)[dbrown@spice-dev5 timing]$ ./match_perf.py --scheme=cpu --size=20 --iterations=500
RUNNING ON CPU
SIZE 20
/home/dbrown/src/pycbc-dev/lib/python2.6/site-packages/numpy/lib/utils.py:99: DeprecationWarning: `scipy.weave` is deprecated, use `weave` instead!
warnings.warn(depdoc, DeprecationWarning)
Foverlap 2.47 msec 24327.8 op/min
MATCH 73.08 msec 821.1 op/min
MATCH FAST 54.64 msec 1098.2 op/min
FILTER FAST 31.79 msec 1887.6 op/min
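The two runs can be compared directly by dividing the CPU time by the GPU time for each operation. The sketch below uses the timings copied from the output above and assumes the msec column is the time per operation, so the different iteration counts (1000 on the GPU, 500 on the CPU) do not affect the ratio. The dictionary and function names are made up for this example.

```python
# Timings in msec, copied from the match_perf.py output above.
gpu_ms = {
    "Foverlap": 0.26,
    "MATCH": 3.47,
    "MATCH FAST": 2.98,
    "FILTER FAST": 2.02,
}
cpu_ms = {
    "Foverlap": 2.47,
    "MATCH": 73.08,
    "MATCH FAST": 54.64,
    "FILTER FAST": 31.79,
}


def speedups(cpu, gpu):
    """Return the CPU time divided by the GPU time for each operation."""
    return {name: cpu[name] / gpu[name] for name in gpu}


result = speedups(cpu_ms, gpu_ms)
for name in gpu_ms:
    print("%-12s %5.1fx faster on the GPU" % (name, result[name]))
```

On these numbers the GTX 750 Ti is roughly 9.5 to 21 times faster than the CPU, depending on the operation.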