GoogleCodeExporter opened 9 years ago
I can reproduce this issue. It looks like sum (even on a single array) is not
using multiple threads and anything inside the sum won't be accelerated.
What would be the best way to code this? I'll be happy to help.
Thanks.
Original comment by nicolas....@gmail.com
on 4 Mar 2012 at 7:16
+1 In a use case I am encountering, the numexpr.evaluate("sum(a)") version
takes over 60 s to complete, uses only one core, BUT keeps memory usage
quite low. OTOH, the numexpr.evaluate("a").sum() version takes just a few
seconds to complete, uses many cores, BUT uses as much as 15 GB of memory,
albeit only momentarily.
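(For reference, the speed/memory tradeoff described above can also be sidestepped without numexpr by summing in fixed-size chunks; a minimal numpy-only sketch, where the function name `chunked_sum` and the chunk size are my own arbitrary choices:)

```python
import numpy as np

def chunked_sum(a, chunk=1 << 20):
    """Sum a large array in fixed-size chunks.

    Peak extra memory is bounded by one chunk (default: 1M elements,
    an arbitrary choice) instead of the full-size temporary that
    evaluate("a").sum() materializes.
    """
    flat = np.ravel(a)
    total = 0.0
    for start in range(0, flat.size, chunk):
        # np.sum on each slice runs at full C speed
        total += flat[start:start + chunk].sum()
    return total
```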
This is a very substantial defect which appears to affect multiple users.
Original comment by jsw....@gmail.com
on 31 Jul 2012 at 3:59
I don't think the default ndarray.sum() method is capable of using more than
one core.
The dirty workaround I use for myself now is a parallel sum using OpenMP
and weave.inline.
Here's the function...
def openmpSum(in_array):
    """
    Performs a fast sum of an array using OpenMP.
    """
    import numpy
    from scipy import weave
    a = numpy.asarray(in_array)
    b = numpy.array([1.])  # one-element buffer to receive the result
    N = int(numpy.prod(a.shape))
    code = r"""
    int i = 0;
    double sum = 0;
    omp_set_num_threads(4);
    #pragma omp parallel for \
        default(shared) private(i) \
        reduction(+:sum)
    for (i = 0; i < N; i++)
        sum += a[i];
    b[0] = sum;
    """
    weave.inline(code, ['a', 'N', 'b'],
                 extra_compile_args=['-march=native -O3 -fopenmp'],
                 support_code=r"""
                 #include <stdio.h>
                 #include <omp.h>
                 #include <math.h>""",
                 libraries=['gomp'])
    return b[0]
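(Editor's note: scipy.weave was Python-2-only and has long been removed from SciPy. A rough modern stand-in, assuming only numpy and the standard library, is to split the array across a thread pool; numpy's sum releases the GIL, so the chunk sums genuinely run in parallel. The name `threaded_sum` and the default of 4 threads, mirroring the omp_set_num_threads(4) above, are my own choices:)

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def threaded_sum(in_array, n_threads=4):
    """Parallel sum using a thread pool.

    np.sum releases the GIL while it runs, so the per-chunk sums
    execute concurrently on separate cores.
    """
    a = np.ravel(np.asarray(in_array))
    # Split into roughly equal chunks, one per worker thread
    chunks = np.array_split(a, n_threads)
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        return sum(pool.map(np.sum, chunks))
```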
Original comment by mimak...@gmail.com
on 31 Jul 2012 at 6:05
Original issue reported on code.google.com by
mimak...@gmail.com
on 19 Feb 2012 at 2:06