tberlok closed this issue 7 years ago
As you can see in the data above, we do not get as good a scaling as in the cython-omp repo.
I changed the code such that we now use `Np` in `prange`, where `cdef int Np = particles.shape[0]`. This led to a very good speedup and also better scaling.
ncalls  tottime  percall  cumtime  percall  filename:lineno(function)
192     9.411    0.049    9.411    0.049    {skeletor.cython.particle_push.boris_push}
193     6.843    0.035    6.843    0.035    {skeletor.cython.deposit.deposit}
192     1.274    0.007    1.274    0.007    {skeletor.cython.particle_boundary.calculate_ihole}
192     1.248    0.007    1.248    0.007    {skeletor.cython.particle_boundary.periodic_x}
80/32   0.255    0.003    0.447    0.014    {built-in method _imp.create_dynamic}
2       0.177    0.089    0.177    0.089    {method 'normal' of 'mtrand.RandomState' objects}
1       0.093    0.093    0.093    0.093    example/landau_ions.py:62(ux_an)
1       0.086    0.086    0.089    0.089    /Users/berlok/codes/skeletor/skeletor/particles.py:62(initialize)

ncalls  tottime  percall  cumtime  percall  filename:lineno(function)
192     4.553    0.024    4.553    0.024    {skeletor.cython.particle_push.boris_push}
193     3.260    0.017    3.260    0.017    {skeletor.cython.deposit.deposit}
192     1.150    0.006    1.150    0.006    {skeletor.cython.particle_boundary.calculate_ihole}
192     0.721    0.004    0.721    0.004    {skeletor.cython.particle_boundary.periodic_x}
80/32   0.253    0.003    0.435    0.014    {built-in method _imp.create_dynamic}
2       0.177    0.089    0.177    0.089    {method 'normal' of 'mtrand.RandomState' objects}
1       0.094    0.094    0.094    0.094    example/landau_ions.py:62(ux_an)
1       0.085    0.085    0.088    0.088    /Users/berlok/codes/skeletor/skeletor/particles.py:62(initialize)

ncalls  tottime  percall  cumtime  percall  filename:lineno(function)
192     2.803    0.015    2.803    0.015    {skeletor.cython.particle_push.boris_push}
193     1.979    0.010    1.979    0.010    {skeletor.cython.deposit.deposit}
192     1.158    0.006    1.158    0.006    {skeletor.cython.particle_boundary.calculate_ihole}
192     0.628    0.003    0.628    0.003    {skeletor.cython.particle_boundary.periodic_x}
80/32   0.255    0.003    0.436    0.014    {built-in method _imp.create_dynamic}
2       0.182    0.091    0.182    0.091    {method 'normal' of 'mtrand.RandomState' objects}
I have started using cProfile to look at the speedup of various functions. The command is
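For reference, here is a minimal sketch of collecting this kind of profile from within Python, using only the standard-library `cProfile` and `pstats` modules. The function `work` and the helper `profile_to_text` are hypothetical stand-ins for the simulation script, and this is not necessarily the command used for the numbers in this issue.

```python
import cProfile
import io
import pstats


def work(n=200_000):
    """Hypothetical stand-in for the code being profiled."""
    total = 0.0
    for i in range(n):
        total += (i % 7) * 0.5
    return total


def profile_to_text(func, limit=8):
    """Run func under cProfile and return a report sorted by total time."""
    profiler = cProfile.Profile()
    profiler.enable()
    func()
    profiler.disable()

    # Write the report to a string instead of stdout so it can be saved or compared.
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("tottime").print_stats(limit)
    return buffer.getvalue()


report = profile_to_text(work)
print(report)
```

The report has the same `ncalls tottime percall cumtime percall filename:lineno(function)` columns as the profiles in this issue. From the command line, the standard equivalent is `python -m cProfile -s tottime script.py`, where `script.py` is a placeholder for the script to profile.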
I have also used the guide found here. Simply do the following:
The three profiles listed above correspond, in order, to one thread, two threads, and four threads.
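To put numbers on the scaling, the sketch below computes speedup and parallel efficiency per function from the `tottime` values in the profiles. It assumes the three profiles correspond, in order, to one, two, and four threads. The helper name `scaling` is made up for this illustration.

```python
# tottime in seconds, copied from the three profiles in this issue.
# Assumption: the profiles correspond, in order, to 1, 2 and 4 threads.
tottime = {
    "boris_push": {1: 9.411, 2: 4.553, 4: 2.803},
    "deposit": {1: 6.843, 2: 3.260, 4: 1.979},
    "calculate_ihole": {1: 1.274, 2: 1.150, 4: 1.158},
    "periodic_x": {1: 1.248, 2: 0.721, 4: 0.628},
}


def scaling(times):
    """Return {threads: (speedup, efficiency)} relative to the 1-thread time."""
    serial = times[1]
    return {n: (serial / t, serial / (t * n)) for n, t in sorted(times.items())}


results = {name: scaling(times) for name, times in tottime.items()}

for name, rows in results.items():
    for n, (speedup, efficiency) in rows.items():
        print(f"{name:16s} threads={n}  speedup={speedup:5.2f}  efficiency={efficiency:5.2f}")
```

Under that assumption, `boris_push` and `deposit` speed up by roughly 3.4x on four threads, while `calculate_ihole` stays at about 1.1x, so it looks like the next candidate to check for parallelization.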