nbia-astro / skeletor

Parallel PIC code written in Python and based on the skeleton codes provided by PICKSC
GNU General Public License v3.0

Openmp #62

Closed tberlok closed 7 years ago

tberlok commented 7 years ago

I have started using cProfile to look at the speedup of various functions. The command is

make OMPI_CC=gcc-6
export OMP_NUM_THREADS=1
python -O -m cProfile -s time example/landau_ions.py

I have also used the guide found here. Simply do the following:

python -O -m cProfile -o output.profile example/landau_ions.py
python -m pstats output.profile
sort time
stats 8
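
The interactive pstats session above can also be scripted. The following is a minimal sketch, not part of skeletor: busy_work is a hypothetical stand-in for the real simulation, and the profile is written to a temporary directory so the snippet runs on its own. It uses only the standard-library cProfile and pstats modules.

```python
import cProfile
import io
import os
import pstats
import tempfile


def busy_work(n):
    """Hypothetical stand-in workload; replace with the real simulation."""
    return sum(i * i for i in range(n))


def profile_to_file(path):
    """Profile the stand-in workload and write the raw stats to `path`."""
    profiler = cProfile.Profile()
    profiler.enable()
    busy_work(200_000)
    profiler.disable()
    profiler.dump_stats(path)


def top_by_tottime(path, limit=8):
    """Return the report that `sort time` followed by `stats 8` would print."""
    buffer = io.StringIO()
    stats = pstats.Stats(path, stream=buffer)
    stats.sort_stats("tottime").print_stats(limit)
    return buffer.getvalue()


with tempfile.TemporaryDirectory() as tmp:
    profile_path = os.path.join(tmp, "output.profile")
    profile_to_file(profile_path)
    report = top_by_tottime(profile_path)

print(report)
```

Pointing top_by_tottime at an output.profile produced by the cProfile command above should give the same kind of table as the ones below.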

One thread

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      192    9.285    0.048    9.285    0.048 {skeletor.cython.particle_push.boris_push}
      193    6.814    0.035    6.814    0.035 {skeletor.cython.deposit.deposit}
      192    1.269    0.007    1.269    0.007 {skeletor.cython.particle_boundary.calculate_ihole}
      192    1.248    0.007    1.248    0.007 {skeletor.cython.particle_boundary.periodic_x}
    80/32    0.267    0.003    0.465    0.015 {built-in method _imp.create_dynamic}
        2    0.177    0.088    0.177    0.088 {method 'normal' of 'mtrand.RandomState' objects}
        1    0.093    0.093    0.093    0.093 landau_ions.py:62(ux_an)
        1    0.086    0.086    0.089    0.089 particles.py:62(initialize)

Two threads

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      192    4.788    0.025    4.788    0.025 {skeletor.cython.particle_push.boris_push}
      193    3.273    0.017    3.273    0.017 {skeletor.cython.deposit.deposit}
      192    1.156    0.006    1.156    0.006 {skeletor.cython.particle_boundary.calculate_ihole}
      192    0.727    0.004    0.727    0.004 {skeletor.cython.particle_boundary.periodic_x}
    80/32    0.258    0.003    0.443    0.014 {built-in method _imp.create_dynamic}
        2    0.178    0.089    0.178    0.089 {method 'normal' of 'mtrand.RandomState' objects}
        1    0.095    0.095    0.095    0.095 example/landau_ions.py:62(ux_an)
        1    0.087    0.087    0.087    0.087 example/landau_ions.py:66(uy_an)

Four threads

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      192    3.573    0.019    3.573    0.019 {skeletor.cython.particle_push.boris_push}
      193    2.296    0.012    2.296    0.012 {skeletor.cython.deposit.deposit}
      192    1.253    0.007    1.253    0.007 {skeletor.cython.particle_boundary.calculate_ihole}
      192    0.656    0.003    0.656    0.003 {skeletor.cython.particle_boundary.periodic_x}
    80/32    0.262    0.003    0.471    0.015 {built-in method _imp.create_dynamic}
        2    0.178    0.089    0.178    0.089 {method 'normal' of 'mtrand.RandomState' objects}
        1    0.092    0.092    0.092    0.092 example/landau_ions.py:62(ux_an)
        1    0.085    0.085    0.088    0.088 /Users/berlok/codes/skeletor/skeletor/particles.py:62(initialize)
tberlok commented 7 years ago

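As the data above show, the scaling is not as good as in the cython-omp repo. boris_push speeds up by about 1.9x on two threads but only by about 2.6x on four, and calculate_ihole does not speed up at all.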
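
To put numbers on the scaling, here is a small sketch that computes speedup and parallel efficiency from the tottime column of the tables above. The timings are copied from those tables; the helper name scaling is only illustrative.

```python
# tottime in seconds, copied from the tables above, keyed by thread count.
timings = {
    "boris_push": {1: 9.285, 2: 4.788, 4: 3.573},
    "deposit": {1: 6.814, 2: 3.273, 4: 2.296},
    "calculate_ihole": {1: 1.269, 2: 1.156, 4: 1.253},
    "periodic_x": {1: 1.248, 2: 0.727, 4: 0.656},
}


def scaling(times):
    """Map thread count -> (speedup, efficiency) relative to one thread."""
    base = times[1]
    return {n: (base / t, base / t / n) for n, t in times.items()}


for name, times in timings.items():
    for n, (speedup, efficiency) in sorted(scaling(times).items()):
        print(f"{name:16s} threads={n} "
              f"speedup={speedup:5.2f} efficiency={efficiency:5.2f}")
```

An efficiency close to 1 means the function scales almost ideally; values well below 1 at four threads indicate where the threading is not paying off.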

tberlok commented 7 years ago

Update

I changed the code such that we now use Np in prange, where cdef int Np = particles.shape[0]. This led to a good speedup and also better scaling: on four threads, boris_push drops from 3.573 s to 2.803 s and deposit from 2.296 s to 1.979 s.

One thread

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      192    9.411    0.049    9.411    0.049 {skeletor.cython.particle_push.boris_push}
      193    6.843    0.035    6.843    0.035 {skeletor.cython.deposit.deposit}
      192    1.274    0.007    1.274    0.007 {skeletor.cython.particle_boundary.calculate_ihole}
      192    1.248    0.007    1.248    0.007 {skeletor.cython.particle_boundary.periodic_x}
    80/32    0.255    0.003    0.447    0.014 {built-in method _imp.create_dynamic}
        2    0.177    0.089    0.177    0.089 {method 'normal' of 'mtrand.RandomState' objects}
        1    0.093    0.093    0.093    0.093 example/landau_ions.py:62(ux_an)
        1    0.086    0.086    0.089    0.089 /Users/berlok/codes/skeletor/skeletor/particles.py:62(initialize)

Two threads

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      192    4.553    0.024    4.553    0.024 {skeletor.cython.particle_push.boris_push}
      193    3.260    0.017    3.260    0.017 {skeletor.cython.deposit.deposit}
      192    1.150    0.006    1.150    0.006 {skeletor.cython.particle_boundary.calculate_ihole}
      192    0.721    0.004    0.721    0.004 {skeletor.cython.particle_boundary.periodic_x}
    80/32    0.253    0.003    0.435    0.014 {built-in method _imp.create_dynamic}
        2    0.177    0.089    0.177    0.089 {method 'normal' of 'mtrand.RandomState' objects}
        1    0.094    0.094    0.094    0.094 example/landau_ions.py:62(ux_an)
        1    0.085    0.085    0.088    0.088 /Users/berlok/codes/skeletor/skeletor/particles.py:62(initialize)

Four threads

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      192    2.803    0.015    2.803    0.015 {skeletor.cython.particle_push.boris_push}
      193    1.979    0.010    1.979    0.010 {skeletor.cython.deposit.deposit}
      192    1.158    0.006    1.158    0.006 {skeletor.cython.particle_boundary.calculate_ihole}
      192    0.628    0.003    0.628    0.003 {skeletor.cython.particle_boundary.periodic_x}
    80/32    0.255    0.003    0.436    0.014 {built-in method _imp.create_dynamic}
        2    0.182    0.091    0.182    0.091 {method 'normal' of 'mtrand.RandomState' objects}
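
As a rough check of how much the Np change helped, this sketch compares the tottime values for boris_push and deposit before and after the update. The numbers are copied from the two sets of tables in this thread; percent_saved is just an illustrative helper.

```python
# tottime in seconds from the tables, keyed by thread count.
before = {
    "boris_push": {1: 9.285, 2: 4.788, 4: 3.573},
    "deposit": {1: 6.814, 2: 3.273, 4: 2.296},
}
after = {
    "boris_push": {1: 9.411, 2: 4.553, 4: 2.803},
    "deposit": {1: 6.843, 2: 3.260, 4: 1.979},
}


def percent_saved(old, new):
    """Percentage of the old runtime saved; negative means slower."""
    return 100.0 * (old - new) / old


for name in before:
    for n in sorted(before[name]):
        saved = percent_saved(before[name][n], after[name][n])
        speedup = after[name][1] / after[name][n]
        print(f"{name:12s} threads={n} saved={saved:6.1f}% "
              f"speedup after update={speedup:4.2f}")
```

With these numbers most of the gain shows up at four threads, while the single-thread times are essentially unchanged.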