ratt-ru / pfb-imaging

Preconditioned forward/backward clean algorithm
MIT License

Profile with yappi #9

Closed sjperkins closed 2 years ago

sjperkins commented 3 years ago

@landmanbester I did some profiling with yappi. See below the first 10 or so functions that take up the most time:

Clock type: CPU
Ordered by: totaltime, desc

name                                  ncall  tsub      ttot      tavg      
..ges/dask/local.py:212 execute_task  89     0.001251  6.943522  0.078017
..ages/dask/core.py:86 _execute_task  171..  0.014375  6.940461  0.004047
..n.py:979 SubgraphCallable.__call__  72     0.001356  6.923556  0.096160
..site-packages/dask/core.py:130 get  73     0.001483  6.922226  0.094825
..tors.py:473 _hdot_internal_wrapper  73     0.029227  6.901768  0.094545
..fb/operators.py:449 _hdot_internal  72     0.362663  6.872542  0.095452
..g/pywt/_multilevel.py:179 wavedec2  64     0.005661  6.011292  0.093926
..6_64.egg/pywt/_multidim.py:24 dwt2  320    0.008139  5.988174  0.018713
.._64.egg/pywt/_multidim.py:121 dwtn  320    0.020201  5.940784  0.018565
..unction__ internals>:2 concatenate  64     0.000741  0.276983  0.004328
../pfb/operators.py:503 DaskPSI.hdot  1      0.000033  0.274248  0.274248
..ges/dask/base.py:142 Array.compute  1      0.007124  0.187318  0.187318

Selecting out the _hdot_internal and wavelet calls:

name                                  ncall  tsub      ttot      tavg      
..fb/operators.py:449 _hdot_internal  72     0.362663  6.872542  0.095452
..g/pywt/_multilevel.py:179 wavedec2  64     0.005661  6.011292  0.093926
..6_64.egg/pywt/_multidim.py:24 dwt2  320    0.008139  5.988174  0.018713
.._64.egg/pywt/_multidim.py:121 dwtn  320    0.020201  5.940784  0.018565
..unction__ internals>:2 concatenate  64     0.000741  0.276983  0.004328

Out of 6.94s total time,

  1. 6.87 seconds is spent in 72 _hdot_internal calls, which then spends
  2. 6.01 seconds in 64 wavedec2 calls which then spends
  3. 5.98 seconds in 320 dwt2 calls which then spends
  4. 5.94 seconds in 320 dwtn calls.

The next most expensive call is concatenate which takes up 0.27 seconds.
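The shares behind this breakdown can be checked directly from the yappi table. A small sketch, with the `ncall` and `ttot` values copied from the output above; the names `profile` and `summarise` are illustrative only:

```python
# (ncall, ttot in seconds) copied from the yappi output above.
profile = {
    "execute_task": (89, 6.943522),
    "_hdot_internal": (72, 6.872542),
    "wavedec2": (64, 6.011292),
    "dwt2": (320, 5.988174),
    "dwtn": (320, 5.940784),
    "concatenate": (64, 0.276983),
}

total = profile["execute_task"][1]  # top-level ttot, about 6.94 s


def summarise(profile, total):
    """Return {name: (share of total ttot, ttot per call)}."""
    return {
        name: (ttot / total, ttot / ncall)
        for name, (ncall, ttot) in profile.items()
    }


summary = summarise(profile, total)
for name, (share, per_call) in summary.items():
    print(f"{name:15s} {share:6.1%} of total, {per_call:.6f} s per call")
```

Since `ttot` includes time spent in callees, the shares overlap rather than sum to 100%, and `ttot / ncall` reproduces the `tavg` column.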

I interpret this as the majority of the time being spent in dwtn, especially here:

https://github.com/PyWavelets/pywt/blob/db0172a8ea261064bbc2f0a7b26759c6a8f71d76/pywt/_multidim.py#L185-L191

So while PyWavelets does drop the GIL, the quantity of work given to each thread may not be sufficient to fully exercise the cores.

sjperkins commented 3 years ago

I've marked this PR as a draft as it shouldn't be merged. It exists to discuss profiling rather than to modify functionality.

landmanbester commented 3 years ago

Interesting. Have you tried profiling the non-dask versions of the functions? They actually run significantly faster than the dask versions

sjperkins commented 3 years ago

> Interesting. Have you tried profiling the non-dask versions of the functions? They actually run significantly faster than the dask versions

Yes, I can see this on my side. I probably should have been clearer. The kicker is here:

> So while PyWavelets does drop the GIL, the quantity of work given to each thread may not be sufficient to fully exercise the cores.

There are 64 calls to wavedec2, which result in 320 calls to dwtn (320/64 == 5 levels). Then, within dwtn, there are further loops over the data. I don't think the Cython functions are being given enough work to do when the GIL is dropped. Therefore, everything ends up serialised, or worse.

sjperkins commented 3 years ago

A numba wavelet implementation may be required.

landmanbester commented 3 years ago

Ah man, that is not what I wanted to hear. If that was the case we should see the fraction

(time taken by dask implementation)/(time taken by serial implementation)

decrease with problem size right? I never tested this but I'll have a look. Thanks @sjperkins
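One way to check this is to time both paths over a range of problem sizes. The sketch below is not the pfb code: `busy` is a hypothetical stand-in workload and a plain thread pool stands in for the dask scheduler, so the real serial and dask calls would have to be swapped in.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def busy(n):
    """Stand-in workload (pure Python, so it holds the GIL)."""
    return sum(i * i for i in range(n))


def time_ratio(work, size, n_tasks=8, n_workers=4):
    """Return (threaded wall time) / (serial wall time) for n_tasks calls."""
    start = time.perf_counter()
    serial = [work(size) for _ in range(n_tasks)]
    t_serial = time.perf_counter() - start

    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        threaded = list(pool.map(work, [size] * n_tasks))
    t_threaded = time.perf_counter() - start

    assert serial == threaded  # both paths must compute the same thing
    return t_threaded / t_serial


for size in (1_000, 10_000, 100_000):
    print(size, round(time_ratio(busy, size), 2))
```

Because `busy` never releases the GIL, its ratio should not be expected to drop much below 1. A kernel that does release the GIL would only be expected to show a falling ratio once each call carries enough work.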

sjperkins commented 3 years ago

> Ah man, that is not what I wanted to hear. If that was the case we should see the fraction
>
> (time taken by dask implementation)/(time taken by serial implementation)
>
> decrease with problem size right? I never tested this but I'll have a look. Thanks @sjperkins

Hmmmm, I'm not sure.

But just to confirm what I'm saying about the data sizes not being large enough, I put a print(subband, x.shape) here and it produces the following shapes:

dwt_axis data sizes

```bash
(1024, 1024)
a (512, 1024)
d (512, 1024)
(512, 512)
a (256, 512)
d (256, 512)
(256, 256)
a (128, 256)
d (128, 256)
(128, 128)
a (64, 128)
d (64, 128)
(64, 64)
a (32, 64)
d (32, 64)
[... the 15 lines above repeat for every serial decomposition ...]
Serial decomp took 3.689335823059082
Serial rec took 2.6806857585906982
(1024, 1024)
(1024, 1024)
a (512, 1024)
(1024, 1024)
d (512, 1024)
a (512, 1024)
(512, 512)
(1024, 1024)
a (256, 512)
[... the same shapes continue, interleaved across threads ...]
```

Unfortunately, I think the Cython code is given at most 8 MB of data to chew on, and a lot of the time it's much less than that. It's not really possible to exercise the cores if they don't have sufficient work to do, even if the GIL is dropped.
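The 8 MB figure follows from the largest shape in the printout, assuming the arrays are float64 (8 bytes per element). The dtype is not shown in the output, so that part is an assumption:

```python
# Input shapes per level, taken from the printed dwt_axis sizes above.
shapes = [(1024, 1024), (512, 512), (256, 256), (128, 128), (64, 64)]
ITEMSIZE = 8  # bytes per element, assuming float64


def nbytes(shape, itemsize=ITEMSIZE):
    """Size in bytes of a dense array with the given shape."""
    rows, cols = shape
    return rows * cols * itemsize


for level, shape in enumerate(shapes, start=1):
    print(f"level {level}: {shape} -> {nbytes(shape) / 2**20:.3f} MiB")
```

Under that assumption only the first level reaches 8 MiB; by the fifth level each call is handed 32 KiB.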

landmanbester commented 3 years ago

That makes me very sad. It might be simpler to wrap existing libraries than start from scratch though. This http://wavelet2d.sourceforge.net/ also has the Daubechies filters. Not sure how difficult they would be to wrap though

sjperkins commented 3 years ago

> That makes me very sad. It might be simpler to wrap existing libraries than start from scratch though. This http://wavelet2d.sourceforge.net/ also has the Daubechies filters. Not sure how difficult they would be to wrap though

It may be easier to wrap a pure C implementation. Any thoughts on the suitability of the following from a correctness POV?

https://github.com/rafat/wavelib https://github.com/rafat/wavelib/wiki/DWT-Example-Code

http://www.wavelets.org/software.php

There's also the GNU Scientific Library (GSL)

https://www.gnu.org/software/gsl/doc/html/dwt.html

which appears to have Python wrappers:

https://pypi.org/project/pygsl/

landmanbester commented 3 years ago

pygsl would have been great but according to the docs:

The library provides functions to perform two-dimensional discrete wavelet transforms on square matrices. The matrix dimensions must be an integer power of two.

which is a bit of a severe limitation. I'll have a look at some of the other packages and get back to you. A quick glance at the first package looks promising. They have the fast discrete wavelet transforms we need but I haven't checked if they also have some of the above limitations
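The restriction quoted from the GSL docs is easy to test for up front. A minimal sketch; `gsl_dwt2_compatible` is a hypothetical helper name, not part of pygsl:

```python
def is_power_of_two(n):
    """True for 1, 2, 4, 8, ... (positive integers with one bit set)."""
    return n > 0 and (n & (n - 1)) == 0


def gsl_dwt2_compatible(shape):
    """Check the quoted constraint: square, side length a power of two."""
    rows, cols = shape
    return rows == cols and is_power_of_two(rows)


for shape in [(1024, 1024), (1000, 1000), (1024, 512)]:
    print(shape, gsl_dwt2_compatible(shape))
```

Any image that fails this check would need padding or cropping before it could go through such a transform.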

landmanbester commented 3 years ago

Also, see the demo here

https://github.com/PyWavelets/pywt/pull/230/commits/5c5d6d9b3a1ff8ce905e5d0be7430734cf0d0a85

It looks like they get some speed up using concurrent.futures so maybe we shouldn't throw the towel in just yet

sjperkins commented 3 years ago

> Also, see the demo here
>
> PyWavelets/pywt@5c5d6d9
>
> It looks like they get some speed up using concurrent.futures so maybe we shouldn't throw the towel in just yet

I see they've got a 3D wavelet transform. Do you think the current code could be modified to use wavedecn/waverecn? That would increase the amount of work given to Cython per call.

landmanbester commented 3 years ago

Yes, I don't see why not. I'm also wondering how much the wavelet decomposition level has to do with it because the individual blocks get smaller with increasing decomposition level, leaving less work per thread
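A rough way to quantify this, assuming each level's cost is proportional to the number of samples it reads and that each level halves both axes (as in the shapes printed earlier):

```python
def work_share(levels):
    """Fraction of total input samples handled at each level.

    Level k reads 1 / 4**(k - 1) of the original image, because
    both axes are halved at every level.
    """
    weights = [0.25 ** k for k in range(levels)]
    total = sum(weights)
    return [w / total for w in weights]


for level, share in enumerate(work_share(5), start=1):
    print(f"level {level}: {share:.1%} of the samples")
```

Under these assumptions the first level accounts for about three quarters of the data touched, so the deeper levels add calls but very little work per call.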