Offer: Port code to use numpy, one day pure Python

opticspy / lightpipes

LightPipes for Python, "Pure Python version"

https://opticspy.github.io/lightpipes/

BSD 3-Clause "New" or "Revised" License

233 stars 54 forks source link

Offer: Port code to use numpy, one day pure Python #32

Open ldoyle opened 4 years ago

ldoyle commented 4 years ago

Hi Fred (and others), I have been using LightPipes in Matlab and Python for a while now and it is a very useful package! There are however a couple of limitations of the currently available Python package:

Not using numpy (actually dependency is not necessary at all right now)
Instead, data is list of lists ("unpythonic")
FFT using fftw lib, creating a DLL dependency (better to use numpy)
most importantly: global variables inside package (e.g. wavelength). It is therefore impossible to switch around fields without making sure everything is re-initialized correctly (e.g. going back and forth from nearfield to focus, size will be wrong on one of the fields)

I have refactored and ported parts of the code so far as follows:

a Field class contains the complex field as .field, but also .wavelength and .size -> no need for global variables in package, all functions are fully static
inside that class, the field is not stored as list of list, but as 2d numpy array (complex)
some basic functions like Begin, CircAperture, Zernike implemented in pure python
Fresnel implemented in pure python. In 1:1 implementation of loops a lot slower, now I have refactored all loops to use numpy vectorized functions instead and it is actually faster than Cpp version!

In the future, we would also like to add the following features:

an optional function parameter specifying whether to copy the Field or modify it in-place (slight memory/performance gain)
using non-square grid (as seems to be interesting for more people here in the Github issues)

Summarizing, these changes will break the public interface and therefore backward compatibility of the package, and introduce a dependency to numpy. Would you be interested to incorporate the changes, anyway (and have the users update there code to fix compatibility), or should I create an own package name, eg. "PyLightPipes"?

Regards, Lenny

guyskk commented 4 years ago

Hi Idoyle, thank you for all the work! I think depends on numpy is not a problem because most people already installed it, but break public interface is a bit painful.

How about make Field class behavior like list of list, ie. implement __iter__, __next__, __len__, __getitem__, __contains__ and other frequently-used methods, so that it will not break users' code?

Is there other breaks I missed?

FredvanGoor commented 4 years ago

Hi guyskk and Idoyle, I propose to start a new branche in opticspy to investigate your ideas.

ldoyle commented 4 years ago

Hi guys, thanks for the quick reply. You can check out the code at https://github.com/ldoyle/lightpipes/tree/pythonicport I can also open a Pull Request so you can have it as a branch here, never done that before. Here is what I did so far:

Clean up file structure, there were some outdated and non-working examples in LightPipes/Examples, now all in Examples/ are tested and working
Fixed some minor bugs in the examples
Wrapped all LP functions to accept and return my new Field class which has Field.field (complex numpy array) and .size, .wavelength, .N, ._curvature (for LPLens)
Merged my ports of some stuff like Fresnel (pure python using numpy FFT), Zernike, CircAperture, ...
Added some new stuff like ZernikeFilter (subtract certain modes like an adaptive optic) and Plot
Tested 100% of the examples to give no errors and visually identical plot results as your current version. They all work without modification.

Adding list behaviour to the field class is an interesting idea, I will try this. This should be easy to do for reading (e.g. plotting the field), but harder probably for writing (changing the field data). However, as it turns out with the examples, code that does not directly manipulate F but only via Intensity, MultIntensity, SubPhase etc. is almost always immediately compatible.

My next steps will be:

Successively implement easy functions in pure python (e.g. apertures etc)
One by one port the Propagators to pure Python to remove fftw dependency. If it turns out they run slower than Cpp version, one can still use a .pyx compiled version but using numpy instead of fftw.
Finally remove need for compiling completely or at least with a simpler setup and dependency

In addition to the numpy dependency, right now I have:

scipy.special for fresnel() integrals
skimage for an alternative PhaseUnwrap, but not stricly necessary as the existing one works fine
matplotlib for the Plot function, but if you do not use Plot no error will occur if it is missing

FredvanGoor commented 4 years ago

Hi Idoyle, I installed your version of LightPipes using numpy and compared the execution time of the Fresnel command with a 3000 x 3000 grid dimension. It turns out that the "numpy" version is about 5 seconds slower than the "C++" version on my computer (windows 10). Fred

ldoyle commented 4 years ago

Hi Fred, I checked this and you are absolutely right. I hadn't done extensive tests, I only noticed that it seemed to be faster in some cases with larger grids. Sorry for the false news. Turns out there is something very interesting happening: When testing grid sizes divisible by 100 (nor sure why 100, maybe others work too) there is a clear trend that my version is ~1.5-2x slower. For other grid sizes both versions have a lot of jumps and e.g. the Cpp version takes 2x as long for N=1024 as it does for N=1000 or even N=1029! I tend to use e.g. 1024 or 2048 as grid size (powers of 2 are supposed to be efficient for FFT, so I thought this might help). Coincidentally, these numbers are the only cases where my code is indeed slightly faster. I created a script to compare both versions for a large range of N, you can see the result attached. The jumps around N=256, 512, 1024, 1500, 2048 are not because of powers of 2 I believe, but instead just because of the "non-regular" numbers, I expect the same to happen at e.g. 1880 or 1991 etc. (which is why I added the range 1490,1491,...1500, ...1510, see graph). times_py_vs_cpp2 I will upload the test script in my repo if you would like to experiment further yourself.

Summarizing, I think my initial goal is not to make the package more efficient, but rather improve readability and portability and use this as a Python exercise to see what can be gained. Best regards, Lenny

FredvanGoor commented 4 years ago

I asked my friend Gleb Vdovin and he said that it has to do something with prime numbers. I don't know the details but it is a fact that other people encountered as well: See for example https://blogs.mathworks.com/steve/2014/04/07/timing-the-fft/

Fred

FredvanGoor commented 4 years ago

Dear Lenny, I would like to invite you to become a "member" of the opticspy repository. Your work for the LightPipes project is very promising and I believe that the best way to proceed is to join us. Please let me know if you are interested. Best regards, Fred van Goor

ldoyle commented 4 years ago

Hi Fred, thanks for the Matlab FFT article, very insightful. I have implemented the same random number picker for testing the dependecy on N. Because it is faster and contains less FFTs, I benchmarked Forvard() and not Fresnel(). In my opinion for the 2D case the difference is less pronounced, but definitely there. (See image, log-log scale!) times_py_vs_cpp_forvard

In general it seems though that in Python a lot of "random" factors come in like if the console is fresh or re-run, the time between last execution (maybe my CPU is throttling or RAM usage and garbage collection, ...) and others, so sometimes one version is faster, sometimes the other.

To update on the progress:

Instead of numpy.fft, I am now using pyfftw which wraps fftw.dll just as you do. It does seem faster than numpy.fft, but it's really easy to replace the function calls, so users can decide to install pyfftw or just rely on the numpy routines without changing any code.
I optimized Fresnel() a little further, although not much more is to be gained with simple means. For the same settings, sometimes the Cpp version is faster, sometimes the pyfftw-enabled Python version. (Interestingly, enabling multithreading for pyfftw will also run the lib-fftw DLL in multithreaded mode on subsequent calls in the same Python instance, giving both a small boost over single-threaded default)
I ported Forward(). Because it is so inefficient I could only test it up to N=128, but here it is already a lot faster than the Cpp version, probably since I pre-calculate the Frensel integrals instead of re-calculating many redundant ones in the inner for-loops (at the cost of higher RAM usage I guess). However this function will not be of interest anyway for most people.
Forvard() is also ported and working. Here the speeds are also similar, I guess the trend in the graph above is mainly the multithreading which is on for pyfftw and off for Cpp version in this case.
The new progress is not in my fork, but the "PyLightPipes" branch in this repo.

This is not my main priority right now, so it might take some time to progress further, but feel free to comment on the next useful steps (also other people interested in re-using this). Cheers, Lenny

FredvanGoor commented 4 years ago

Hi Lenny, Very good work! I tested your updates in PyLightPipes of the Fresnel and Forvard commands. And they are much faster now than the original C++ versions! I found on my laptop: 4000 x 4000 grid: C++ Forvard 6,2 sec and Python Forvard 3,9 sec 2000 x 2000 grid: C++ Forvard 1,4 sec and Python Forvard 1,0 sec 2000 x 2000 grid: C++ Fresnel 6,2 sec and Python Fresnel 3,9 sec

About pyFFTW: When we make a wheel for pypi we can add in setup.py the install_requires keyword argument: install_requires=["numpy", "pyFFTW", "scipy"] These are all in PyPi. When a user pip installs LightPipes pip checks if numpy, pyFFTW and scipy are all installed. If not these packages will be installed as well. I will check the wheel-making. Fred

FredvanGoor commented 4 years ago

I tested the generation of a wheel with the "install_requires" keyword in setup.py mentioned in my previous message. It works as expected. After uninstalling numpy, pyFFTW, scipy and LightPipes from my computer and executing "pip install LightPipes-2.0.0-cp37-cp37m-win_amd64.whl", the missing packages were installed together with LightPipes. I have changed the version to: 2.0.0, because it is a major change going from C++ to pure Python. The lower execution time of PyLightPipes compared to C++ LightPipes must be due to multi-processor execution (this was not switched-on in the C++ version. Maybe that can be tried ...)

Fred

ldoyle commented 4 years ago

Hi Fred, thanks for testing the install. Once the port is complete, my hope is it will be enough to have 1 wheel which is platform-independent, so that will simplify things there, too. Do you think testing for Python 2 would be relevant? I would also say it is worth a new major version number. Would you plan to publish it still as LightPipes (pip install LightPipes) or rather rename it to have the old version available also? Personally I think keeping the name is fine, since users can use pip to install the old version any time and the backward compatibility is almost there, except some functions do not return a list of lists but a numpy array (the field variable F is not usually manipulated directly anyway, so for users using only SubIntensity/MultPhase/Intensity()/Phase() etc hardly anything changes).

Regarding the speed, I also think this must be the multithreading. For a more "fair" comparison you can disable it by setting threads=1 in the top of propagators.py. On my laptop like I said it's hard to tell which one is really faster (maybe I should repeat with a large N like 4000).

FredvanGoor commented 4 years ago

Hi Lenny, Python 2 is retired (like me), I don't think it is a good idea to spend time on a Python 2 version.

Once the new version 2.0.0 has been tested with at least all the examples it is best to publish it as "LightPipes". If there are (hopefully small) differences for the users we can mention that in the sphinx documents which must be updated as well in the future.

Even with 'threads': 1 in propagators.py, python Fresnel and Forvard are a little faster than the C++ versions. With 'threads': -1 The speed increases almost a factor 2 on my 4-core/8-threads laptop with an Intel Core i7-2630QM CPU (2GHz, 10Gb RAM)!

I added z=0 and z<=0 checks in Forvard and Fresnel respectively. That was also to test if my GitHub Desktop works to update the PyLightPipes branch. Please have a look if I did that correctly.... Greetings from, Fred.

FredvanGoor commented 4 years ago

I added the pure-python versions of the GaussAperture, GaussScreen, GaussHermite and GaussLaguerre commands to the PyLightPipes branch. Please have a look and do tests. Especially the GaussLaguerre command, because I found a difference with the C++ version for GaussLaguerre(0,1,.....). I think that the C++ GaussLaguerre routine is wrong! The pure-Python version looks OK. I also tested the GaussHermite and GaussLaguerre wavefront propagation with Fresnel and Forvard. Indeed they did not change their shapes as it should be. Greetings from, Fred

ldoyle commented 4 years ago

Dear Fred, thanks a lot for your support! I haven't had a chance to check the Gauss functions in detail, but they look OK. I will try to test them soon. In the meantime I already ported Convert, LensForvard and LensFresnel, I pushed the changes to git just now. If you have no more recent commits locally, you should be able to just git pull my changes. I slightly modified the z<=0 checks in Fresnel/Forward/Forvard: I think print('Error message') is not very pythonic, as it might be missed e.g. when using a GUI, and returning the unchanged field in this case makes little sense. Instead, for z<0 it raises an Exception which halts the user program or needs to be caught and treated accordingly. And for cases which are allowed but nonsense (z=0) it does not print any error and return a copy of the field. This is important since otherwise changing data in-place will also modify the original field:

#  user may want to keep F1 and do something different to it than F2:
F2 = Forvard(0, F1) # returns pointer to same field, no copy.
F2.size = 2*m
print(F1.size) # will also be 2, even if F1 was 1*m before

Of course, for 99% of the users this does not make a difference since you usually have just one F, e.g. F = Forvard(z, F) will overwrite the pointer anyway, but it's best to keep the behaviour consistent and always return a copy unless otherwise noted. If you want we can still add a print() or maybe better a warn.warning() to inform user about the z=0 case.

Now only Steps() and Interpol() to go, great! I fear these will be a lot slower in pure python though since they rely on many loops etc., I haven't checked yet how well they can be refactored to vectorized numpy notation (applying the same calculation element-wise to the entire array instead of for-loop)

FredvanGoor commented 4 years ago

Hi Lenny, I also did your test for Forvard and Fresnel on my 4-core, 8 threads Windows 10 laptop. And now the increase of speed is clear!

Forvard: Test Cpp vs Py Forvard Fresnel: Test Cpp vs Py Fresnel

About the interpol command: Maybe the scipy.interpolate command can be used?

Greetings, Fred