kwikteam / klustakwik2

Fast software for high-dimensional cluster analysis using the masked EM algorithm for Gaussians mixtures
BSD 3-Clause "New" or "Revised" License
27 stars 13 forks source link

[MRG] Compiling cython from setup.py #52

Closed thesamovar closed 9 years ago

thesamovar commented 9 years ago

This is not ready to merge yet.

@rossant, if you have a minute, could you see if you can work out why this isn't working? I get the error error: each element of 'ext_modules' option must be an Extension instance or 2-tuple when running python setup.py build_ext --inplace.

Fixes #42.

thesamovar commented 9 years ago

I think maybe the Travis configuration needs to be modified too? @nippoo?

nippoo commented 9 years ago

Ah. I'm not actually installing the program in Travis, I'm just running nosetests on the source. Should python setup.py work? Or do I need to use build_ext --inplace too?

rossant commented 9 years ago

for phy i'll just need for pip install klustakwik2 to "just work", ie it should compile automatically

nippoo commented 9 years ago

@rossant we should pip install from git though - in the travis tests on phy. Or, better, do both...

thesamovar commented 9 years ago

I don't really know how Travis works but in principle a standard python setup should work.

nippoo commented 9 years ago

@thesamovar I've pushed a commit to change travis now, so once you fix your bug, the build should pass...

thesamovar commented 9 years ago

OK, pushed changes which should hopefully fix this. It runs on Windows anyway. In case anyone googling this manages to find this post in the future, some notes on how this was fixed.

The error with error: each element of 'ext_modules' option must be an Extension instance or 2-tuple was fixed by moving the import from Cython.Build import cythonize after the import of setuptools. This turns out to be important because setuptools does some monkey patching clever stuff, so it has to be imported before distutils, and Cython.Build imports distutils. Yeah. I worked that out.

I then got another error with the numpy path because although the Cython docs tell you to write cythonize('*.pyx', include_path=[numpy.get_include()]) this doesn't work. You need to write setup(..., cythonize('*.pyx'), include_dirs=[numpy.get_include()], ...) because it's the setup function that needs those include dirs, not the cythonize function.

And finally, there was another erorr because I forgot to include #distutils: language=c++ in one of the .pyx files.

Yes I want a medal.

thesamovar commented 9 years ago

By the way @nippoo, it's possible that it will now be smart about OpenMP and just work on Mac - could you try it? The reason is that I'm no longer handling creating the compile arguments, Cython is (using that form of cythonize), and it is smarter than I am. So if we're lucky they considered this case and it will just work.

thesamovar commented 9 years ago

Oh wait, no: it seems that with these changes it is no longer using OpenMP on my Windows machine, so something got lost somewhere.

thesamovar commented 9 years ago

OK the Travis build is failing for a reason that I don't understand. It complains that one of the Cython modules doesn't exist, but looking at the output it seems to have been compiled fine. Maybe this is because it is installing the package and copying the built .so files to the install directory, but then you're running nosetests from the source directory?

nippoo commented 9 years ago

Aha - this works on Mac! So this PR now [edit: does not do anything, see Dan's later comment] when it gets merged.

nippoo commented 9 years ago

re: the Travis issue, perhaps setup.py develop might work then? Any thoughts @rossant ?

thesamovar commented 9 years ago

No, it doesn't solve the mac problem unfortunately because it seems to have disabled openmp on all platforms...

nippoo commented 9 years ago

Oops.

thesamovar commented 9 years ago

Things are getting more and more mysterious. So it turns out that with Anaconda on Windows 64 bit, it includes the 64 bit version of the inappropriately named MinGW-32, but for some reason doesn't come with OpenMP support. Installing with the setup script is using this compiler by default and therefore I can't get OpenMP support on Windows. However, pyximport that I used to be using uses MSVC which does allow for OpenMP support. All very strange.

thesamovar commented 9 years ago

OK, another update. There are various ways we can approach this problem, but nothing is entirely satisfactory. I'll explain what I have done, and what the alternatives are.

What I've done is to tailor it to build and install:

I figure that most of our users at the moment are on Linux so that should just work which it does. For Windows we will provide binaries at some point in the future, so it's not a problem that installation is a bit of a hassle at the moment. For Mac, well... what can you do with Macs really?

Other options include:

Finally, we may be able to avoid all this pain if we switch to using numba instead of Cython in the future. Although with the current state of Numba that puts some limits on what sorts of parallelism are feasible. I'm not sure if you can used thread-based parallelism with numba or not, but if it only lets you use process-based parallelism then we run into memory issues (we'd need to duplicate the memory used once for each CPU). If it allows thread-based parallelism, it's OK for memory but it still means that we probably can't efficiently parallelise over spikes, only over clusters, which limits the speed gains we can expect to achieve.

@nippoo @rossant what are your feelings on these issues?

Also, if we're happy with my solution then it's probably good to merge once we adapt Travis to install correctly (which I leave to you guys to do).

nippoo commented 9 years ago

Aren't we moving to IPython.parallel rather than OpenMP? Or both at the same time? On most HPC clusters, I imagine trying to support multi-thread as well as multiprocess parallelism seems tricky, but perhaps there's a way around it.

I'd quite like it to work on Mac if possible, though I am biased ;). Seriously though, phy supports Mac/Win/Linux and it seems a bit sad to drop support for Mac for the sake of one bug.

Could you attempt to pass an --fopenmp flag to the compiler for a trivial program, check the return code, and if it compiles successfully then do the whole thing with OpenMP? Rather than maintaining a database of compatible platforms, just detect OpenMP support somehow at build-time...

rossant commented 9 years ago

I'd vote for whatever solution is the fastest to set up now. With your current solution, can we set up an automated build system for all platforms that disables OpenMP on Macs? Having a complicated setup for compilation is fine as long as we have an automated system that builds binaries on all platforms.

thesamovar commented 9 years ago

The ideal is to use multi-thread and multi-process parallelism. Each separate machine would have one process which would use multiple threads because this will utilise memory and CPU in the most efficient way. I don't know if that's something that would map to an HPC cluster or not though, I don't have much experience of that. If you drop threading support though and only use multiprocess parallelism then you need to either deal with the complexities of sharing memory between processes (not possible on Windows I think, and complicated on any platform) or have RAM usage N times higher than it needs to be on one machine (where N is the number of CPUs).

For Mac, I think it should be doable but I don't want to work on it myself. Perhaps that could be a separate issue? In any case, you can get OpenMP support on Mac by just installing gcc rather than using clang, right?

I think your solution would work (it's similar to what I was proposing in the second of my two alternative options), but I don't know how to write a setup.py script that does that. If you want to give it a go, I'm very happy to use that option (even though it means that OpenMP support would probably be off by default on Windows because the default compiler that Anaconda uses doesn't support it). All you need to do is to modify setup.py so that it detects if OpenMP is available, and if it isn't available, then rewrite e_step_cy.pyx to remove the two commented lines at the top that mention -fopenmp.

thesamovar commented 9 years ago

Well the fastest to set up now is the one I've already done, but it means no Mac support. All that's needed for Travis to pass is to update the configuration so that it builds correctly...

Disabling OpenMP is non-trivial since, as far as I can tell, the only way is to remove the comments in the .pyx source file with the extra compile and link arguments, or to not use cythonize in the way I'm doing it but to write multiple Extension objects by hand (and once for each platform, Windows, Mac, Linux). I'm happy to go along with that solution, but as I said, I don't personally want to write it at the moment so that means one of you two volunteering.

thesamovar commented 9 years ago

@nippoo I added a script that when you run it modifies the source files to switch OpenMP off. So to build it, you just have to run this script before building and installing. Reasonable compromise? I think it's better not to release a version that doesn't have OpenMP on for Windows by default because in most cases, OpenMP support is essential for realistic, practical use. It's only for developing phy that you might want to use it without OpenMP support.

rossant commented 9 years ago

is it expected that we'll only run this script when building a release on macs? so that windows and linux users always have openmp enabled

thesamovar commented 9 years ago

Exactly.

On 3 June 2015 19:58:26 BST, Cyrille Rossant notifications@github.com wrote:

is it expected that we'll only run this script when building a release on macs? so that windows and linux users always have openmp enabled


Reply to this email directly or view it on GitHub: https://github.com/kwikteam/klustakwik2/pull/52#issuecomment-108575863

thesamovar commented 9 years ago

Ah, so the issue with Travis is that the setup.py file doesn't work on linux for reasons I don't yet understand. It works fine on windows either using msvc or mingw32 (which ought to be linux-like).

thesamovar commented 9 years ago

OK great I got Travis to build it. Unfortunately, in its current state, OpenMP support will be disabled for Windows and it won't build on Mac at all. I guess this needs to be fixed before we merge this.

For reference, I was using a trick that worked on Windows and not on Linux. It seems that on Windows if you pass -fopenmp to MSVC it gets magically converted to /fopenmp and ignored, so I was hoping that I could specify compile arguments as -fopenmp -openmp and it would get converted to /fopenmp /openmp on Windows, which it does, and it works fine. Unfortunately, gcc won't ignore the -openmp but interprets it as -o penmp which changes the output object file, messing up the compilation.

I've had a bit of a google but I can't find anyone who has a way to write a setup.py for Cython with OpenMP that works on Windows and Linux (let alone Mac).

Still... some progress.

rossant commented 9 years ago

i had no idea it would be such a pain

thesamovar commented 9 years ago

I suspected it would. :)

nippoo commented 9 years ago

1819medal

Here, have a medal, anyway. :)

thesamovar commented 9 years ago

Finally! I asked for one like 100 comments ago. ;)

I think I may have found a cheeky hack that will let us do everything we wanted... working on it now.

thesamovar commented 9 years ago

@nippoo, could you see if the latest version with cheeky hack works on your Mac? If so, I think we have a winner!

nippoo commented 9 years ago

:(

26 warnings generated.
g++ -bundle -undefined dynamic_lookup -L/Users/nippoo/miniconda/envs/kk2-py27/lib -arch x86_64 -arch x86_64 build/temp.macosx-10.5-x86_64-2.7/klustakwik2/numerics/cylib/e_step_cy.o -L/Users/nippoo/miniconda/envs/kk2-py27/lib -o build/lib.macosx-10.5-x86_64-2.7/klustakwik2/numerics/cylib/e_step_cy.so -fopenmp
ld: library not found for -lgomp
clang: error: linker command failed with exit code 1 (use -v to see invocation)
error: command 'g++' failed with exit status 
thesamovar commented 9 years ago

Can you tell me the results of os.name and sys.platform on your mac?

thesamovar commented 9 years ago

Alright, one more batch of changes and I think this is the one. It should now be installable on all platforms using pip install klustakwik2. If you are on Linux you get multithreading. If you're on Windows, you get multithreading if MSVC is installed, otherwise not. On Mac you don't get multithreading. All of these behaviours are overrideable by running one of the following python setup.py install --with-openmp python setup.py install --with-no-openmp. On Windows, it will attempt to override the default behaviour (to use MinGW32 in preference to MSVC and therefore not get OpenMP support) by appending --compiler=msvc to sys.argv, and this can be stopped by doing python setup.py install --no-msvc.

In all cases, it will first attempt to run setup.py with OpenMP support and if that fails, try again without, so it should be reasonably robust. Having said that, I see that Travis builds are failing so I'll stop writing this comment and investigate.

thesamovar commented 9 years ago

OK it was something stupid, let's see if that does the trick.

@nippoo can you try on your mac again? It should work now I think.

nippoo commented 9 years ago

Boom! That did it. Nice one. (I'm afraid you've already got your medal though. One per day, max...)

thesamovar commented 9 years ago

Great! So it's working on Linux, Windows and Mac now! I think this is ready to be merged. I'll leave it for now so you guys can take a look before merging.

Also note that I updated the readme file with more detailed instructions.

thesamovar commented 9 years ago

Oh actually, @nippoo do you want to try to install gcc via something called Homebrew and see if you can get OpenMP support on Mac? And if so, maybe update the readme file to explain how you do it? It's not essential for this PR though, but it would be nice to have at some point.

thesamovar commented 9 years ago

Done and done.

thesamovar commented 9 years ago

Nice to be done with that.

thesamovar commented 9 years ago

Just FYI, there were bugs in installation for Windows, but these are now fixed.

It turns out that even though the default compiler on Windows is MSVC, Anaconda switches it to MinGW32 by default (which has no OpenMP support). So I removed some of the hacks I was using to try to force MSVC as the default compiler and instead just recommend that users disable Anaconda's override by running conda remove libpython before installing KK2. This seems simpler.