Simplify deployment for windows systems

coldfix commented 10 years ago

The Cython extension module requires the exact same python and numpy (!) build on the target system as was used to create the binary. For all practical purposes this makes deploying binary versions of cpymad almost impossible. Especially on windows systems (where often no compiler is available) this is a serious limitation.

At the moment, I can think of the following alternatives:

1. Access MAD-X via ctypes, replacing the Cython module, and (optionally) create an extension of MAD-X as a pure C library containing a convenience layer with an API similar to the Cython module.
2. Drop only the compile-time dependency on numpy.

The dis-/advantages of both approaches are the following:

Option 1 (ctypes), advantages:
- #(builds) = #(architectures), i.e. on windows systems we need only one build per machine architecture that can be used for all versions of python and windows
- setup.py gets much simpler, since it does not depend on Cython/numpy
- could integrate additional metadata in the extending C library such as MAD-X revision number, which is currently not included in the upstream version

Option 1 (ctypes), disadvantages:
- performance will probably go down a bit. It remains to be examined how much influence this will have. It could be that it is hardly noticeable.
- the ctypes API requires a little more boilerplate than Cython

Option 2 (drop the numpy build dependency), advantages:
- #(builds) = #(architectures) * #(python-versions), i.e. a still manageable number of builds: one for each combination of python{2.6,2.7,3.3} with 32bit/64bit
- setup.py still gets a little simpler, since the numpy build dependency can be dropped
- possibly a little faster than the ctypes approach
- almost effortlessly patchable

Option 2 (drop the numpy build dependency), disadvantages:
- still need many builds

Status quo:
- #(builds) = #(architectures) * #(python-versions) * #(numpy-versions), which is unmanageable. In addition, numpy upgrades will break a working cpymad installation.
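To make the build counts above concrete, here is a small sketch that enumerates the combinations. The architectures and python versions are the ones listed above; the numpy version list is purely hypothetical and only there to show how the status quo multiplies.

```python
from itertools import product

architectures = ["32bit", "64bit"]
python_versions = ["2.6", "2.7", "3.3"]
numpy_versions = ["1.7", "1.8"]  # hypothetical, for illustration only

# ctypes approach: one build per architecture
builds_ctypes = len(architectures)

# no numpy build dependency: one build per (architecture, python version)
builds_no_numpy = len(list(product(architectures, python_versions)))

# status quo: every numpy version multiplies the matrix again
builds_status_quo = len(
    list(product(architectures, python_versions, numpy_versions))
)

print(builds_ctypes, builds_no_numpy, builds_status_quo)
```

Each additional supported numpy version adds another six builds in the status quo, while the other two counts stay fixed.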

I might just try out these alternatives and post performance results here. Do you have any thoughts on this?

Best regards, Thomas

coldfix commented 10 years ago

Option 2 turns out to be an almost effortless improvement, see #82, with no performance impact. Option 1 could still be an interesting alternative for future versions, but I will postpone it for now as it requires much more work.

Eothred commented 10 years ago

Sorry for not commenting on all these issues, I've been a bit busier this summer than expected. I am not familiar with ctypes, but if the code ends up about as simple as the Cython version, then I don't have any strong objections. The benefit I saw in Cython (as I understood it) was that you could conveniently access numpy and move C objects into numpy objects (and back). And well, I just did not know about ctypes.

As for speed, I am not very concerned. It may be worth checking how long it takes to move big TFS tables from MAD-X into python objects. Other than that, the heavy lifting is done internally in MAD-X anyway, so a few seconds here and there are hardly a problem. I'm all for reducing dependencies and simplifying things, since that makes it easier to maintain and extend in the future.

I do consider the installation process to be quite hardcore at the moment (for people who are not programmers and/or don't already know python). Any ideas to make it cleaner/simpler are welcome.

I'm back in work mode from September. It is a new job, so I will probably be busy, but maybe I'll have more time to follow up on the changes then.

coldfix commented 10 years ago

Hey, welcome back,

You can view ctypes as a low-level mechanism for dynamic linking:

- it has equivalents of dlopen/dlsym (POSIX) or LoadLibrary/GetProcAddress (WinAPI)
- it provides python wrapper objects for C objects (based on their memory address)
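A minimal sketch of these two points, using the C math library as a stand-in for libmadx (the helper name `load_c_math` is made up for this example, and the fallback to the process's own symbols is an assumption that holds on typical Linux setups):

```python
import ctypes
import ctypes.util
import sys


def load_c_math():
    """Load the platform's C math routines (stand-in for libmadx)."""
    if sys.platform.startswith("win"):
        # LoadLibrary under the hood
        return ctypes.CDLL("msvcrt")
    # dlopen under the hood; if find_library returns None, CDLL(None)
    # opens the running process, which normally exports libm symbols
    return ctypes.CDLL(ctypes.util.find_library("m"))


libm = load_c_math()

# Attribute access resolves the symbol (dlsym / GetProcAddress).
cos = libm.cos
cos.argtypes = [ctypes.c_double]   # declare the C signature
cos.restype = ctypes.c_double
result = cos(0.0)

# Wrapper objects for C objects, based on their memory address:
value = ctypes.c_double(2.5)
alias = ctypes.c_double.from_address(ctypes.addressof(value))
alias.value = 7.0                  # writes to the same memory as `value`
```

Without the `argtypes`/`restype` declarations, ctypes would treat the arguments and return value as C ints, which is part of the extra boilerplate compared to Cython.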

The big advantages are:

- it's in the standard library, so no need for an extra dependency
- it allows linking to C libraries using pure python code. This implies that one only needs to build libmadx once and nothing else. And since libmadx doesn't link to python, it can be used with any python version on windows, so installing becomes really easy (like any other package, no custom args).

The downsides are:

- not possible to link statically, so it will be necessary to carry the libmadx.dll around manually.
- slower than Cython, which basically allows writing C programs in python syntax.

For most purposes, the performance should not make a noticeable difference (one is using python for the main program anyway...). The only thing I was really worried about is getting the C arrays into numpy arrays. If it were necessary to create temporary python lists from the C data and then create numpy arrays from these, it could really have some impact. Luckily, it turns out that this concern was unjustified, as you can directly create numpy arrays using the memory address of the C arrays. That part is already implemented in #82 to remove the link dependency on numpy (which is already a nice simplification for deployment).
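As an illustration of creating a numpy array directly from the address of a C array, here is a small sketch. It is not necessarily how #82 does it; the ctypes array below only stands in for memory owned by a C library, and all names are made up for the example.

```python
import ctypes
import numpy as np

n = 5
# Stand-in for a double array that lives inside a C library.
c_buffer = (ctypes.c_double * n)(0.0, 1.0, 2.0, 3.0, 4.0)

# What a C API would typically hand back: a bare double* plus a length.
ptr = ctypes.cast(c_buffer, ctypes.POINTER(ctypes.c_double))

# Wrap the memory without copying and without temporary python lists.
view = np.ctypeslib.as_array(ptr, shape=(n,))

# The array is a view: writes on the C side are visible in numpy.
c_buffer[2] = 42.0

# Copy if the data must outlive the C buffer.
owned = view.copy()
```

Since `view` does not own its memory, it must not be used after the C side frees or reuses the buffer; taking a `.copy()` avoids that at the cost of one memcpy.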

pymad / cpymad

Simplify deployment for windows systems #81