Implement multidimensional numerical integration

06bbf81d-f646-4d20-904a-9cca4deefa37 commented 10 years ago

The patch coming with this ticket is a multidimensional numerical integrator, which uses calls to the Cuba library (http://www.feynarts.de/cuba/), or in case of 1-dim integrals, it defaults to the GSL numerical_integral.

The attached code also has an option to try to reduce the dimensionality of the integral by first trying a (multithreaded) symbolic evaluation.

To make the integral evaluation fast, the code first makes an honest effort to compile the integrand (either using _fast_float_ or cython, depending on the input).

The code can output verbose information on all intermediate steps (including the resulting cython code to be compiled; results from the symbolic preprocessing, etc.), which tries to make the evaluation as transparent as possible. This should allow both for debugging, as well as for optimizing the input integrand in slow-to-converge cases.

For most cases I tested it, this implementation is faster than Mathematica's NIntegrate thanks to Cuba and the included symbolic preprocessor.

Note that the patch requires the Cuba library to be installed for Sage. So, I am attaching Cuba as a spkg package with this ticket as well. That spkg package needs to be installed first before building sage with the patch.

CC: @novoselt

Component: numerical

Keywords: numerical integration, cuba

Branch/Commit: u/rws/implement_multidimensional_numerical_integration @ a9b78d7

Issue created by migration from https://trac.sagemath.org/ticket/15492

06bbf81d-f646-4d20-904a-9cca4deefa37 commented 10 years ago

Description changed:

--- 
+++ 
@@ -4,10 +4,10 @@

 To make the integral evaluation fast, the code first makes an honest effort to compile the integrand (either using `_fast_float_` or cython, depending on the input).

-The code can output verbose information on all intermediate steps (including the resulting cython code to be compiled; results from the symbolic preprocessing, etc.), which tries to make the evaluation as transparent as possible. This should allow both for debugging, as well as for optimizing the input integrand is slow-to-converge cases.
+The code can output verbose information on all intermediate steps (including the resulting cython code to be compiled; results from the symbolic preprocessing, etc.), which tries to make the evaluation as transparent as possible. This should allow both for debugging, as well as for optimizing the input integrand in slow-to-converge cases.

 For most cases I tested it, this implementation is faster than Mathematica's NIntegrate thanks to Cuba and the included symbolic preprocessor.

-Note that the patch requires the Cuba library to be installed for Sage. So, I am sending Cuba as a spkg package in a separate ticket. That spkg package needs to be installed first before building sage with this patch.
+Note that the patch requires the Cuba library to be installed for Sage. So, I am attaching Cuba as a spkg package with this ticket as well. That spkg package needs to be installed first before building sage with the patch.

kcrisman commented 10 years ago

comment:2

Wow, this is interesting - and under active development. Here are annoying things that have nothing to do with the correctness of your code that nonetheless should be brought up.

I can never remember whether the LGPL is compatible with GPL in the correct order we need it to be here...
I think that you should consult more (as per your sage-devel post) with where this should fit in with Sage. We already have enough directories, and ideally this would have at least some direct connection with already existing integration facility and/or the GSL stuff. That said, obviously this is not something one can plop in the calculus directory either.
There's also the matter of requiring a new spkg without the "waiting period as optional spkg". I'm not saying what should happen, just that this is typically the way this is handled.
The code is also very "case"y and full of lots and lots of options for the functions... which may be unavoidable. But it is hard to go through for that reason. Is there any way you can modularize this more? For instance, replaceDictionary (and we don't usually use camelCase) looks painful to me.

vbraun commented 10 years ago

comment:3

LGPL is fine.

If you want to touch cuba's build system at all, it would be nice to transition to libtool / automake. There shouldn't be any hand-crafted makefile.in, it should be autogenerated from Makefile.am. Its basically impossible to generate shared libaries by hand. I'm pretty sure your makefile.in won't work on OSX and Solaris. But for starters, you tould just use the cuba static library (static libraries suck, but we could leave enabling shared libraries for the future...)

The code should go into sage.libs.cuba, either a separate directory or just sage/libs/cuba.pyx. Don't add a copy of the cuba header to the Sage library, include the installed header (see make install).

The Sage library code is supposed to make at least an attempt to keep lines at <= 80 chars. Its acceptable to have longer lines, but only if it is necessary to improve readability. E.g. comments and docstrings very rarely have any excuse to be longer than that. In emacs reflowing a paragraph of text is Meta-q (Alt + 'q'), e.g.

Also, PEP8 spacing would be nice.

Line breaks in the doctest should use the new syntax ....::

    sage: foo(123,
    ....:     456)

The multidim_integration function is 1000+ lines long. It really needs to be refactored into subroutines of reasonable size.

Use SR.symbol() for temporary variables whose name you don't care about.

The whole functionality to compile a symbolic expression into C code is nice but unrelated to cuba. It should be moved elsewhere (probably somewhere in sage.ext). It is also sufficiently independed as a task that it could be a separate ticket, you could first implement cuba with fast_float and later add the C/cython callable as an optimization.

06bbf81d-f646-4d20-904a-9cca4deefa37 commented 10 years ago

updated to install cuba.h

06bbf81d-f646-4d20-904a-9cca4deefa37 commented 10 years ago

comment:4

Attachment: cuba-3.2p0.spkg.gz

Hello all,

Thank you so much for your prompt comments! I hope I address them at least so-so below. I updated the attached patch and spkg files accordingly.

As sort of an excuse, I wanted to say that actually this is my first python (and Sage) code ever, so bear with me ... Here it goes point by point.

Replying to kcrisman's comments:

I can never remember whether the LGPL is compatible with GPL in the correct order we need it to be here...

-- All I can say is that Wikipedia says it is, and a cursory look at the text confirms it ... for me.

I think that you should consult more (as per your sage-devel post) with where this should fit in with Sage. We already have enough directories, and ideally this would have at least some direct connection with already existing integration facility and/or the GSL stuff. That said, obviously this is not something one can plop in the calculus directory either. -- Moved code to sage.libs.cuba as per comment by vbraun above.
There's also the matter of requiring a new spkg without the "waiting period as optional spkg". I'm not saying what should happen, just that this is typically the way this is handled. -- I am not sure what this means. How do I make a new spkg "without a waiting period"?
The code is also very "case"y and full of lots and lots of options for the functions... which may be unavoidable. But it is hard to go through for that reason. Is there any way you can modularize this more? For instance, replaceDictionary (and we don't usually use camelCase) looks painful to me. -- Fixed camelCase, thanks! Modularity is addressed below. About the plethora of options -- it is, I think, unavoidable. Numerical integration is always a bit of an art. What I tried to do is make the code as hackable as possible using only function arguments. You can always just ignore the obscure options... As an example of how "case"y it could have been, take a look at the full Mathematica documentation of NIntegrate: http://reference.wolfram.com/mathematica/tutorial/NIntegrateIntroduction.html http://reference.wolfram.com/mathematica/tutorial/NIntegrateIntegrationStrategies.html http://reference.wolfram.com/mathematica/tutorial/NIntegrateIntegrationRules.html

Replying to vbraun's comments:

If you want to touch cuba's build system at all, it would be nice to transition to libtool / automake. There shouldn't be any hand-crafted makefile.in, it should be autogenerated from Makefile.am. Its basically impossible to generate shared libaries by hand. I'm pretty sure your makefile.in won't work on OSX and Solaris. But for starters, you tould just use the cuba static library (static libraries suck, but we could leave enabling shared libraries for the future...) -- I have practically zero experience with libtool/automake, and especially with making software portable. The included makefile.in is already testing my abilities. And unfortunately I failed to build the sage lib with the static libcuba.a. So, I'd definitely appreciate any help you can give me here. But note that the original code comes with a makefile.in and without a makefile.am. All I did was just cosmetic changes to the already supplied makefile.in: added a -fPIC and $(CC) -shared -o libfile.so file.o which I don't see how it can be harmful. So, if something more is needed for portability, this may be an upstream issue. But again, I am hardly an expert.
The code should go into sage.libs.cuba, either a separate directory or just sage/libs/cuba.pyx. Don't add a copy of the cuba header to the Sage library, include the installed header (see make install). -- Moved code to sage.libs.cuba. Removed copy to cuba.h.
The Sage library code is supposed to make at least an attempt to keep lines at <= 80 chars. Its acceptable to have longer lines, but only if it is necessary to improve readability. E.g. comments and docstrings very rarely have any excuse to be longer than that. In emacs reflowing a paragraph of text is Meta-q (Alt + 'q'), e.g. -- Thanks for the suggestion. Done ... more or less: The problem is that when I reflow the examples, keywords such as "rel tol" no longer have effect, so I left those untouched.
Also, PEP8 spacing would be nice. -- I had no idea about recommended spacings! Thanks for pointing those out to me. Tried to fix them, but I'm sure I missed a lot.
Line breaks in the doctest should use the new syntax ....:: -- Much better looking now, thanks!
The multidim_integration function is 1000+ lines long. It really needs to be refactored into subroutines of reasonable size. -- Modularity is addressed below.
Use SR.symbol() for temporary variables whose name you don't care about. -- I really don't care about those vars outside that code, so for me they are indeed temporary. But let me know what I should use instead, if you still think this is not "standard Sage". Thanks!
The whole functionality to compile a symbolic expression into C code is nice but unrelated to cuba. It should be moved elsewhere (probably somewhere in sage.ext). It is also sufficiently independed as a task that it could be a separate ticket, you could first implement cuba with fast_float and later add the C/cython callable as an optimization. -- Modularity is addressed below.

Now, about modularity. Yes, it would be nice if compiling into C can be made automatic for any task (for people who do not want to bother with cython), but I don't think I am up for that task. Note that the cython compilation part of the patch specifically invokes multidim_integration_unit_cube. So as written, it is in fact very Cuba specific. In principle, that call can be replaced to a call to any other functional (in a similar way I did the function substitution using a dummy global function), but it already starts getting too contrived. Maybe at some point I'd think about extending that part of the code into a separate utility, but it will not happen now. However, I agree that that piece of the code can be moved to a separate function within the cuba lib. So, I did that, but again, only within the cuba lib. Also, note that of the original 1000+ lines of multidim_integration, 600 are in fact documentation; and the majority of the remaining 400 lines are simply sanity checks and printing debugging information, which should be pretty self-explanatory ... I'd hope... But ok, I still split the code :)
Nathann Cohen suggested that I should add .rst documentation. I tried adding that in doc/en/reference, but hg says those changes are ignored. Suggestions?

Cheers!

06bbf81d-f646-4d20-904a-9cca4deefa37 commented 10 years ago

comment:6

I found ways to fix the two problems I had before:

After line-folding an example (using ....: ) which need keywords such as 'rel tol', those keywords must be sitting on the first line. Otherwise they have no effect. So, I folded (I hope) all lines neatly that end up in the documentation.
About the rst documentation problem -- I was modifying the directory structure of doc/en/reference/calculus/ instead of just the file index.rst. Now, adding documentation works for me. So, I added it.

Also I expanded the text of the documentation to make clearer why those examples matter. And I moved all imports to top level, so that function calls are not slowed down by those. For some reason %time registers that change (in Wall time), while timeit() does not...

I attach the updated patch.

vbraun commented 10 years ago

comment:7

As an interface that is presentable to the end-user I would propose

sage: integral.symbolic(...)
sage: integral.numerical(...)
sage: integral.multidim(...)
sage: integral.multidim_numerical(...)
sage: integral(...)   # auto-guess

The cuba library interface shouldn't do any symbolic integration. That is not only unrelated, most numerical options also don't make sense for symbolic integration.

fast_callable should be used instead of the older fast_float.
The C source code generation should be factored out and doctested separately.
Dynamically loaded modules should be wrapped in some object (maybe CythonCallable) and unloaded when finished (i.e. destroyed). Otherwise its leaking memory...
library code shoudn't use sage_eval.
Catch-all except: statements are not allowed -- could be KeyboardInterrupt, for example.
Docstrings in Python should be imperative: "Return foo" instead of "Returns foo".

06bbf81d-f646-4d20-904a-9cca4deefa37 commented 10 years ago

comment:8

Let me for now address only your first comment, since the rest are secondary issues.

I completely agree with you that such a split of the interface must eventually happen. However, at the practical level, I do not think it is within the range of my current sage abilities or that I have enough motivation to do it. Here is why:

I've had way too many issues with integrate() to even start thinking about a multidimensional symbolic integrator. Here is a laundry list of problems I've faced in case anybody cares to take up the problem:

If there are parameters in the integral then there is no easy way to pass answers to maxima, when it starts asking "Is positive, negative or zero?", when is some mess (which usually implies that using assume() to fix it won't work), which for multidimensional integrals is what you usually get in my experience. If there are no parameters, then all you care about is getting some number -- so, there is not much point in the analytical answer itself (in my opinion of course).
In many cases I tried (after transforming to the unit cube), maxima gives back an answer only once I avoid the boundaries of the interval and use the principal value of the integral. But to avoid them I need to introduce an epsilon,i.e. a parameter -- which brings me back to point 1 above.
sympy.integrate is actually in many cases much nicer (in my opinion), giving back piecewise answers, having a somewhat better handle on special functions such as Ci(x), etc. But sage doesn't support the piecewise answers; and SR(sympy.Ci(x)) gives an error, even though sage has Ci(x). E.g. try this: integrate(cos(x)/x,x,algorithm='sympy')
If the analytical integral is not doable, then the question remains what should the symbolic result be. Just the first function multidim_analytical_integration_unit_cube returns? Or all of its results as a list? And handling anything but constant intervals (as over the unit cube in the current implementation) becomes somewhat of a mess, unless you transform your intervals to the unit cube. But in that latter case, simple integrands with simple (but non-numeric) integration intervals very often become horrible messes when transformed to the unit cube, which even Mathematica can't handle.

So, really, the current multidim_analytical_integration_unit_cube is the best I can hope for for now. And as a comparison, Mathematica also has such a SymbolicProcessing option for its NIntegrate, which (at least at the user level) has nothing to do with doing multidimensional integrals with Integrate.

So, maybe a compromise would be simply to hide multidim_analytical_integration_unit_cube with an underscore and not advertise it at all to the end-user? Let me know what you think.

vbraun commented 10 years ago

comment:9

I don't have a good answer to symbolic integration, as you say symbolic multi-dimensional integration is usually a mess unless there is some symmetry to exploit.

As for 4), you can have un-evaluated symbolic expressions:

sage: x.add(x, hold=True)
x + x
sage: sage.symbolic.integration.integral.indefinite_integral(x, x, hold=True)
integrate(x, x)

So my suggestion is (and thats what I meant to say initially) to just focus on the numerical integration and interfacing with cuba here. Anything else can go to a separate ticket. We don't even have to think too much about a user-interface yet, which is what you have done anyways in the current version where you have to explicitly import from sage.libs.cuba. The integral() command can then later be enhanced to make it easy to use.

06bbf81d-f646-4d20-904a-9cca4deefa37 commented 10 years ago

comment:10

So, I guess the issue is immediate usability vs. code cleanliness. I myself clearly vote for the former -- I'd still like to include the symbolic preprocessing option with this patch. The option is really useful I think. Without it many integrals that I happen to stumble upon in my work are completely undoable numerically (which was exactly the reason I introduced it in the first place). That option is not used by default, so if anybody uses the integrator, the symbolic part would never kick in unless they actually enable it on purpose, which would mean they were desperate (as I was ...).

I do realize the symbolic option is unrelated to cuba, and I have no problem if at some point that part of the code is moved to a separate lib, once an actual symbolic multidim integrator is put in place. It is well separated in the code, so transitioning to a separate lib should not be an issue in terms of code maintainance. So, even if you favor code cleanliness, I don't think that the presence of that option would break long-term development/support.

Now, of course I'm just a first time contributor, and only a month-long user of sage, which will soon be a decade old. So, I'll understand if you still decide to kill the option.

vbraun commented 10 years ago

comment:11

Ok, it wasn't clear to me that the symbolic preprocessing is that important. It is still unrelated to cuba, so how about moving it to sage.libs.cuba.symbolic_preprocessing.py or something like that? Otherwise it makes it just harder to debug and read the cuba.pyx file. Similar for C/Cython source code generation.

It also would be nice to split up multidim_integration some more. For example, the entire if (symbolic) block would be much better as a separate function: Makes multidim_integration shorter and easier to read if you don't have to juggle with temporary variables funcS, newvars. E.g.

   if symbolic:
       try:
           return multidim_integration_symbolic(...)
       except SymbolicTimeout:
           pass

06bbf81d-f646-4d20-904a-9cca4deefa37 commented 10 years ago

comment:12

Thank you for all the useful comments, Volker!

Below are my modifications in response to your comments.

Best, Svetlin

fast_callable should be used instead of the older fast_float.

-- Done.

The C source code generation should be factored out and doctested separately.

-- moved to new file: cython_compilation.pyx and tested.

Dynamically loaded modules should be wrapped in some object (maybe CythonCallable) and unloaded when finished (i.e. destroyed). Otherwise its leaking memory...

-- There are no dynamically loaded modules in the code (at least as far as I understand the definition). I moved all of them to top level so that loading modules does not slow down the code.

library code shoudn't use sage_eval.

-- switched to eval

Catch-all except: statements are not allowed -- could be KeyboardInterrupt, for example.

-- fixed. Removed most try statements, which I realized I only needed in the code development.

Docstrings in Python should be imperative: "Return foo" instead of "Returns foo".

-- fixed.

Ok, it wasn't clear to me that the symbolic preprocessing is that important. It is still unrelated to cuba, so how about moving it to sage.libs.cuba.symbolic_preprocessing.py or something like that? Otherwise it makes it just harder to debug and read the cuba.pyx file.

-- Thanks! Moved to new file: symbolic_preprocessing.pyx. Added underscore in front of _multidim_analytical_integration_unit_cube.

Similar for C/Cython source code generation.

-- Moved to new file: cython_compilation.pyx

It also would be nice to split up multidim_integration some more. For example, the entire if (symbolic) block would be much better as a separate function: Makes multidim_integration shorter and easier to read if you don't have to juggle with temporary variables funcS, newvars. E.g.
```
if symbolic:
    try:
        return multidim_integration_symbolic(...)
    except SymbolicTimeout:
        pass
```

-- Cleaned that part of the code a lot. Temp vars were actually left over cruft, which I removed. Now that piece of the code just checks whether integrand is constant and returns; or not, and continues.

Also, made additional minor tweaks and fixed a bug I had introduced at some point for Python function integrands, and for which I was missing an important test.

06bbf81d-f646-4d20-904a-9cca4deefa37 commented 10 years ago

updating patch according to Volker Braun's comments

06bbf81d-f646-4d20-904a-9cca4deefa37 commented 10 years ago

comment:13

Attachment: trac_15492_multidim_numerical_integrator.patch.gz

-- Updating patch against sage 5.13.

-- Removing left over unnecessary imports.

06bbf81d-f646-4d20-904a-9cca4deefa37 commented 10 years ago

Attachment: trac_15492_multidim_numerical_integrator.2.patch.gz

patch against sage 5.13

rwst commented 10 years ago

Branch: u/rws/implement_multidimensional_numerical_integration

rwst commented 10 years ago

comment:17

I have made a branch so that people can work better with your code. It however depends on the existence of the cuba package, which isn't even optional at the moment. So this needs work in that both the package needs to be included in Sage (please open a separate ticket) and this code should fail gracefully if the package is not install. This is all detailed in the developer manual.

New commits:

a9b78d7 Trac 15492: add multidim numerical integration

rwst commented 10 years ago

Commit: a9b78d7

c22b6800-ec0b-4cbf-94c4-0a74aecc2093 commented 7 years ago

comment:19

+1 for having Monte-Carlo integration in Sage.

I recently needed this functionality, and resorted to using the Cubature package; Python bindings exist, and I found the interface quite nice. I guess the difficult part would be to rewrite cubature.py for Sage..

stassev commented 1 year ago

After this many years, I updated the code. I refactored it to drop the CUBA dependency and instead to use the internal SageMath integrator.

The code can do symbolic and numerical integration on non-trivial domains.

I posted the code here:

https://github.com/stassev/sagemath-multidimensional-integration

If people are still interested, feel free to incorporate it into SageMath. I can try to help as time permits.

sagemath / sage

Implement multidimensional numerical integration #15492