Parameter Optimisation Design : Architectural Design of the Optimisation

stfc-lam commented 4 years ago

Architectural Design of the Optimisation. Greg Tucker to lead Working Group with Rebecca, Duc, Toby, Alex and Nick to discuss design document.

g5t commented 4 years ago

On 2020-01-27 from 11:00 to 12:30 @rebeccafair @nickbattam-tessella and @g5t met in CR01, with @tgperring and @abuts joining remotely, to discuss a draft Model Optimisation design document.

New tasks from this meeting: #64 #65 plus suggestions on how to improve the design document.

g5t commented 4 years ago

The design document is now located in the pace-developers repository. Review by @rebeccafair @nickbattam-tessella @tgperring @abuts and/or @mducle would be welcome at this time.

mducle commented 4 years ago

Thanks @g5t !

@rebeccafair @nickbattam-tessella @tgperring @abuts

I've made a pull request pace-neutrons/pace-developers#2 to fix some stuff...

There are a couple of points I was unclear on though:

In the text for OptFunction:

"It must also contain methods for querying and updating all properties, though it might be prudent to not implement updating the user defined function, and a method to evaluate the function when given appropriate input, (a,\ldots)"

(emphasis mine). Are you saying here that we should make it so that, once constructed with a function [handle/reference/string], the OptFunction object must not be modified to change this (e.g. to fit a different function the user needs to construct a different OptFunction object)? And are you saying that there should be a method to evaluate the fit function at some value of the fit parameters? (The phrasing is a bit ambiguous to me; it could be read as saying that neither of these methods should be allowed.)
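To make the first reading concrete, here is a minimal Python sketch, assuming the user function is fixed at construction while the parameters stay updatable. The class layout, the `parameters` attribute and the `evaluate` signature are my own illustration, not taken from the design document.

```python
class OptFunction:
    """Hypothetical sketch: the wrapped function is fixed at construction."""

    def __init__(self, function, parameters):
        self._function = function            # set once, never reassigned
        self.parameters = list(parameters)   # can be queried and updated

    @property
    def function(self):
        # Read-only: no setter is defined, so assignment raises AttributeError.
        return self._function

    def evaluate(self, *inputs):
        # Evaluate the user function at the current parameter values.
        return self._function(*inputs, *self.parameters)


def line(x, slope, intercept):
    return slope * x + intercept


f = OptFunction(line, [2.0, 1.0])
value = f.evaluate(3.0)        # 2.0 * 3.0 + 1.0
f.parameters = [1.0, 0.0]      # updating the parameters is allowed

try:
    f.function = abs           # replacing the function is not
    replaced = True
except AttributeError:
    replaced = False
```

Under this reading, fitting a different function means constructing a new OptFunction rather than mutating an existing one.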

What do you mean by the parameter_functions field of OptFunction - is this to specify the bindings?
What are the allowed input_types in OptFunction? Is this to specify the input arguments for the model functions? I guess this is necessary for Matlab (and maybe C++) but the function signature is easily obtained in Python from the function reference using the inspect module (and in principle this information could also be obtained for Matlab: http://undocumentedmatlab.com/articles/function-definition-meta-info but I can't find anywhere where anyone has actually done it; in C++ I guess it might be possible with templates...). My point is: can we make this optional for Python?
In the OptModel class you have an enum backend property, which implies that the back-end(s) are strictly defined by us - the users would not be able to use their own optimizer. Should we make this more general? (Or would making it more general impose too high a complexity/cost on the implementation?)
Bindings - one of the things I find quite hard to use about the current multifit implementation is the parameter binding mechanism, which relies on parameter indices. One thing which I quite like in the Mantid fitting engine is that all parameters there are named, so you can specify the bindings as a string, such as Fit(..., Ties='Sigma=0.5') or Fit(..., Ties='f0.Sigma=f1.Sigma'). Should we modify the design of OptFunction to allow named parameters (and named functions - something which Mantid does not have; all functions there are just indexed, hence the f0, f1, etc.)? Then the design of OptModel could also be modified to allow named parameter bindings rather than indexing. These would of course all get converted into the binding functions, which would be what is actually used internally.
In the section Function applicability, the applies cell array:
1. Is this a property of OptModel? (it's not in the UML diagram).
2. I guess there should be one applies cell array for each of the multiplier, foreground and background sets (e.g. there should be three applies arrays in each OptModel?).
3. The number of elements in each applies array should be the same as the number of elements in the corresponding function set (which is an array of OptFunction objects), and each element of each applies cell array should be a scalar, or a vector with no more elements than the number of datasets to be fitted?
4. What is the default behaviour? Should the functions apply to all datasets independently [e.g. the arrays should be filled with -1], or should a single parameter be fitted to all datasets [applies is all 0]? Or would the default depend on whether the applies array corresponds to a multiplier, foreground or background function set? E.g. for foregrounds the functions would be independent [applies all -1] but for multipliers and backgrounds they would be dependent [applies all 0].
5. Like with the bindings, I think this syntax could be quite confusing to users, if they want to go beyond the simple default behaviours...
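
As a rough illustration of the named-parameter idea (and of getting parameter names from the signature with the inspect module), here is a small Python sketch. The helper names (`parameter_names`, `parse_tie`, `apply_ties`), the function names `peak1`/`peak2` and the convention that the first argument is the data input are all assumptions for the example; only the tie-string style is borrowed from Mantid, and a real implementation would turn the ties into binding functions rather than rewriting a dict.

```python
import inspect
import math


def gaussian(x, Height, Centre, Sigma):
    return Height * math.exp(-0.5 * ((x - Centre) / Sigma) ** 2)


def parameter_names(func, n_inputs=1):
    """Fit-parameter names: every argument after the first n_inputs inputs."""
    return list(inspect.signature(func).parameters)[n_inputs:]


def parse_tie(tie):
    """Split 'target=source'; source is a float constant or a parameter name."""
    target, source = (s.strip() for s in tie.split("=", 1))
    try:
        return target, float(source)
    except ValueError:
        return target, source


def apply_ties(params, ties):
    """Return a copy of params (name -> value) with the ties enforced."""
    out = dict(params)
    for tie in ties:
        target, source = parse_tie(tie)
        out[target] = source if isinstance(source, float) else out[source]
    return out


# Named functions (hypothetical names) instead of Mantid-style f0, f1.
functions = {"peak1": gaussian, "peak2": gaussian}
params = {
    f"{fname}.{pname}": 1.0
    for fname, func in functions.items()
    for pname in parameter_names(func)
}
params["peak2.Sigma"] = 0.3

tied = apply_ties(params, ["peak1.Sigma=peak2.Sigma", "peak1.Height=0.5"])
```

Here `tied["peak1.Sigma"]` follows `peak2.Sigma` and `tied["peak1.Height"]` is pinned to the constant, without the user ever touching a parameter index.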

For the Open Questions section, I hope(!) to answer them in the design document for the third party interface... But in the meantime I just wanted to note that one restriction of the compiled Matlab libraries we generate is that they cannot evaluate an arbitrary m-file. They can use eval and feval with strings or anonymous functions, but calling an m-file which was not in the directory structure of the library when it was compiled is forbidden. I need to test whether the compiled Matlab library can call Python directly, but even if it cannot I think there are workarounds we can make so that we can use Python functions for the bindings etc. However, this means that if we implement this optimisation engine in Matlab with a Python interface through a compiled Matlab library, then user-defined model functions must be defined in Python (or C++) - they cannot use a Matlab m-file (they can use an anonymous function I guess, but these are quite limited - no loops or conditionals).

pace-neutrons / Pace-Project-Plan

Parameter Optimisation Design : Architectural Design of the Optimisation #1