ratt-ru / montblanc

GPU-accelerated RIME implementations. An offshoot of the BIRO projects, and one of the foothills of Mt Exaflop.

chi_sqrd gradient #85

Open marziarivi opened 8 years ago

marziarivi commented 8 years ago

I would like to add a function to the BIRO solver for computing the gradient of the chi squared with respect to the Sersic shape parameters. I need it in order to use HMC, and I think the best solution is to implement this computation within Montblanc version 4.0. If that is not a problem, I will create a branch for it.

Specification

Outline

If the Sersic gradient computation option is enabled (see below), the RIME solver will add to the pipeline the computation of the chi squared gradient with respect to the Sersic parameters (3 per Sersic source), which is returned in a new array: X2_grad(3,'nssrc').
A CPU version is also implemented: 3 functions are added to the CPU solver:

Following what has been done for the chi squared computation, the gradient computation has been added to the CompositeRimeSolver.

GPU kernel

The GPU implementation is contained in SersicChiSquaredGradient.py. It assumes the visibilities have already been computed (by the previous function in the pipeline). The parallelisation is the same as for RimeSumCoherencies, but only Sersic sources are considered.

The gradient array in device memory is initialised to zero. For each source (sequentially), the gradient contributions within a CUDA block are computed in parallel by each thread for its corresponding uv point, then accumulated within the block using the cub::BlockReduce function and stored in shared memory for each source parameter. Then only the first 3 threads add this partial result in parallel to the array in device memory using atomic operations. This avoids race conditions and reduces the number of accesses to device memory. This design was chosen to reduce the amount of memory required, as it does not need to store the gradient of each visibility. The kernel is called after the chi squared computation kernel, so that the model visibilities are already available in GPU device memory.
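The accumulation scheme can be illustrated with a host-side NumPy simulation (a sketch of the logic only, not the CUDA kernel itself; the function name is illustrative):

```python
import numpy as np

def accumulate_gradient(per_vis_grad, block_size=256):
    # Host-side simulation of the kernel's accumulation pattern for one
    # Sersic source: each CUDA block reduces its threads' per-uv-point
    # contributions (cub::BlockReduce into shared memory), then one
    # thread per parameter atomically adds the partial sum to the
    # zero-initialised gradient array in device memory.
    nvis, npar = per_vis_grad.shape
    grad = np.zeros(npar)                             # device gradient array
    for start in range(0, nvis, block_size):          # one pass per block
        block = per_vis_grad[start:start + block_size]
        partial = block.sum(axis=0)                   # in-block reduction
        grad += partial                               # stands in for atomicAdd
    return grad
```

Only one global memory update per block and parameter is needed, rather than one per visibility.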

A similar separate file could easily be implemented for computing the chi squared gradient with respect to the parameters of Gaussian sources.

Options to Enable/Disable

Add a new option in slvr_config.py:

```python
SERSIC_GRADIENT = 'sersic_gradient'
DEFAULT_SERSIC_GRADIENT = False
VALID_SERSIC_GRADIENT = [True, False]
SERSIC_GRADIENT_DESCRIPTION = (
    "If True, the chi squared gradient is computed with respect "
    "to the Sersic parameters of each source.")
```

Example

MS_single_gpu_gradient_example.py and MS_single_node_gradient_example.py are added to the example directory. The number of Sersic sources N (via the -ns option) and the MS file name must be provided.

In these examples, simulated observations of N Sersic sources are computed using the GPU solver; then the analytical chi squared gradient and the numerical one (i.e. the incremental ratios of the chi squared with respect to the Sersic parameters) are computed and compared.
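The numerical side of that comparison can be sketched as follows (central finite differences; `model_fn` and both function names are stand-ins for illustration, not the example scripts' real API):

```python
import numpy as np

def chi_squared(params, data, model_fn, sigma=1.0):
    # chi^2 = sum |model(params) - data|^2 / sigma^2
    r = model_fn(params) - data
    return np.sum(np.abs(r) ** 2) / sigma ** 2

def numerical_gradient(params, data, model_fn, h=1.0e-6):
    # Incremental ratios of the chi squared, one parameter at a time,
    # to be compared against the analytical gradient.
    grad = np.zeros_like(params)
    for i in range(len(params)):
        p_plus, p_minus = params.copy(), params.copy()
        p_plus[i] += h
        p_minus[i] -= h
        grad[i] = (chi_squared(p_plus, data, model_fn)
                   - chi_squared(p_minus, data, model_fn)) / (2.0 * h)
    return grad
```

With a toy `model_fn` this can be checked against the analytic gradient; the real examples do the same with the GPU solver's Sersic model.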

jtlz2 commented 8 years ago

@marziarivi Sounds good to me in principle. What are the maths of it though? Should it be generalizable to the gradient of any of the parameters (source and/or telescope)?

@sjperkins What are the implications for array sizes etc.?

sjperkins commented 8 years ago

@marziarivi I'm happy to include other modes of operation into the solvers with the proviso that this is well specified. Going forward, I'd like this specification to be written up in the initial comment of this issue (but keep your initial comments at the bottom of it?). This will help drive the discussions on the implementation. As we discuss and refine the implementation in the comments, the initial specification should be updated. Would you please be responsible for this?

First thing I'd like to see in the specification is an HTML link to the technique.

My initial questions/comments:

  1. Are you intending to modify kernel behaviour via a flag or will you write new kernels?
  2. I guess this is a gradient over time? If so, are you going to difference values between timesteps?
  3. I think it's reasonable to aim for the gradient over source parameters as a first stab at the problem (@jtlz2).
    • For completeness, I think we should include gradient over Gaussian parameters.
    • Gradient over DDEs and DIEs might be challenging from a data-size POV, so could be left out initially. But please disagree with me on this point if I'm not getting the gist of it (if the technique simply involves differencing over time, then this might be easy).
sjperkins commented 8 years ago

@marziarivi It would be wise to wait for my merge of the v5-edits branch into master, since this includes many architectural, but not functional, changes to the v4 code. I hope to do this soon, but I could merge it sooner so that you can branch off from those changes.

o-smirnov commented 8 years ago

I fully support this idea. My plan for next year was to add gradient computation to Montblanc -- this will enable all sorts of calibration applications.

o-smirnov commented 8 years ago

By which I mean, optional gradients with respect to all of the RIME parameters (source parameters, pointing errors, G terms, etc.)

marziarivi commented 8 years ago

This is the idea: since we know the analytical formula of the model visibilities, I can compute the gradient analytically with respect to any parameter (not time!). So what I do is just add new functions to the solver (without modifying any kernel) that implement the formulas derived analytically, which depend on the visibilities and the model parameters. As an intermediate step I will compute the partial derivatives of the visibilities (an array of dimension num. vis × num. parameters), which will then be reduced over the number of visibilities to produce a 1-dimensional array of size num. parameters.

As a starting point I will consider only the source positions, flux and Sersic parameters, but then I will also consider beam parameters and pointing errors. I guess the procedure could easily be extended to any other model parameter, once the formula of the partial derivative with respect to that parameter is known. I will try to be as general as possible, and then define a specific function for each partial derivative, so that if we want to add or replace some of them (Sersic with Gaussian sources) it is just a matter of writing a new function.

@sjperkins I am not sure what you mean by "this specification to be written up in the initial comment of this issue" and "the specification is an HTML link to the technique". Can you show me an example? There is no specific technique: it is just a matter of implementing analytical formulas. I can write those formulas in the specification.
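The reduction step described here can be sketched in standalone NumPy (assuming a simple chi squared with uniform noise; this is an illustration of the calculus, not Montblanc's actual code):

```python
import numpy as np

def chi2_gradient(residual, dvis_dp, sigma=1.0):
    """Reduce per-visibility partial derivatives to the chi^2 gradient.

    residual : (nvis,) complex array, model visibilities minus data
    dvis_dp  : (nvis, nparams) complex array of dV/dp
    Uses the identity d/dp sum |r|^2 = 2 Re sum conj(r) * dr/dp.
    """
    contrib = np.conj(residual)[:, None] * dvis_dp   # (nvis, nparams)
    return 2.0 * np.real(contrib).sum(axis=0) / sigma ** 2
```

The intermediate (nvis, nparams) array is what the proposed functions would compute before reducing over the visibilities.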

marziarivi commented 8 years ago

Sorry I closed the issue by mistake...

marziarivi commented 8 years ago

@sjperkins Let me know when I can create my branch.

jtlz2 commented 8 years ago

@marziarivi Are the partial derivatives always analytic? In what cases are they not?

marziarivi commented 8 years ago

They are analytic because we have the analytic expression of the RIME.

sjperkins commented 8 years ago

@marziarivi OK, I've updated your first comment with the specification outline you provided above. You can edit it by clicking on the pencil button. It might be useful to learn some markdown for readability.

Reading between the lines of your specification, I think it would be best to compute this on the GPU. Why not write a kernel for it? It's probably highly parallel.

marziarivi commented 8 years ago

Yes I completely agree.

sjperkins commented 8 years ago

@marziarivi Updated the initial comment a bit more. I think other things to think about are

jtlz2 commented 8 years ago

@marziarivi Excuse ignorance - does that extend to the noise/covariance matrix/weight vector?

marziarivi commented 8 years ago

I think so.

o-smirnov commented 8 years ago

Note that derivatives w.r.t. pointing errors are not analytic, if you're using an image for the E term.

marziarivi commented 8 years ago

@sjperkins In the chi_sqrd computation do you reduce over the polarisation components of the visibilities?

sjperkins commented 8 years ago

Yes, take a look at the last part of the kernel.


marziarivi commented 8 years ago

Hi, I am just picking this work back up (I was busy on another project...). Shall I implement this on the latest version (i.e. v5)?

sjperkins commented 8 years ago

Hi Marzia

You can work on v4. V5 is built on top of it.

marziarivi commented 8 years ago

Hi,

my code does not seem to work as well as before. Actually, the K80s I used have been replaced by Xeon Phis, so now it runs on a K20c. Anyway, since each node has 2 GPUs, it seems the code is using both of them; is that right?

```
INFO:montblanc:Device #0: Tesla K20c
INFO:montblanc:Device #1: Tesla K20c
```

and when I try to print visibilities on the GPU:

```python
print slvr.bayes_data_gpu.get()
```

I get

```
File "/scratch/hpcrivi1/MY_MONTBLANC/lib/python2.7/site-packages/pycuda-2015.1.3-py2.7-linux-x86_64.egg/pycuda/gpuarray.py", line 264, in get
    _memcpy_discontig(ary, self, async=async, stream=stream)
File "/scratch/hpcrivi1/MY_MONTBLANC/lib/python2.7/site-packages/pycuda-2015.1.3-py2.7-linux-x86_64.egg/pycuda/gpuarray.py", line 1160, in _memcpy_discontig
    drv.memcpy_dtoh(dst, src.gpudata)
pycuda._driver.LogicError: cuMemcpyDtoH failed: invalid device context
```

Has something changed? Should I specify the device? Or is it a problem with the system?

Finally, it is quite hard for me to find a logical path through your Python code (I am used to C/C++, where there is a main function and you can start from there to follow the execution of the code). Now I really don’t know where to start. Could you give me some hints on where you define/create the visibilities array (I should create an analogous array for the partial derivatives, with one more dimension for the parameters), and on where the computation of the visibilities starts, both on the CPU and on the GPU?

Many thanks, Marzia


sjperkins commented 8 years ago

Hi Marzia

  1. Are you using

    with slvr.context:
        print slvr.bayes_data_gpu.get()

    to access the GPU data? This should work. Ideally it would be good to add a retrieve_bayes_data method for pulling individual array data off the GPU, but I'm very time-limited at present.

  2. Well, montblanc is implemented as a python package not an executable. The entry point to this package is the rime_solver function in montblanc/__init__.py, which returns a Solver used to compute the RIME, hence the following use:

    with montblanc.rime_solver(slvr_cfg) as slvr:
       slvr.solve()

    this function in turn calls the rime_solver function in factory.py to return a solver of the appropriate version.

  3. Take a look at montblanc/impl/biro/v4/BiroSolver.py file, specifically the array and property definitions.

    These dictionaries define arrays in terms of their shape, type, default and test case values. In the case of the shape of the array, strings associated with RIME dimensions can be provided, e.g. ntime, nbl, nchan etc. and these will be substituted with actual integer values when they are created.

    Similarly, with the type, ft can be specified to mean the floating point type associated with the solver (np.float32/np.float64) or 'ct' to mean the complex type (np.complex64/np.complex128).

    Adding cpu=True and gpu=True keywords to the dictionaries will specify whether NumPy arrays and GPUArrays are created on the solver, e.g. bayes_data_cpu and bayes_data_gpu.

    Are you intending to modify v4 to add this functionality, or will you create your own solver?
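The dimension-string substitution described in point 3 can be sketched like this (the dictionary layout and `resolve_shape` helper are illustrative, not the real BiroSolver.py code):

```python
import numpy as np

# Illustrative stand-in for an array definition: shapes may mix
# integers with dimension-name strings like 'ntime', 'nbl', 'nchan'.
ary_dict = {'bayes_data': {'shape': ('ntime', 'nbl', 'nchan', 4),
                           'dtype': 'ct'}}

def resolve_shape(shape, dims):
    # Substitute dimension-name strings with actual integer values.
    return tuple(dims[d] if isinstance(d, str) else d for d in shape)

dims = {'ntime': 10, 'nbl': 21, 'nchan': 64}
shape = resolve_shape(ary_dict['bayes_data']['shape'], dims)
# 'ct' maps to the solver's complex type, e.g. np.complex64
# when the floating point type 'ft' is np.float32.
vis = np.empty(shape, dtype=np.complex64)
```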

marziarivi commented 8 years ago


Yes, you are right! I forgot the first line. Now it works, but I don’t understand why in the visibilities I get the V component equal to I when I set the V component of the brightness matrix to zero… The point is that I work only with the I component.


No, I am not going to create a new solver, I just want to add a new functionality which computes the partial derivatives of the visibilities if requested.


sjperkins commented 8 years ago

> Yes, you are right! I forgot the first line. Now it works, but I don’t understand why in the visibilities I get the V component equal to I when I set the V component of the brightness matrix to zero… The point is that I work only with the I component.

I think this is working as expected. Stokes parameters are I, Q, U, V, but they are combined into a brightness matrix as

```
[ I + Q    U + iV ]
[ U - iV   I - Q  ]
```

so if only I is non-zero, you should end up with the following

```
[ I   0 ]
[ 0   I ]
```
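As a quick check (a standalone NumPy sketch of the standard Stokes-to-brightness mapping, not Montblanc code):

```python
import numpy as np

def brightness_matrix(I, Q, U, V):
    # 2x2 brightness matrix from Stokes parameters:
    # [[I + Q, U + iV], [U - iV, I - Q]]
    return np.array([[I + Q,     U + 1j * V],
                     [U - 1j * V, I - Q    ]])

B = brightness_matrix(2.0, 0.0, 0.0, 0.0)
# With only Stokes I non-zero, both diagonal terms equal I,
# so the fourth correlation matches the first by construction.
```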

> No, I am not going to create a new solver, I just want to add a new functionality which computes the partial derivatives of the visibilities if requested.

OK, I suggest you do it incrementally in a branch and push the changes as you make them. We can then discuss strategies for integrating your changes, and discuss your code as it evolves?

marziarivi commented 8 years ago

Ok! At the moment my plan is the following:

  • add the following arrays in BiroSolver: vis_grad (ntime,nbl,nchan,4,npar) and chi_squared_grad (npar), where npar is the number of parameters with respect to which partial derivatives are computed.
  • add in RimeSumCoherencies.py the functions/kernels for the partial derivative computation and, if requested (in some way by the user), call them from the execute function.

I am not sure where to define the number of parameters, or where/how to request the gradient computation (maybe just adding to ary_dict a ‘gradient’ option to be set true or false for the vis and chi_squared_result arrays, default False, and adding a gradient keyword to the SolverConfiguration?).

We can have a chat on Skype if you like…

But first I need to make our MCMC code work. As I said, it does not seem to work as well as before. I did a very simple check: compare the arrays bayes_data (input visibilities transferred from the host) and vis_gpu (model visibilities, computed in the same way) as follows:

```python
with slvr.context:
    if np.equal(slvr.bayes_data_gpu.get(), slvr.vis_gpu.get()).all:
        print 'ok!'
```

and they actually turn out to be equal, but when I ask

```python
print slvr.X2
```

I don't get 0; on the contrary, I get a bad value: 838.795179166!

Am I doing something wrong? Or is it a bug?


marziarivi commented 8 years ago

Hi Simon,

here is an update of the problem.

For some reason the numpy expression equal().all was not doing what I expected: without the call parentheses, .all is a bound method, which is always truthy, so the test always passes. Using array_equal() instead, I get the right answer, i.e. the ‘bayes_data' and ‘vis' arrays are different. I printed a chunk of them and it turns out that one is the complex conjugate of the other.
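The pitfall can be reproduced in a few lines (a standalone NumPy sketch, unrelated to Montblanc itself):

```python
import numpy as np

a = np.array([1 + 2j, 3 - 4j])
b = np.conj(a)  # differs from a in the imaginary parts

buggy_says_equal = False
# Missing call parentheses: .all is a bound method object, which is
# always truthy, so this branch is taken even though a != b.
if np.equal(a, b).all:
    buggy_says_equal = True

# Correct forms: call the method, or use np.array_equal.
really_equal = np.equal(a, b).all()   # False
also_equal = np.array_equal(a, b)     # False
```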

But they should be the same, because they are computed from the same source parameters… I then tried to use the CPU solver to generate the visibilities. It actually works well, and I get chi squared = 0.

So there is a bug in the GPU-solver of Montblanc in the computation of the visibilities.

It is quite obviously connected with the problem with the (l,m) source coordinates that I already mentioned some time ago. In fact, if I use l,m as free parameters of the source, then MCMC fits them with the opposite sign with respect to the true values, whereas if I keep them fixed there is no fit at all for the other parameters and a lot of values are rejected. If I set them to zero (so the visibilities have no imaginary part), then I can fit the other source parameters.

So the problem should be either in the ‘lm' array transfer or in the computation or multiplication of the K matrix.
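A minimal sketch of why a conjugated visibility mimics a sign flip in (l,m) (scalar K term of a point source only, n term ignored; standalone code, not Montblanc's):

```python
import numpy as np

def point_source_vis(u, v, l, m, flux=1.0):
    # Scalar K-term visibility of a point source:
    # V = S * exp(2*pi*i*(u*l + v*m))
    return flux * np.exp(2j * np.pi * (u * l + v * m))

u, v, l, m = 100.0, -50.0, 0.01, -0.02
# Conjugating the visibility is identical to negating (l, m), which
# would explain MCMC fitting the positions with opposite sign.
assert np.allclose(np.conj(point_source_vis(u, v, l, m)),
                   point_source_vis(u, v, -l, -m))
```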

Shall I open an issue on github about it?

Cheers, Marzia


sjperkins commented 8 years ago

#84 tracks the complex conjugate issue. I'll see if I have time to take a look at it, but the distributed version is a very high priority for me at the moment.

marziarivi commented 8 years ago

I have implemented in Montblanc the computation of the X2 gradient for the Sersic parameters (GPU version only) in the following way:

This way there are very few changes from the original version. I tested it and it seems to still work as before, both for nparams=0 and nparams>0.

marziarivi commented 8 years ago

I have completed the implementation of the chi squared gradient for Sersic sources, with both the GPU and the CPU solver. An example comparing the result with the numerical computation is provided. It can easily be extended to Gaussian sources. See the comment at the beginning of the issue for all the information.

sjperkins commented 8 years ago

Implemented in #147