Sparsity patterns for Jacobian and Hessian computations

a-jp commented 7 months ago

Based on clad discussion #780. We assume the current computations of a Jacobian and Hessian are returned as a dense matrix. We wish to augment the API to allow sparse representations of the Jacobian and Hessian to be computed. This places two requirements:

The calculation should return to the user two pieces of information a) in an as yet to be specified sparse data structure, the sparsity structure (the indices of the entries that are non-zero) and b) the values of the non-zero entries (see 2) in a compressed format.
The non-zero entries computed and returned should be structurally non-zero as a function of the differentiation, and not zero due to initial values or constants. Said another way, the zero entries that are excluded must be defined to be always zero, and not just zero due to specific initial values.

As a check on the results outputted from 1 and 2, one should be able to reconstruct a dense matrix from the two pieces of information provided by 1. with additional zeros where needed. For a sanity check, one would then expect the recombination of the sparsity structure, and the compressed non-zero entries into a dense matrix to be identical (within machine precision) to the currently computed dense matrices computed by calls to the Jacobian and Hessian routines.

It's for library developers to comment on the best mechanics for integration into clad, a small amount of the discussion is contained in the initial discussion #780. It's possible that something is to be learned from other works such as CasADi although I neither use or have any affiliation with that codebase. Likewise, sparse Hessians and Jacobians are well accommodated for, as noted in the original discussion, in CppAD.

bradbell commented 4 months ago

You may want to look at the following: https://arxiv.org/abs/2111.05207

a-jp commented 3 months ago

Hi, I was just checking in to see if you still felt v1.6 was a probable version for this capability to be released?

Many thanks Andy

vgvassilev commented 3 months ago

@vaithak probably can comment on his plans about this.

vaithak commented 3 months ago

Hi @a-jp, I am currently adding support for computing only the diagonal entries of a hessian, which is our first step in computing sparse or partial entries of a hessian matrix. I glanced through the paper that @bradbell mentioned. I think an end-to-end implementation will require a lot of effort. I will try to figure out if we can break the complete implementation into a sequence of incremental improvement steps, which is dependent on our current infrastructure, some of which may be achievable in the upcoming milestones.

bradbell commented 3 months ago

Perhaps it is possible to use some of the sparstiy code in CppAD to reduce the work ?

vaithak commented 3 months ago

Perhaps it is possible to use some of the sparstiy code in CppAD to reduce the work ?

That would be really helpful, thanks @bradbell. I will try going through the paper along with the implementation in CppAD. What would be the good medium of communication to discuss any doubts I encounter, either in the paper or the implementation?

bradbell commented 3 months ago

I thought that perhaps we could work together to define an interface for a sparsity tool that could be used by any AD package. The input would be things like the DAG for the program. The output would be things like the sparsity pattern and splitting for which forward or backward derivatives can be computed during the same pass.

vaithak commented 3 months ago

Seems like a good idea. @vgvassilev, what do you think about this?

vgvassilev commented 3 months ago

That’d make sense to me.

a-jp commented 3 months ago

Can we also talk about the interface for the user? I could get on with porting my code and use the dense matrix versions of the jac and hess if I knew what was coming, and what would be needed, to move to the sparse interface for both?

Thanks

bradbell commented 3 months ago

@a-jp send me an e-mail message

a-jp commented 3 months ago

I thought that perhaps we could work together to define an interface for a sparsity tool that could be used by any AD package. The input would be things like the DAG for the program. The output would be things like the sparsity pattern and splitting for which forward or backward derivatives can be computed during the same pass.

@vgvassilev and @vaithak just to follow up on my own post above, for which I would welcome your comments. To be clear there are two interfaces. The one quoted here should deal with the implementation. My post above relates to the user experience and interface. One of the aspects of Clad that appeals to me is abstraction. I do not want to know about a DAG as a user. I'd want to call 'hessian' or 'sparse_hessian' and that be the only distinction at the user level. Happy with setup/init calls etc. One aspect of other libraries that while being very powerful is it's almost impossible to simply transition between the two interfaces without extensive understanding of what's going on. As a user this is not a requirement that needs to be enforced. As I say, happy to chat about interfaces from a user point of view and this would allow developers (that use clad, not develop AD libraries) to plan in the background for the transition between the two schemes.

bradbell commented 3 months ago

@a-jp I am not thinking of the user using a DAG, rather clad passing some sort of DAG to a low level library that does the sparsity calculations for it. We are discussing this on the clad channel of the discord server. Perhaps @vgvassilev can set you up with an invite to that server.

vgvassilev / clad

Sparsity patterns for Jacobian and Hessian computations #803