bamos opened this issue 5 years ago
I've thought some more about what I actually want here. Instead, I want sugar/a wrapper that can take dense inputs, internally make them sparse so the solve is more efficient, and return/compute the derivatives over some other indexing set that's not necessarily the sparsity map.
This is useful when doing learning with data that has a changing sparsity pattern. Perhaps the most efficient instantiation of this would be for users to specify some known/fixed sparsity patterns and some unknown/dynamic regions to be inferred, so that the static parts aren't unnecessarily recomputed every time this is called.
For a concrete example, consider the projection problem

$$\hat{y} = \arg\min_y \tfrac{1}{2}\|y - x\|_2^2 \quad \text{s.t.} \quad Ay = b$$

for regression/classification given $x$, where the task is to estimate some sparse affine space parameterized by $\{A, b\}$ that we are forcing to be sparse in some way. We know the sparsity pattern of the objective here and can keep it fixed, while the sparsity pattern of $A$ changes during training and needs to be recomputed every time. We'd also like dense gradients w.r.t. all of $A$ here so we can update it (and we don't care about the derivative w.r.t. the objective).
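As a hedged sketch, this projection maps onto OSQP's standard form $\min_y \tfrac{1}{2} y^\top P y + q^\top y$ s.t. $l \le Ay \le u$ with $P = I$, $q = -x$, and $l = u = b$; the concrete numbers below are made up for illustration:

```python
import numpy as np
import scipy.sparse as sparse
import osqp

# The projection above in OSQP's standard form: P = I has a fixed
# (diagonal) sparsity pattern, while A's pattern may change between
# training steps. Values here are illustrative only.
n, m = 4, 2
x = np.array([0.3, -1.2, 0.8, 0.1])
A = sparse.csc_matrix(np.array([[1.0, 0.0, 2.0, 0.0],
                                [0.0, 1.0, 0.0, 0.0]]))  # sparse A
b = np.array([1.0, 0.5])

prob = osqp.OSQP()
prob.setup(P=sparse.identity(n, format='csc'), q=-x, A=A, l=b, u=b,
           verbose=False)                # l = u = b encodes Ay = b
y_hat = prob.solve().x                   # projection of x onto {y : Ay = b}
```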
Regarding the two options below, I would go for 1). I know it is not as easy, but it would give us more room for optimizations.
I agree on the issue of flexibility when the sparsity pattern changes. We designed the OSQP API so that when you update the matrices $P$ and $A$ you can specify which nonzero elements change. This helps a lot in cases like the one you mentioned, where the sparsity pattern of the cost is fixed and the one of the constraints changes. For example, you could specify the known sparsity pattern of $P$, and specify the sparsity pattern of $A$ as dense. Then, when you update $P$ and $A$, you can change just the elements of $A$, as in this function. Note that the Python interface offers the same functionality as the C one.
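Concretely, with the classic osqp-python API, a partial update of $A$'s values via `Ax_idx` leaves $P$ untouched (the matrices below are illustrative only, not from this thread):

```python
import numpy as np
import scipy.sparse as sparse
import osqp

# Illustrative only: update a subset of A's nonzeros while P stays fixed.
P = sparse.diags([2.0, 2.0], format='csc')       # fixed cost pattern
q = np.zeros(2)
A = sparse.csc_matrix(np.array([[1.0, 1.0],
                                [1.0, 0.0]]))    # chosen constraint pattern
l = np.array([1.0, 0.0])
u = np.array([1.0, 0.7])

m = osqp.OSQP()
m.setup(P, q, A, l, u, verbose=False)
m.solve()

# Change only the first two stored nonzeros of A; Ax_idx indexes into
# A.data in CSC order, so this touches the entries in column 0.
m.update(Ax=np.array([0.5, 2.0]), Ax_idx=np.array([0, 1]))
m.solve()
```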
Note that if an element of $P$ and/or $A$ changes, we still need to refactor the KKT matrix from scratch. We currently do not have a smart way to reuse the factorization, and I believe that in the general case there is no way to do so.
The current interface requires that the user pass in the CSC shape/indices for $A$ and $P$ explicitly in the constructor, and the data passed into the forward pass is expected to be the corresponding values. Getting these values and reshaping the data from PyTorch can sometimes be cumbersome, and it would be more convenient to also provide an interface closer to qpth's that takes $A$ and $P$ as dense inputs. We have a few options here:

1) Make the `OSQP` interface handle both cases. It may be slightly messy to have the code for the CSC and dense cases in the same class, but one potential advantage is that we could make the backward pass in the dense case slightly more efficient by computing `dA` and `dP` with a PyTorch batched dense outer product instead of indexing into all of the sparse elements (see the sketch after this list). As another micro-optimization in the forward pass, we could convert the data to a numpy format and pass it directly into the `csc_matrix` constructor without needing to create a dense index map.

2) Add an `OSQP_Dense` interface that infers the shape/indices and internally calls into `OSQP` (or a renamed version like `OSQP_CSC`). This would be slightly nicer to maintain and would make it clearer to users which interface they're using. The backward pass may not be as efficient if we don't specifically override the `dP` and `dA` computations to be batched/dense.

What are your thoughts / what do you want to move forward with here? 1) seems nicer since we could optimize some of the dense operations, but I could also understand 2) for simplicity for now.
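For reference, a minimal sketch of the batched dense outer products from option 1). It follows an OptNet/qpth-style backward pass rather than osqpth's actual internals, and all names (`z`, `nu`, `dz`, `dnu`) and shapes here are assumptions:

```python
import torch

# Sketch: dense gradients for an equality block A z = b, where z is the
# primal solution, nu the dual, and dz/dnu come from solving the KKT
# system with the incoming gradient. Batched over the leading dimension.
def dense_grads(dz, dnu, z, nu):
    # dz, z: (batch, n); dnu, nu: (batch, m)
    # dP is symmetrized since P is symmetric: 0.5 * (dz z^T + z dz^T).
    dP = 0.5 * (torch.bmm(dz.unsqueeze(2), z.unsqueeze(1)) +
                torch.bmm(z.unsqueeze(2), dz.unsqueeze(1)))  # (batch, n, n)
    # dA = dnu z^T + nu dz^T, one batched outer product per term.
    dA = (torch.bmm(dnu.unsqueeze(2), z.unsqueeze(1)) +
          torch.bmm(nu.unsqueeze(2), dz.unsqueeze(1)))       # (batch, m, n)
    return dP, dA
```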
I've currently hacked in 2) with:
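One possible shape for such a wrapper, inferring a fully dense CSC pattern once and gathering values from the dense tensors each call (the wrapped CSC layer's signature is an assumption here):

```python
import numpy as np
import scipy.sparse as sparse
import torch

# Sketch: `csc_layer` stands in for the existing CSC-based OSQP layer.
class OSQP_Dense(torch.nn.Module):
    def __init__(self, n, m, csc_layer):
        super().__init__()
        self.csc_layer = csc_layer
        # Treat every entry of P (n x n) and A (m x n) as a potential nonzero.
        P_pat = sparse.csc_matrix(np.ones((n, n)))
        A_pat = sparse.csc_matrix(np.ones((m, n)))
        # (row, col) coordinates in CSC data order, for gathering values.
        P_rows, P_cols = P_pat.nonzero()
        A_rows, A_cols = A_pat.nonzero()
        self.P_rows = torch.as_tensor(P_rows, dtype=torch.long)
        self.P_cols = torch.as_tensor(P_cols, dtype=torch.long)
        self.A_rows = torch.as_tensor(A_rows, dtype=torch.long)
        self.A_cols = torch.as_tensor(A_cols, dtype=torch.long)

    def forward(self, P, q, A, l, u):
        # Dense (..., n, n) / (..., m, n) -> CSC-ordered value vectors.
        P_val = P[..., self.P_rows, self.P_cols]
        A_val = A[..., self.A_rows, self.A_cols]
        return self.csc_layer(P_val, q, A_val, l, u)
```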
and am running some quick tests with:
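Along the lines of the sketch above, a hypothetical smoke test could check that gradients flow through the dense-to-CSC gather, using a dummy stand-in for the CSC layer:

```python
# Hypothetical smoke test: a dummy "CSC layer" that just sums its inputs,
# to check shapes and gradient flow (not a test of the actual solver).
def dummy_csc_layer(P_val, q, A_val, l, u):
    return P_val.sum() + A_val.sum()

n, m = 3, 2
layer = OSQP_Dense(n, m, dummy_csc_layer)
P = torch.randn(n, n, requires_grad=True)
A = torch.randn(m, n, requires_grad=True)
out = layer(P, torch.zeros(n), A, torch.zeros(m), torch.zeros(m))
out.backward()
assert P.grad is not None and A.grad is not None  # dense gradients
```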