osqp / osqpth

The differentiable OSQP solver layer for PyTorch

Making the CSC shape/indices optional #2

Open bamos opened 5 years ago

bamos commented 5 years ago

The current interface requires the user to pass in the CSC shape/indices for A and P explicitly in the constructor, and the data passed into the forward pass is expected to be the corresponding values. Getting these indices and reshaping the data from PyTorch can be cumbersome, and it would be more convenient to also provide an interface closer to qpth's that takes A and P as dense inputs. We have a few options here:

1) Make the OSQP interface handle both cases. It may be slightly messy to have the code for the CSC and dense cases in the same class, but one potential advantage is that we could make the backward pass in the dense case slightly more efficient by computing dA and dP with a PyTorch batched dense outer product instead of indexing into all of the sparse elements (see the sketch after this list). As another micro-optimization for the forward pass, we could convert the data to a numpy format and pass it directly into the csc_matrix constructor without needing to create a dense index map.

2) Add an OSQP_Dense interface that infers the shape/indices and internally calls into OSQP (or a renamed version like OSQP_CSC). This would be slightly nicer to maintain and would make it clearer to users which interface they're using. The backward pass may not be as efficient if we don't specifically override the dP and dA computations to be batched/dense.
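
As a rough sketch of what 1) could buy us (nothing here exists in the repo; the backward expressions follow the qpth/OptNet-style formulas, and dx/dnu are placeholders for the differentials that would come out of the backward KKT solve):

import torch
import scipy.sparse as sp

n_batch, m, n = 4, 5, 10
x   = torch.randn(n_batch, n)   # primal solutions
nu  = torch.randn(n_batch, m)   # duals for the constraints involving A
dx  = torch.randn(n_batch, n)   # placeholder differentials from the KKT solve
dnu = torch.randn(n_batch, m)

# Backward: batched dense outer products instead of indexing into each
# sparse element of dA and dP.
dA = torch.einsum('bi,bj->bij', dnu, x) + torch.einsum('bi,bj->bij', nu, dx)
dP = 0.5 * (torch.einsum('bi,bj->bij', dx, x) + torch.einsum('bi,bj->bij', x, dx))

# Forward-pass micro-optimization: build the CSC matrix for one (unbatched)
# problem straight from the dense numpy data, with no explicit dense index map.
A_dense = torch.randn(m, n)
A_csc = sp.csc_matrix(A_dense.detach().cpu().numpy())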

What are your thoughts/what do you want to move forward with here? 1) seems nicer since we could optimize some of the dense operations, but I could also understand going with 2) for simplicity for now.
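
For context, the cumbersome CSC-style path for a genuinely sparse problem looks roughly like this (a sketch only; the constructor/forward signature matches the snippets below, and the import path and scipy usage here are just illustrative):

import numpy as np
import scipy.sparse as sp
import torch
from osqpth.osqpth import OSQP

n, m = 3, 2
P_sp = sp.csc_matrix(np.diag([1., 2., 3.]))        # known sparsity pattern
A_sp = sp.csc_matrix(np.array([[1., 0., 1.],
                               [0., 1., 0.]]))

# The user has to extract the index maps and shapes up front...
P_idx, A_idx = P_sp.nonzero(), A_sp.nonzero()

# ...and pass only the nonzero values (in matching order) into the forward pass.
P_val = torch.tensor(P_sp.data, requires_grad=True)
A_val = torch.tensor(A_sp.data, requires_grad=True)
q = torch.zeros(n, dtype=torch.float64)
l = -torch.ones(m, dtype=torch.float64)
u = torch.ones(m, dtype=torch.float64)

x = OSQP(P_idx, P_sp.shape, A_idx, A_sp.shape)(P_val, q, A_val, l, u)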


I've currently hacked in 2) with:

import numpy as np
import torch
from itertools import product
from torch.nn import Module

from osqpth.osqpth import OSQP  # the existing CSC-based layer


class OSQP_Dense(Module):
    def __init__(self,
                 eps_rel=1e-05,
                 eps_abs=1e-05,
                 verbose=False,
                 max_iter=10000):
        super().__init__()
        self.eps_abs = eps_abs
        self.eps_rel = eps_rel
        self.verbose = verbose
        self.max_iter = max_iter

    def forward(self, P, q, A, l, u):
        # TODO: Better batch checks
        if A.ndimension() == 2:
            m, n = A.shape
        elif A.ndimension() == 3:
            n_batch, m, n = A.shape
        else:
            raise RuntimeError("A.ndimension() unexpected")

        # Dense index maps: every entry of P (n x n) and A (m x n) is treated
        # as a nonzero, in the same row-major order produced by .view(-1).
        P_idx = list(map(lambda I: np.array(I, dtype=np.int32),
                         zip(*product(range(n), range(n)))))
        A_idx = list(map(lambda I: np.array(I, dtype=np.int32),
                         zip(*product(range(m), range(n)))))

        # NOTE: the eps/verbose/max_iter settings stored above are not yet
        # forwarded to the OSQP layer.
        x = OSQP(P_idx, P.shape, A_idx, A.shape)(P.view(-1), q, A.view(-1), l, u)
        return x

and am running some quick tests with:

from torch.autograd import grad  # (assumes the imports and OSQP_Dense definition above)


def main():
    torch.manual_seed(0)
    n, m = 10, 5
    L = torch.randn(n, n, requires_grad=True)
    P = L.t().mm(L)
    q = torch.randn(n, requires_grad=True)
    A = torch.randn(m, n, requires_grad=True)
    l = -10*torch.ones(m, requires_grad=True)
    u = 10*torch.ones(m, requires_grad=True)

    P_idx = list(map(lambda I: np.array(I, dtype=np.int32),
                     zip(*product(range(n), range(n)))))
    A_idx = list(map(lambda I: np.array(I, dtype=np.int32),
                     zip(*product(range(m), range(n)))))

    # P, q, A, l, u = [x.cuda() for x in [P, q, A, l, u]]
    x = OSQP(P_idx, P.shape, A_idx, A.shape)(P.view(-1), q, A.view(-1), l, u)
    print(x)
    print(grad(x.sum(), L, retain_graph=True))

    x = OSQP_Dense()(P, q, A, l, u)
    print(x)
    print(grad(x.sum(), L))


if __name__ == '__main__':
    main()

bamos commented 5 years ago

I've thought some more about what I actually want here. I instead want sugar/a wrapper that can take dense inputs, internally make them sparse to make the solve more efficient, and return/compute the derivatives over some other indexing set that's not necessarily the sparsity map.

This is useful when doing learning with data that has a changing sparsity pattern. Perhaps the most efficient instantiation of this would be for users to specify some known/fixed sparsity patterns and some unknown/dynamic regions that should be inferred, so that the static parts aren't unnecessarily recomputed every time the layer is called.

For a concrete example consider the problem:

[image: the optimization problem formulation]

for regression/classification given x, where the task is to estimate an affine space parameterized by {A, b} that we are forcing to be sparse in some way. We know the sparsity pattern of the objective here and can keep it fixed, while the sparsity pattern of A changes during training and needs to be recomputed every time. We'd also like dense gradients w.r.t. all of A here so we can update it (and we don't care about the derivative w.r.t. the objective).

bstellato commented 5 years ago

Regarding the two options, I would go for 1). I know it is not as easy, but it gives us more room for optimizations.

I agree on the issue of flexibility when the sparsity pattern changes. We designed the OSQP API so that when you update the matrices P and A you can specify which nonzero elements change. This helps a lot in cases like the one you mentioned, where the sparsity pattern of the cost is fixed and the one of the constraints changes. For example, you could specify the known sparsity pattern of P and treat the sparsity pattern of A as dense. Then, when you update P and A, you can change just the elements of A, as in this function. Note that the Python interface offers the same functionality as the C one.
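
Concretely, with the Python interface this looks roughly like the following (a minimal made-up problem, assuming the Ax/Ax_idx keywords of the current Python interface, where the indices refer to positions in the CSC value array A.data):

import numpy as np
import scipy.sparse as sp
import osqp

P = sp.csc_matrix(np.diag([1., 2.]))
q = np.zeros(2)
A = sp.csc_matrix(np.array([[1., 1.],
                            [0., 1.]]))
l, u = -np.ones(2), np.ones(2)

prob = osqp.OSQP()
prob.setup(P, q, A, l, u, verbose=False)
prob.solve()

# Only the first and third nonzeros of A change; P is left untouched.
prob.update(Ax=np.array([0.5, 2.0]), Ax_idx=np.array([0, 2]))
prob.solve()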

Note that if an element of P and/or A changes, we still need to re-factorize the KKT matrix from scratch. We currently do not have a smart way to reuse the factorization, and I believe in the general case there is no way to do so.