mihaibujanca opened this issue 7 years ago
Hello! We're happy you are considering Opt!
Out of curiosity, what are the unknown and residual counts on your problem?
Hi, thanks for the clarification. This is the output from Ceres, but in terms of what will actually be estimated by the optimisation, it's 6 parameters (3 for rotation and 3 for translation) per node for a few hundred "deformation nodes" (starting with about 300-350 and growing indefinitely over time).
The plan is to have one residual per node and estimate 6 parameters per node, hoping to achieve that at frame rate.
Solver Summary (v 1.13.0-eigen-(3.3.90)-lapack-suitesparse-(4.4.6)-cxsparse-(3.1.4)-openmp)

                                    Original                Reduced
Parameter blocks                        4767                   4767
Parameters                             28602                  28602
Residual blocks                       108992                 108992
Residuals                             326976                 326976

Minimizer                       TRUST_REGION
Sparse linear algebra library   SUITE_SPARSE
Trust region strategy           LEVENBERG_MARQUARDT

                                       Given                   Used
Linear solver                   SPARSE_SCHUR           SPARSE_SCHUR
Threads                                    8                      8
Linear solver threads                      8                      8
Linear solver ordering             AUTOMATIC               349,4418
Schur structure                        3,6,6                  d,d,d

Cost:
Initial                         3.293113e+06
Final                           2.295137e-15
Change                          3.293113e+06

Minimizer iterations                       4
Successful steps                           4
Unsuccessful steps                         0
Any particular example I should be looking at to better understand what I have to do to implement my routine using Opt? I'm not great with CUDA but learning reasonably fast :)
Cotangent Mesh Smoothing (examples/cotangent_mesh_smoothing) is an example of a problem that uses hyperedges (of degree 4, representing a triangle wedge; you'll be using degree 9), so it might be useful to look at that.
CombinedSolver.h is what implements the problem-specific CPU side code in all of the examples, but it leans heavily on code in shared/ (specifically OptImage, OptGraph, and OptUtils.h, which are all part of the helper library I made for the examples and not for Opt proper).
If you'd rather look at a minimal use of the C API, take a look at tests/minimal, which shows a Laplacian image example with no wrappers around the Opt and CUDA API calls. The only part really different for your use case is needing to copy over the hyperedge buffers as well as the unknowns.
The plan is still to have some small, fixed "k" unknowns that are used in the contribution to each residual, right?
Thanks, will look into that and come back with further questions if needed :).
Yes, it will always be a small, fixed number of unknowns contributing to each residual, though once I have something that runs fast enough I'll try to vary the number and see how that affects the result.
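As a rough sizing note (assuming the k = 8 nearest deformation nodes per residual that appear later in this thread, each node carrying the 6 parameters mentioned above), the per-residual footprint stays small and fixed:

$$\text{nonzeros per Jacobian row} \;\le\; 6k = 6 \times 8 = 48$$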
Quick question - what does the usePreconditioner function do? Most examples seem to set it to true.
It enables Jacobi preconditioning for the linear conjugate gradient solver. It can improve numerical stability, but increases per-iteration compute slightly.
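For reference, a minimal sketch of what Jacobi preconditioning means in this setting (standard preconditioned CG on the Gauss-Newton normal equations; the exact matrix Opt builds internally is not spelled out in this thread):

$$A = J^\top J, \qquad M = \operatorname{diag}(A), \qquad A\,\delta = -J^\top r \text{ solved by CG, applying } z = M^{-1} r_{\text{cg}} \text{ each iteration.}$$

Applying $M^{-1}$ is just an element-wise division by the diagonal, so the extra per-iteration cost is small, which matches the note above.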
Is this going well? Trying to assess whether I should close this issue.
Had to take a break from this but now I'm back to it. I'd leave it open for a few more days if that's all right, I might be back with a few more questions
Guess this has its answer now. The speedup is significant (about 2-2.5 orders of magnitude); however, the available methods don't seem to work as well for my problem so far. I'll still try fiddling with Opt to get it working.
What do you mean by available methods? The library functions? Or GN & LM? If so, what makes LM inappropriate?
Yeah, I mean GN & LM. At the moment using LM converges to a local minimum that is really far from what I previously got with Ceres, but I am looking into a few options to get it working with LM.
I'm currently looking at the Robust Nonrigid Alignment example since the problem is similar. Still getting through that code, though it looks a bit more complicated than the rest of the examples.
Your Ceres implementation should internally be using LM (or dogleg, which should probably converge to the same minimum). Is the problem that you can't express the same energy function as Ceres for one reason or another?
Not sure about the way it works internally in Ceres (will have a look at some point), but setting options.linear_solver_type = ceres::SPARSE_SCHUR; gets me to a low cost (but takes quite long, to be fair).
I'm just trying to get the same values from Opt and Ceres for LM at the moment. The energies seem fairly similar but I'm sure there must be some difference
That option sets the linear solver to an exact method instead of an iterative one. The outer algorithm is still LM (or dogleg). It is possible for LM with two different linear solvers to converge to different local minima (for highly nonlinear problems), but I don't think there is a general way to determine a priori which one will lead to a lower cost.
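For context, the linear system solved at each LM iteration is the damped normal equations; the outer iteration is the same regardless of whether that system is solved exactly (a sparse Cholesky/Schur factorization, as with SPARSE_SCHUR) or approximately (preconditioned conjugate gradient, as in Opt's GPU solver):

$$(J^\top J + \lambda D)\,\delta = -J^\top r$$

where $J$ is the Jacobian, $r$ the residual vector, $\lambda$ the LM damping parameter, and $D$ a diagonal matrix (commonly $\operatorname{diag}(J^\top J)$ or the identity; which form Opt uses is not stated in this thread). Because the computed step $\delta$ differs slightly between an exact solve and a truncated PCG solve, the two implementations can walk to different local minima on a highly nonlinear problem.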
Is the initial cost for Opt and Ceres the same? Are you initializing the unknowns to the same values?
The initial cost is not the same. The only difference I could find so far is that in Ceres I was initialising the unknowns to 0 and then modifying the deformation nodes at the end, whereas with Opt I am just initialising with the deformation nodes.
"at the end" referring to before the solve, right?
The initial costs should be identical to several decimal places if the data and energies are the same (the only differences would be in the precision of intrinsic operations and the non-associativity of floating-point operations, which will be very small).
One mistake I made while working on the initial Opt examples was only partially implementing the cost in Opt and/or Ceres (like just doing the translation term of ARAP). If your energy consists of several terms, you could enable each term one by one to see which term is causing the disparity.
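One way to do that term-by-term toggling in the Opt energy spec is to scale each residual by a constant you flip between 0 and 1 while debugging. A minimal sketch, using the data term that appears later in this thread and a placeholder name for the regulariser (w_fit, w_reg, and regulariserResidual are illustrative, not part of the actual code):

local w_fit = 1.0  -- set to 0.0 to disable the fitting term
local w_reg = 0.0  -- set to 1.0 to enable the regularisation term once it exists

Energy(w_fit * (LiveVertices(G.v) - CanonicalVertices(G.v) - weightedTranslation[G.v]))
Energy(w_reg * regulariserResidual)  -- placeholder for e.g. an ARAP term over a node-node graph

Keep in mind that each Energy(...) argument is squared, so a constant multiplier w on the residual contributes w^2 to the cost; multiply by the square root of the desired weight if you want it to act on the squared term.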
There will be multiple terms (a "data" term using rotation and offset, and an ARAP regularisation), but right now I'm just testing with something really simple (a weighted sum of offsets) - on both Ceres and Opt.
What I mean is that at the moment in Opt the unknowns are the deformation nodes (specifically just the offset part) whereas in Ceres they are 0 and I'm adding the deformation node offset to that in the energy function. Shouldn't make a difference but I'm modifying it to match what I have in Opt just to be sure I didn't mess it up somewhere.
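For completeness, the two parametrisations should indeed give the same initial cost: evaluating the Ceres energy at $\delta = 0$ (with the node offsets $t_0$ added inside the residual) is the same as evaluating the Opt energy directly at the offsets,

$$E_{\text{Ceres}}(\delta)\big|_{\delta = 0} = \sum_k \big\|r_k(t_0 + 0)\big\|^2 = \sum_k \big\|r_k(t_0)\big\|^2 = E_{\text{Opt}}(x)\big|_{x = t_0},$$

so a differing initial cost points to a difference in the data or the energy expression rather than the choice of parametrisation.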
Guess I'll reopen this since I have some follow-ups. I've created a very simple test and my only reasonable explanation is that somehow the weights must be mismatched - still looking into this. My test is using weights that are dependent on the distance between the neighboring nodes and the source vertex, and the solution is a rigid transformation, so it should be reasonably easy to find a good solution, provided that the weights are indeed matched correctly.
The test works if all weights are set to 1.
So, just to check, am I doing this the right way?
int N = canonical_vertices.size();
m_weights = createEmptyOptImage({N}, OptImage::Type::FLOAT, KNN_NEIGHBOURS, OptImage::GPU, true);

std::vector<float[KNN_NEIGHBOURS]> weights(N);
for (int count = 0; count < N; count++)
{
    m_warp->getKNNWeights(canonical_vertices[count], weights[count]);
    for (int i = 1; i < graph_vector.size(); i++)  // graph_vector.size() is KNN_NEIGHBOURS+1
        graph_vector[i].push_back((int)m_warp->getRetIndex()->at(i - 1));  // indices 1..size because of how the graph is structured
}
m_weights->update(weights);
m_problemParams.set("Weights", m_weights);
local Weights = Array("Weights", opt_float8, {N}, 6)  -- KNN_NEIGHBOURS is 8
local G = Graph("DataG", 7,
    "v",  {N}, 8,
    "n0", {D}, 9,
    "n1", {D}, 10,
    "n2", {D}, 11,
    "n3", {D}, 12,
    "n4", {D}, 13,
    "n5", {D}, 14,
    "n6", {D}, 15,
    "n7", {D}, 16)
weightedTranslation = {}  -- Lua docs say all values would be initialised to 0
nodes = {0,1,2,3,4,5,6,7}
for _,i in ipairs(nodes) do
    weightedTranslation[G.v] = weightedTranslation[G.v] + Weights(G.v)(i) * TranslationDeform(G["n"..i])
end
Energy(LiveVertices(G.v) - CanonicalVertices(G.v) - weightedTranslation[G.v])
Again my understanding of Opt is letting me down, but shouldn't these three be equivalent?
-- Option 1
for _,i in ipairs(nodes) do
    weightedTranslation[G.v] = weightedTranslation[G.v] + Weights(G.v)(i) * TranslationDeform(G["n"..i])
end
Energy(LiveVertices(G.v) - CanonicalVertices(G.v) - weightedTranslation[G.v])

-- Option 2
Energy(LiveVertices(G.v) - CanonicalVertices(G.v))
for _,i in ipairs(nodes) do
    Energy(0 - Weights(G.v)(i) * TranslationDeform(G["n"..i]))
end

-- Option 3
for _,i in ipairs(nodes) do
    Energy((LiveVertices(G.v) - CanonicalVertices(G.v))/8 - Weights(G.v)(i) * TranslationDeform(G["n"..i]))  -- divided by the number of nodes
end
The first option results in a larger error than the desired threshold (the error is about 0.007), the second one results in zero cost change, and the third one actually works (on the simplest test; it still fails on more complex ones). The cost is different for each as well.
Ah, figured out the above in the meanwhile; it should have been obvious. Each Energy term is squared, so one of them is (r1 + r2 + ...)^2 whereas the other is r1^2 + r2^2 + ...
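Spelling that out, the cross terms are exactly what make the single combined residual differ from the per-neighbour residuals:

$$\Big(\sum_i r_i\Big)^2 = \sum_i r_i^2 + 2\sum_{i<j} r_i r_j \;\ne\; \sum_i r_i^2 \quad \text{in general.}$$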
So are your questions answered now?
There was obviously some mismatch between the behavior you expected and what actually happened, which we should be able to mitigate in the future with a combination of improved documentation/warnings/errors. What do you think would have helped the most?
Hmm, I'll need to think about it for a bit - you already mention in the documentation that the energies are squared, and that's also obvious from the cost.
The one question that is still floating around is whether or not the following is correct:
local weightedTranslation = {}
for _,i in ipairs(nodes) do
    weightedTranslation[G.v] = weightedTranslation[G.v] + TranslationDeform(G["n"..i]) * Weights(G.v)(i)
end
Energy(LiveVertices(G.v) - CanonicalVertices(G.v) - weightedTranslation[G.v])
i.e. I want to have a weighted translation for each vertex pair.
The above fails to produce the expected result - I tried testing it with a single vertex (and with multiple vertices, separately) and the cost stays reasonably high even though the solution is straightforward.
On the other hand, the following works for a single vertex, but not for multiple vertices
local weightedTranslation = 0
for _,i in ipairs(nodes) do
    weightedTranslation = weightedTranslation + TranslationDeform(G["n"..i]) * Weights(G.v)(i)
end
Energy(LiveVertices(G.v) - CanonicalVertices(G.v) - weightedTranslation)
The two code snippets are essentially identical, except for the variable name and the initial value (nil in the first snippet, 0 in the second; I don't actually know the behavior of adding to nil offhand, so I'd always use the second one).
I don't really have enough information to go on to help you with the multiple vertices issue. What might help is setting Opt's verbosity to 2 on initialization, and looking at the schedule for the cost function (noting that there will be one cost function per graph/stencil access domain), and seeing if it looks plausibly like what you wanted.
I guess my real question here is whether weightedTranslation will start at 0 for each of the vertices or whether it will accumulate across vertices (in the second case). If so, is there a way to mitigate the issue by initialising an array to 0 for each element and computing the weightedTranslation as in the first example?
weightedTranslation will start at 0 for every vertex. You are essentially writing an energy template that will execute once per graph edge (or once per unknown in the case of using offset accesses instead of graphs).
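Concretely, for the graph used here (one hyperedge per vertex v connecting it to its 8 nearest deformation nodes), the template instantiates one residual per vertex with the accumulator starting from 0 each time, so the total cost being minimised is

$$E = \sum_{v}\Big\|\text{LiveVertices}(v) - \text{CanonicalVertices}(v) - \sum_{i=0}^{7}\text{Weights}(v)_i\,\text{TranslationDeform}(n_i(v))\Big\|^2.$$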
I'm currently implementing a paper requiring an optimisation with lots of parameters, running at frame rate. I have implemented part of the optimisation routine in Ceres already and will soon need to implement an ARAP regularisation term - so I'm expecting your examples will come in handy if I end up using Opt.
I was wondering
Thank you