Closed fabienbaradel closed 1 year ago
Hi @fabienbaradel, I apologize for missing your question. I mention scale ambiguity as one of the limitations of our work in the paper. To make the pipeline work at all, I have to assume that the scale is known. In the case of the method we proposed, this is achieved simply by using known camera parameters with known distances between the cameras. On the line you pointed to, there is just a "TODO" comment noting that assuming 1.80 m for all people would be the next step, but I haven't implemented that yet.
> Could you explain how you would estimate this scale factor in your pipeline? It is not really clear to me at the moment.
If I had to estimate the scale, I would use a completely separate strategy from the one presented in the paper. In general, I believe that scale should be treated as a separate problem; it is not addressed by many 3D pose estimation and mesh regression approaches. I find this paper very interesting for the problem of estimating scale. To apply it in a multi-view setting, one could run the strategy on every view and then either take the mean estimate or use a similar aggregation.
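To make the multi-view idea above concrete, here is a minimal sketch of how per-view estimates could be aggregated into one global scale factor. Everything here is hypothetical: `aggregate_scale`, the example height values, and the 1.80 m reference are illustrative assumptions, not code from the repository or from the linked paper.

```python
import numpy as np

def aggregate_scale(per_view_heights, assumed_height=1.80):
    """Combine hypothetical per-view height estimates (metres) into one
    global scale factor.

    The factor rescales an up-to-scale reconstruction so that the
    reconstructed person height matches the aggregated metric estimate.
    """
    heights = np.asarray(per_view_heights, dtype=float)
    # Mean over views, as suggested above; a median would be more
    # robust to a single bad view.
    mean_height = heights.mean()
    return mean_height / assumed_height

# Example: three views give slightly different height estimates.
scale = aggregate_scale([1.75, 1.82, 1.79])
```

If every view agreed exactly with the assumed 1.80 m height, the factor would be 1.0; disagreement between views shifts it up or down accordingly.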
Once again, I apologize for not noticing your question earlier. If you would like to discuss this further, I am open to it and interested in the scale estimation problem.
Thanks @kristijanbartol for the explanations!
Hi @kristijanbartol , Thanks a lot for releasing the code; your paper is very interesting. I have a 'naive' question about the scale ambiguity in your pipeline. When using the 8-point algorithm, there is the well-known scale ambiguity of the translation vector. Several times in your codebase you use a 'scale' variable, such as here. But as far as I can tell, you do not actually estimate this scale from an initial assumption, such as assuming that a person is 1.80 m tall. Could you explain how you would estimate this scale factor in your pipeline? It is not really clear to me at the moment. Thanks a lot for your feedback and response,
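For readers less familiar with the ambiguity being discussed, here is a minimal numpy sketch of why the 8-point algorithm cannot recover the length of the translation vector. The camera poses and the 3D point are made-up toy values, not taken from the repository; the point is only that the epipolar constraint is homogeneous in the essential matrix.

```python
import numpy as np

def skew(t):
    """Cross-product matrix [t]_x, so that skew(t) @ v == np.cross(t, v)."""
    return np.array([[0.0, -t[2],  t[1]],
                     [t[2],  0.0, -t[0]],
                     [-t[1], t[0],  0.0]])

# Toy two-camera setup (illustrative numbers only).
R = np.eye(3)                    # relative rotation between the cameras
t = np.array([1.0, 0.2, 0.0])    # relative translation, in metric units

X1 = np.array([0.5, 0.3, 4.0])   # a 3D point in camera-1 coordinates
X2 = R @ X1 + t                  # the same point in camera-2 coordinates
x1 = X1 / X1[2]                  # normalized image point, camera 1
x2 = X2 / X2[2]                  # normalized image point, camera 2

E = skew(t) @ R                  # essential matrix, E = [t]_x R

# The epipolar constraint x2^T E x1 = 0 is homogeneous in E, so scaling
# t (and hence E) by any lambda satisfies it equally well: the metric
# length of t -- and with it the scene scale -- cannot be recovered from
# point correspondences alone. That is the ambiguity in question.
for lam in (1.0, 0.1, 10.0):
    residual = x2 @ (lam * E) @ x1
    assert abs(residual) < 1e-12
```

This is why some external assumption (known inter-camera distances, or an assumed person height) is needed to fix the scale.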