qianqianwang68 / omnimotion

Apache License 2.0

A question about blending weight. #40

Open achao2013 opened 8 months ago

achao2013 commented 8 months ago

Why can x_i and x_j share the same blending weights? The depth changes when a point k moves from frame i to frame j.

achao2013 commented 8 months ago

@qianqianwang68

qianqianwang68 commented 8 months ago

x_i and x_j share the same density (not blending weight), and that's because they correspond to the same canonical 3D point. In our work, all local 3D locations that get mapped to the same canonical 3D location share the same density. Here, because we are rendering the 3D motion of a query pixel in frame i, it makes sense to use the blending weights on the ray of the query pixel in frame i, and this holds true whether or not the depth changes when a point moves from i to j.
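To make the weight-sharing concrete, here is a minimal NumPy sketch of standard NeRF-style alpha compositing: densities fetched at the canonical mappings of the samples on the frame-i ray are turned into blending weights. The function name and argument shapes are illustrative, not the actual omnimotion implementation.

```python
import numpy as np

def blending_weights(sigmas, deltas):
    """Standard alpha compositing along the query ray in frame i.
    sigmas: (N,) densities fetched at the canonical mappings of the samples
    deltas: (N,) spacing between consecutive samples on the frame-i ray
    """
    alphas = 1.0 - np.exp(-sigmas * deltas)            # per-sample opacity
    trans = np.cumprod(1.0 - alphas + 1e-10)           # accumulated transmittance
    trans = np.concatenate([[1.0], trans[:-1]])        # shift so T_1 = 1
    return trans * alphas                              # w_k = T_k * alpha_k

# Because the densities live on the shared canonical points, the same
# weights can be reused for any quantity blended along this ray.
sigmas = np.array([0.1, 2.0, 5.0, 0.5])
deltas = np.full(4, 0.25)
w = blending_weights(sigmas, deltas)
```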

achao2013 commented 8 months ago

> x_i and x_j share the same density (not blending weight), and that's because they correspond to the same canonical 3D point. In our work, all local 3D locations that get mapped to the same canonical 3D location share the same density. Here because we are rendering the 3D motion of a query pixel in frame i, it makes sense to use the blending weights on the ray of the query pixel in frame i, and this holds true whether or not the depth changes when a point moves from i to j


From the code, blending_weight is only related to x_canonical, and x2s_pred also relies on the same blending_weight. There's some confusion here. @qianqianwang68

achao2013 commented 8 months ago

Can you help me resolve my confusion?

1. If a point x_i moved, the sample points on the ray of frame i might no longer lie on a straight line when mapped to the canonical volume. Can we keep using classical volume rendering?
2. How do we ensure that a point in the canonical volume is unambiguous (e.g., not overlapped with points mapped from other views)? The same canonical 3D point location can correspond to different points in different views, which suggests u should be a space-time point to resolve the ambiguity. From the code, u is not a globally consistent "index" for a particular scene point or 3D trajectory across time as the paper described.

@qianqianwang68 @zussini

qianqianwang68 commented 8 months ago

Hi,

It's not exactly correct that blending_weight is only dependent on x_canonical: x_canonical is obtained from x1s_samples, and when computing the blending weights, the order of the samples on the ray also matters, and that order comes from x1s_samples.

"x2s_pred also rely on the same blending_weight" -- It is true that x2s_pred is computed using blending_weight, but here you should interpret it as we are blending the corresponding 3D locations in frame j into a single corresponding 3D location with the blending_weight. This process is similar to volume rendering in NeRF, and the only difference is that here we blend 3D locations instead of colors.

"If a point x_i moved, the sample points in the ray of frame i might not be on the same straight line anymore when they mapped to the canonical volume. Can we keep using the classical volume rendering"

It is true that it is not a straight line in the canonical frame, but as I mentioned at the beginning, the rendering process happens in the local space of frame i, where x1s_samples form straight lines, so the rendering process is still valid. Mapping to the canonical volume is just for fetching the corresponding colors and densities; the actual rendering happens in local frames.
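The separation between canonical lookup and local-frame rendering can be sketched as follows. All function names here are placeholders for whatever network performs each role, not the actual omnimotion API.

```python
import numpy as np

def render_ray(x1s_samples, deltas, map_to_canonical, density_fn, color_fn):
    """Samples lie on a straight ray in frame i; only the *lookup* of
    color and density goes through canonical space. The sample spacing
    used for compositing comes from the local (straight) ray."""
    u = map_to_canonical(x1s_samples)          # canonical lookup only
    sigmas = density_fn(u)                     # shared densities
    colors = color_fn(u)                       # shared colors, shape (N, 3)
    alphas = 1.0 - np.exp(-sigmas * deltas)    # spacing from the local ray
    trans = np.concatenate([[1.0], np.cumprod(1.0 - alphas)[:-1]])
    w = trans * alphas
    return (w[:, None] * colors).sum(axis=0)   # composite in the local frame
```

The key design point mirrored here: even if `map_to_canonical` bends the ray into a curve in canonical space, the compositing order and spacing (`deltas`) are defined on the straight frame-i ray, so classical volume rendering still applies.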

"how to ensure the point in the canonical view is effective(e.g. , not overlapped with the points mapped from other view, the same canonical 3D point location can be correspond to different point in different view, which means the u should be space-time point to solve the ambiguity; from the code the u is not a globally consistent “index” for a particular scene point or 3D trajectory across time as the paper described)"

Sorry, I don't think I understand the question. Why is u not a globally consistent "index"? u is a globally consistent index by design, via the invertible networks.
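A toy sketch of why bijective per-frame mappings make u a consistent index. Here each frame's mapping is a trivial invertible translation standing in for the invertible network (the real model conditions an invertible MLP on a per-frame latent); the names and offsets are invented for illustration.

```python
import numpy as np

# Per-frame offsets: a trivial stand-in for the learned invertible mapping.
offsets = {0: np.array([1.0, 0.0, 0.0]),
           1: np.array([0.0, 2.0, 0.0])}

def T(t, x):        # local frame t -> canonical
    return x + offsets[t]

def T_inv(t, u):    # canonical -> local frame t
    return u - offsets[t]

x_i = np.array([0.5, 0.5, 0.5])   # a scene point observed in frame 0
u = T(0, x_i)                      # its canonical "index"
x_j = T_inv(1, u)                  # the same scene point in frame 1

# Because each T_t is a bijection, u indexes exactly one local point per
# frame: mapping back to frame 0 recovers x_i exactly.
roundtrip = T_inv(0, u)
```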

achao2013 commented 6 months ago

Thank you for your detailed and patient explanation, and sorry for the late reply. I now understand the previous problems. For the last problem mentioned above, I first highlight the related statement in the paper:

> Note that the canonical coordinate u is time-independent and can be viewed as a globally consistent "index" for a particular scene point or 3D trajectory across time.

From the perspective of the physical world: if the canonical coordinate u is viewed as a particular scene point, it should be defined at one canonical time t, because the same position can be occupied by different objects at different times. If u is viewed as a 3D trajectory across time, the canonical coordinate should be spatiotemporal (e.g., (x, y, z, t)), because the 3D trajectories of two objects may intersect at a point unless time t is included.

From the perspective of the model: when we transform a camera-space point to the canonical coordinate, we use deform_mlp(t, feat_t, x). If we have two different dynamic points, x1 in frame t1 and x2 in frame t2, how can the mapping ensure that deform_mlp(t1, feat_t1, x1) is not numerically equal to deform_mlp(t2, feat_t2, x2)?

@qianqianwang68

achao2013 commented 5 months ago

@qianqianwang68

qianqianwang68 commented 5 months ago

> If we have two different dynamic points x1 in frame t1 and x2 in frame t2, how can the mapping ensure that deform_mlp(t1, feat_t1, x1) is not equal numerically to deform_mlp(t2, feat_t2, x2)?

That's exactly what the mapping network is optimized to achieve (though there is no guarantee). Given that the mapping network models bijections, if dynamic point x1 is trained to map to its correct canonical point, then it cannot also map to x2's canonical point, which is a different point.

achao2013 commented 5 months ago

OK, thanks. So I think the sentence "the canonical coordinate u is time-independent" is not exactly right: if u is not tied to a specific time t, it can be mapped to x1 in frame t1 and to x2 in frame t2. Only a specified time can ensure that the same physical coordinate in canonical space does not correspond to two different scene points. @qianqianwang68