(please do not delete me, leave it for the future) I think people underestimate this paper, and many researchers in this field haven't even noticed it yet.

WHY?

Refer to https://github.com/yuedajiong/super-ai:

"The first version of the ultimate visual model is completed.

input: single-image (camera-free, incremental-hash-priori, mask-support, single-object-so-far)

output: explicit-stereo-representation (only: stereo; todo: dynamic and interactive.)"

SO: this "splatter-image" paper has "the most powerful task definition for stereo reconstruction/generation":
- a single image as the condition (IMPORTANT)
- camera-pose-free, with the object pose estimated internally (IMPORTANT). Note that this paper still uses camera poses: they are not estimated, but taken from the 3D object dataset.
- mask support for the object (better)

--- I have read almost all the papers in this direction, and this paper is the closest to the "ultimate visual model" we need, described in:

Of course, all of us still have more TODOs, as described in the image above. :-)

To @szymanowiczs: I think there is more than one way to handle the pose (camera + object) problem.
a) Camera/object pose estimation. If the pipeline is end-to-end, the estimation lives inside the algorithm/network; otherwise, separate pose programs/networks are OK.
b) Equivariance / GDL / SE(3). That means the user can give an image of an object in any pose, and our algorithm generates the "same/equivariant" object (e.g. GS, mesh), just with a different direction/rotation. Yes, the object is treated as rigid, as in the protein-drug problem. (It looks like nobody has gone this way so far.)
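To make option b) a bit more concrete, here is a minimal sketch of what "equivariant" means numerically: rotating the input and then applying the map should give the same result as applying the map and then rotating the output. This is only an illustration on 3D point sets with NumPy, not code from the splatter-image paper or repo. All names (`random_rotation`, `center`, `squash_z`, `is_rotation_equivariant`) are hypothetical, and the two toy maps stand in for a real reconstruction network.

```python
import numpy as np


def random_rotation(rng):
    """Sample a 3x3 rotation matrix (det = +1) via a QR decomposition."""
    q, r = np.linalg.qr(rng.standard_normal((3, 3)))
    q = q * np.sign(np.diag(r))  # fix column signs; q stays orthogonal
    if np.linalg.det(q) < 0:
        q[:, 0] = -q[:, 0]  # flip one axis to turn a reflection into a rotation
    return q


def center(points):
    """Toy map that IS rotation-equivariant: subtract the centroid."""
    return points - points.mean(axis=0, keepdims=True)


def squash_z(points):
    """Toy map that is NOT rotation-equivariant: it singles out the z axis."""
    return points * np.array([1.0, 1.0, 0.5])


def is_rotation_equivariant(f, points, rotation, atol=1e-8):
    """Check f(R x) == R f(x) for an (N, 3) array of row-vector points."""
    rotated_then_mapped = f(points @ rotation.T)
    mapped_then_rotated = f(points) @ rotation.T
    return bool(np.allclose(rotated_then_mapped, mapped_then_rotated, atol=atol))


rng = np.random.default_rng(0)
R = random_rotation(rng)
pts = rng.standard_normal((50, 3))  # stand-in for an object's 3D points

print("center equivariant:", is_rotation_equivariant(center, pts, R))
print("squash_z equivariant:", is_rotation_equivariant(squash_z, pts, R))
```

The same check could in principle be run against a reconstruction model by replacing the toy maps with the model's output positions, but a single-image model takes a 2D image as input, so the "rotate the input" step would need rendering or a dataset with known poses; that part is left open here.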

I like this type of work, which asks "where are we going?" (the ultimate task definition) first. Genuine liking! Genuine admiration!

szymanowiczs / splatter-image, issue #40