graphdeco-inria / gaussian-splatting

Original reference implementation of "3D Gaussian Splatting for Real-Time Radiance Field Rendering"
https://repo-sam.inria.fr/fungraph/3d-gaussian-splatting/

Stereo Vision (Representation): Where are we coming from? Where are we going? #478

Closed yuedajiong closed 7 months ago

yuedajiong commented 10 months ago

If we, as prophets, knew the ultimate form of stereoscopic vision, especially the ultimate representation of three/four/five/...-dimensionality, how would we rethink the considerable time and effort spent today on various trending/hot but possibly transitional technologies?

- representation (implicit/explicit): point cloud / Gaussian points, surface/mesh, SDF/UDF, volume, NeRF, function-based, ...
- dimensionality: 2.5D, 3D, 4D, 5D, ...
- static/dynamic (time); discrete/continuous
- different physical attributes (rigid body / soft body / particle); clean surface / no clean surface; transparent / translucent / opaque; ...
- light/material: interactive or not
- heavy reconstruction vs. image-conditioned generation; camera pose given vs. pose-free
- ...

As a layman: from mesh to NeRF, from NeRF* to GS, ... focus on speed, focus on quality, focus on dynamics, focus on interaction, ... from A to B, from B to C, where is Z?

I found: what the ultimate stereo representation is is the FIRST problem. At least, we should consider, "for the majority of situations":

P1) explicit surface or volume | interactive: for physical interaction, such as in the metaverse, in games, in 3D interactive shows, ... (not only purely visual ...)
P2) dynamic/time: for dynamic movie/video generation, live digital shows, ...
P3) rigid body first, ideally compatible with diverse physical properties: interactive rigid bodies, purely visual objects (anything in the distant background of a movie or game), non-rigid, ...
P4) quality: not UE5 quality, no money, just an academic exploration, but a commercial toy; even needs LOD, ...
P5) speed: pure render without differentiability > non-differentiable render in an editor > differentiable render at inference (fit mode, or gen mode) > differentiable render in training (gen mode)
P6) single-image generation ...
P7) camera pose estimated automatically (algorithm-internal)
P8) lighting-able (not only model generation under different lights, but also model use under different lighting + materials) ...
P9) ...

NeRF: including Instant***, still slow, non-editable, not interactive, ... GS: weak surface, not interactive, no lighting, ...

"Yes, as a layman in stereo vision and computer graphics, I am not only reflecting on the perplexing situation myself, but also reading some papers and articles, such as '2308.02751 NeRFs: The Search for the Best 3D Representation'."

NO ANSWER, SO FAR.
I'm in pain, can't go on. (It's just a joke.)

What is the ultimate stereo representation ??? Where are we coming from? Where are we going?

HAVE AN ANSWER! Return to the place where the demiurge was born: first select/design the 'best' stereo representation, and prioritize meeting more requirements by priority as much as possible, rather than focusing solely on speed. For example (iff in the GS direction):
- flat Gaussian point clouds and flat single Gaussian points to get an approximate surface (done)
- learn from UV maps instead of SH for complex texture, and make it lighting-able, ... (todo, ideas so far)
- ......
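The "flat Gaussian" idea above can be sketched very simply: squash each 3D Gaussian's thinnest scale axis toward zero so every splat becomes a near-planar disc ("surfel") that hugs the surface. Below is a minimal numpy sketch of that one step; the function name, the plain `(N, 3)` scale layout, and the epsilon value are my assumptions for illustration (the actual codebase stores activated log-scales inside its model class).

```python
import numpy as np

def flatten_gaussians(scales, eps=1e-4):
    """Flatten 3D Gaussians into near-planar discs by clamping the
    smallest scale axis of each one toward zero, so each splat
    approximates a local surface element.

    scales : (N, 3) array of per-Gaussian axis lengths (hypothetical
             layout for this sketch, not the repo's internal storage).
    """
    scales = np.asarray(scales, dtype=np.float64).copy()
    thinnest = np.argmin(scales, axis=1)                # thinnest axis per Gaussian
    scales[np.arange(len(scales)), thinnest] = eps      # squash it to ~zero
    return scales
```

In practice one would apply such a clamp (or a regularization loss pushing the minimum scale down) during optimization rather than as a post-process, but the geometric intent is the same.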

jaco001 commented 10 months ago

There is no one holy grail. If you study all things... good luck. If you search for the optimal way/road/process for your case, first define your case/problem. GS is good at baked ray-tracing properties. That's all. You can manipulate GS or even change the lighting, but at that level other approaches can be good too. My case is reconstructing hair, skin, and flesh in general. GS is the best that I can find for now. :)

yuedajiong commented 10 months ago

@jaco001 Thanks.

Human reconstruction is one of my important tasks.

For hair, I tend to use 'generation', not 'reconstruction': most of the time, what people want is hair-like something, not one specific head of hair. For faces, most of the time we want the face of a specific person, like Sophie Marceau, Einstein, ...; this task includes both reconstruction and generation.

There is a paper from Nvidia; you can search with the keywords: Nvidia ADMM Hair Interactive. The effect is impressive to me.

Yes, not only quality, but also INTERACTIVITY.

So far, I have not found any similar quality from AI.

How to implement lighting and interaction in GS is still an OPEN question.

1) For different environment lighting, act on the SH coefficients of the GS points.
2) The surface is indirectly represented by the GS means, scales, and rotations; then do collision detection, ... even though GS is represented by POINTS and, in essence, has NO shape.
3) Similar to SMPL(-X), we need to extend GS for motion and drive an action like this: NewGS = f_drive(StandardGS, MotionMatrix).
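Point 3) can be sketched for the simplest case: one rigid 4x4 motion matrix applied to a set of Gaussians, moving each mean and re-orienting each Gaussian's local frame. The function name and array layouts are hypothetical; a real SMPL-style driver would blend several per-bone transforms with skinning weights, and rotating SH color coefficients above degree 0 is omitted here for simplicity.

```python
import numpy as np

def drive_gaussians(means, rotations, motion):
    """Minimal sketch of NewGS = f_drive(StandardGS, MotionMatrix)
    for a single rigid transform.

    means     : (N, 3) Gaussian centers.
    rotations : (N, 3, 3) per-Gaussian orientation matrices.
    motion    : (4, 4) homogeneous rigid transform.
    (All names/layouts are assumptions for this sketch; SH rotation
    and per-bone skinning weights are intentionally left out.)
    """
    R, t = motion[:3, :3], motion[:3, 3]
    new_means = means @ R.T + t          # rigidly move each Gaussian center
    new_rotations = R[None] @ rotations  # re-orient each Gaussian's frame
    return new_means, new_rotations
```

Extending this to articulated motion means replacing the single `motion` matrix with a weighted blend of bone transforms per Gaussian, exactly the LBS idea SMPL uses for mesh vertices.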