szymanowiczs / splatter-image

Official implementation of "Splatter Image: Ultra-Fast Single-View 3D Reconstruction", CVPR 2024
https://szymanowiczs.github.io/splatter-image
BSD 3-Clause "New" or "Revised" License
795 stars 54 forks

Cannot upload image in demo. #39

Closed: yuedajiong closed 3 months ago

yuedajiong commented 4 months ago

As in the title.

szymanowiczs commented 4 months ago

Can you give me more details? It works OK on my side, so I need more info to reproduce the problem.

yuedajiong commented 4 months ago
  1. Go to your demo page: https://huggingface.co/spaces/szymanowiczs/splatter_image (this page is very slow).
  2. Upload any PNG image (I tried several different images).
  3. There is no response. However, clicking one of the demo images works fine.

tried: Lion Wizard

szymanowiczs commented 4 months ago

From what I gather it might be something to do with the Gradio version. I've rebuilt the space and it works fine again but I'll keep looking into it.

yuedajiong commented 4 months ago

Hi, GREAT-MASTER: it works now, and the generation quality is very good.
(Without a rigorous comparison I dare not say it is the best, but visually it does appear to be very good.) A small piece of advice: you could choose another background-removal program or parameter configuration, and a higher resolution, for better quality; the result is just a little blurry (caused by foreground_ratio=0.65 in rembg? and resize_to_128?). Of course, this doesn't affect the advantage of your algorithm at all. You can close this issue. Thanks.
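
As rough arithmetic on the blur remark above, here is a small sketch. It assumes that foreground_ratio is the fraction of the image side occupied by the object and that the input is resized to 128x128; both values come from the comment and are not verified against the demo code. The variable names are illustrative only.

```python
# Assumed values, taken from the comment above rather than from the demo source.
input_side_px = 128        # side length after the assumed resize to 128x128
foreground_ratio = 0.65    # assumed fraction of the side covered by the object

# Approximate number of pixels the object spans along one side.
foreground_pixels = round(input_side_px * foreground_ratio)
print(f"object spans about {foreground_pixels} of {input_side_px} pixels")
```

Under these assumptions the object is described by only about 83 pixels per side, which could explain a slightly soft result regardless of the reconstruction method.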

yuedajiong commented 4 months ago

Dear @szymanowiczs: I have a question about pose, covering both camera pose and object pose.

your code:

"pos = torch.bmm(pos#HERE, source_cameras_view_to_world)"

source_cameras_view_to_world: does this refer to the 200 preset camera poses? pos#HERE: is this the pose of the object? At inference, the pose is predicted by the network; during training, if the dataset consists of 3D objects, the pose is known from the rendering loop, right? (If the dataset does NOT consist of 3D objects, do we need COLMAP or another pose-estimation step?)

Thanks
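
For readers following the torch.bmm line quoted above: a batched matrix product with the points on the left treats each point as a row vector and multiplies it by one matrix per batch element. The sketch below mimics that in plain Python, without torch. It assumes the points are homogeneous (x, y, z, 1) rows and that the matrix is a 4x4 view-to-world transform laid out for row vectors (translation in the last row). The helper names and the example matrix are hypothetical and not taken from the repository.

```python
def transform_points_row_vector(points, matrix):
    """Apply a 4x4 matrix to homogeneous row-vector points: p' = p @ M."""
    return [
        [sum(p[k] * matrix[k][j] for k in range(4)) for j in range(4)]
        for p in points
    ]


def batched_transform(batch_points, batch_matrices):
    """Plain-Python stand-in for a batched matmul: one matrix per batch element."""
    return [
        transform_points_row_vector(points, matrix)
        for points, matrix in zip(batch_points, batch_matrices)
    ]


# Hypothetical view-to-world matrix in row-vector layout:
# identity rotation, translation (0, 0, 2) stored in the last row.
view_to_world = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 2.0, 1.0],
]

# Two points expressed in the camera (view) frame, as homogeneous rows.
points_view = [
    [0.0, 0.0, 1.0, 1.0],
    [0.5, -0.5, 1.5, 1.0],
]

# Batch of size 1: one set of points, one matrix.
points_world = batched_transform([points_view], [view_to_world])[0]
print(points_world)
```

In this reading, the left operand is a set of 3D points in the source camera's frame and the right operand moves them into the world frame; whether that matches the repository's intent is for the author to confirm.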

szymanowiczs commented 3 months ago

Yes, if the cameras are not known you need to set them to wherever you want to render the reconstruction from for visualisation. In this case we have objects, so a loop is a reasonable way to visualise them. For scenes you might want to choose another camera path for visualisation, for example figures of 8, small circles, or the camera going around. It essentially depends on how you want to visualise the reconstruction.

But note that you don't need them to run the network or reconstruction, you only need them for visualisation.
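
A minimal sketch of the "loop" visualisation path described above, using only the Python standard library. It places cameras on a circle around the origin, all looking at the object. The function names, the choice of world up axis (+z), and the camera convention (x right, y down, z forward) are assumptions made for illustration and are not taken from the repository.

```python
import math


def _sub(a, b):
    return [a[i] - b[i] for i in range(3)]


def _cross(a, b):
    return [
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    ]


def _normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def look_at(eye, target=(0.0, 0.0, 0.0), world_up=(0.0, 0.0, 1.0)):
    """Return a 4x4 camera-to-world matrix (column-vector layout).

    Assumed camera convention: x right, y down, z forward.
    The view direction must not be parallel to world_up.
    """
    forward = _normalize(_sub(target, eye))
    right = _normalize(_cross(forward, world_up))
    down = _cross(forward, right)
    rows = [[right[i], down[i], forward[i], eye[i]] for i in range(3)]
    rows.append([0.0, 0.0, 0.0, 1.0])
    return rows


def orbit_poses(num_views, radius, height):
    """Camera-to-world matrices evenly spaced on a circle around the origin."""
    poses = []
    for k in range(num_views):
        angle = 2.0 * math.pi * k / num_views
        eye = [radius * math.cos(angle), radius * math.sin(angle), height]
        poses.append(look_at(eye))
    return poses


poses = orbit_poses(num_views=8, radius=2.0, height=0.5)
print(len(poses), "camera poses generated")
```

Other paths, such as a figure of 8 or a small circle, would only change how `eye` is computed for each frame. These poses are needed only for rendering the views, not for running the network.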