thu-ml / CRM

[ECCV 2024] Single Image to 3D Textured Mesh in 10 seconds with Convolutional Reconstruction Model.
https://ml.cs.tsinghua.edu.cn/~zhengyi/CRM/
MIT License

Upscaled RGB and CCM, Tile-Based generation #10

Open mr-lab opened 4 months ago

mr-lab commented 4 months ago

Hi, I wanted to ask: if I take the 256x6 multi-view (MV) images and generate a higher-resolution MV sheet from them, can CRM generate a better 3D model with more detail?

My test ideas are:

- Regular upscale. (The CCM probably won't upscale well, so I would only change its resolution without a real upscale. I still can't figure out whether the CCMs are used for texturing, for generating the 3D mesh, or both.)
- Run a tile-based algorithm: first do a regular CRM generation of RGB and CCM at 256x6, then upscale them as follows. The algorithm splits the input image into multiple tiles, generates RGB and CCM for each tile, then blends them all together into one high-resolution set of MV RGB and CCM images.
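
A minimal sketch of the split-and-blend step described above, assuming NumPy arrays of shape (H, W, C). The names `tile_starts`, `feather_weights`, and `blend_tiles`, and the `process` callback standing in for per-tile generation, are hypothetical and not part of CRM. To keep the example short, each tile keeps its size; a real upscaler would also scale the tile coordinates.

```python
import numpy as np

def tile_starts(length, tile, overlap):
    """Start offsets of tiles of size `tile` that cover [0, length)."""
    if tile >= length:
        return [0]
    stride = tile - overlap
    starts = list(range(0, length - tile, stride))
    starts.append(length - tile)  # last tile sits flush with the edge
    return starts

def feather_weights(tile, overlap):
    """2D mask that ramps up linearly from the tile border to its interior."""
    ramp = np.ones(tile, dtype=np.float64)
    if overlap > 0:
        edge = np.linspace(0.0, 1.0, overlap + 2)[1:-1]  # strictly positive
        ramp[:overlap] = edge
        ramp[-overlap:] = edge[::-1]
    return np.outer(ramp, ramp)

def blend_tiles(image, process, tile=64, overlap=16):
    """Run `process` on overlapping tiles and blend them by weighted average."""
    h, w, c = image.shape
    if tile > min(h, w) or not 0 <= overlap < tile:
        raise ValueError("need overlap < tile <= min(H, W)")
    acc = np.zeros((h, w, c), dtype=np.float64)
    wsum = np.zeros((h, w, 1), dtype=np.float64)
    weights = feather_weights(tile, overlap)[..., None]
    for y in tile_starts(h, tile, overlap):
        for x in tile_starts(w, tile, overlap):
            patch = process(image[y:y + tile, x:x + tile])
            acc[y:y + tile, x:x + tile] += patch * weights
            wsum[y:y + tile, x:x + tile] += weights
    return acc / wsum  # weights are positive everywhere, so no division by zero
```

With an identity `process`, the blended result equals the input, which is a quick sanity check before plugging in a real per-tile generator.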

The tile code is ready and only needs some modifications; it showed great results with depth map blending. I made some modifications to the code and the models' config files and changed the size of the input tensors (image arrays). The RGB and CCM generated by the regular workflow at high resolutions are just garbage, so I can't really tell. What I need to know:

  1. Will the decoder work with resolutions higher than 256x6, for example 512x3,072? Or is the model only trained on that resolution, so it won't work?
  2. I read the paper multiple times but can't understand CCM. Can we skip generating those and just use RGB? Are CCMs essential for mesh generation, or are they used just for texturing?
  3. Let's say we have extremely detailed depth maps, like 4K ultra-sharp maps where even skin pores are present. Can we in any way introduce those depth maps into the CRM workflow? (This one is very important.)

Do let me know, and many thanks in advance. Much love and respect for your work, cheers.

thuwzy commented 4 months ago

Thank you for your interest in our CRM paper!

  1. The decoder cannot directly work at a resolution of 512x3,072. However, I will upload a model trained at 512x3,072 if you can get the image upsampling to work.
  2. The CCM is for better geometry and cannot be skipped. I conducted an ablation study, shown in Figure 10 of my paper: CRM without CCM input has worse geometry.
  3. Actually, a depth map can be equivalently transformed to a CCM in my framework, so I think it is highly likely to work. By the way, I think the resolution of the CCM is not very important. A good pipeline may be to generate the 256x1536 image and CCM, then use a neural network to upsample the image and simply resize the CCM to 512x3,072.
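
A rough sketch of what turning a depth map into a CCM could look like, assuming a CCM stores, per pixel, the 3D position of the surface point in a canonical cube. Everything here is an assumption for illustration, not CRM's actual convention: the orthographic camera, the camera looking down its -z axis, the `extent` of the cube, the [0, 1] encoding, and the helper names `depth_to_ccm` and `resize_nearest`.

```python
import numpy as np

def depth_to_ccm(depth, cam_to_world, extent=1.0):
    """Unproject an orthographic depth map into canonical coordinates.

    depth: (H, W) distance along the view axis, in the same units as `extent`.
    cam_to_world: (4, 4) rigid transform of the (hypothetical) camera.
    Returns an (H, W, 3) map with coordinates encoded into [0, 1].
    """
    h, w = depth.shape
    # Pixel centres mapped onto the image plane, spanning [-extent, extent].
    xs = ((np.arange(w) + 0.5) / w * 2.0 - 1.0) * extent
    ys = (1.0 - (np.arange(h) + 0.5) / h * 2.0) * extent  # image y grows downward
    x, y = np.meshgrid(xs, ys)
    # Assumption: the camera looks down its -z axis.
    pts_cam = np.stack([x, y, -depth, np.ones_like(depth)], axis=-1)
    pts_world = pts_cam @ cam_to_world.T
    xyz = pts_world[..., :3]
    # Encode [-extent, extent] into [0, 1] so the map can be stored like an image.
    return np.clip((xyz / extent + 1.0) * 0.5, 0.0, 1.0)

def resize_nearest(ccm, factor):
    """Plain integer-factor resize of a CCM, with no learned upsampling."""
    return np.repeat(np.repeat(ccm, factor, axis=0), factor, axis=1)
```

`resize_nearest` covers the "simply resize the CCM" step; the camera poses for the six views would have to come from the actual CRM setup.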
mr-lab commented 4 months ago

Thank you very much, I will be waiting for that model. I will explore point 3 further.

mr-lab commented 3 months ago

Original model render: [image] A couple of re-renders: [image] [image] Your work is a blessing to us. Those are re-renders of the RGB, used to retexture the mesh. More consistency is needed. I will move on to depth maps after that; good depth comes from good RGB. Cheers.

mosvlad commented 3 months ago

Awesome!!! Can you share your results and code?

zz7379 commented 3 months ago

Is this an upscaled RGB or a rendered mesh?

mosvlad commented 3 months ago

I'm trying to upscale the RGB from stage 1.

This: https://github.com/thu-ml/CRM/blob/3e677cb41d4856dc7c46c5dace00d52336f5c614/run.py#L152 and this: https://github.com/thu-ml/CRM/blob/3e677cb41d4856dc7c46c5dace00d52336f5c614/pipelines.py#L135

For upscaling I used BSRGAN: https://github.com/cszn/BSRGAN

  1. Generate images (256x1536) with stage 1.
  2. Upscale them with BSRGAN (x2 or x4).
  3. Resize the images back to 256x1536.
  4. Use the upscaled and resized images for generate3d.

[image]

These steps do not give a quality improvement like the one in @mr-lab's comments.
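
A small sketch of the upscale-then-resize round trip from the steps above, assuming NumPy arrays and a 256x1536 sheet. `upscale_nearest` is only a stand-in for BSRGAN (which is not called here), and `downscale_box` and `round_trip` are hypothetical helper names. It shows that whatever the upscaler produces, the array handed to generate3d has the same 256x1536 pixel grid as before.

```python
import numpy as np

def upscale_nearest(img, factor):
    """Stand-in upscaler: repeats each pixel `factor` times in both axes."""
    return np.repeat(np.repeat(img, factor, axis=0), factor, axis=1)

def downscale_box(img, factor):
    """Resize back by averaging each factor x factor block of pixels."""
    h, w, c = img.shape
    blocks = img.reshape(h // factor, factor, w // factor, factor, c)
    return blocks.mean(axis=(1, 3))

def round_trip(sheet, factor, upscaler=upscale_nearest):
    """Upscale the sheet, then resize it back to its original resolution."""
    big = upscaler(sheet, factor)
    small = downscale_box(big, factor)
    return big, small

sheet = np.random.default_rng(0).random((256, 1536, 3))
big, small = round_trip(sheet, 2)
print(big.shape, small.shape)
```

With this stand-in the round trip returns the original sheet exactly. A learned upscaler would change pixel values, but the final image still has only 256x1536 pixels, which may be one reason the mesh does not gain detail.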

Another approach I tried was to upscale every image generated in step 1:

  1. Generate an image (256x256).
  2. Upscale it with BSRGAN (x2 or x4).
  3. Resize it to the original size (256x256).
  4. Use the upscaled and resized images for stage 2.

[image] The results are not very good either.
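
For the per-view variant, a minimal sketch of splitting the sheet into six views and joining them again, assuming the six 256x256 views sit side by side along the width of a 256x1536 NumPy array. The helper names `split_views`, `join_views`, and `per_view` are hypothetical, and `fn` stands for whatever per-image step (such as upscale plus resize) is applied.

```python
import numpy as np

def split_views(sheet, n_views=6):
    """Split an (H, W, C) sheet into `n_views` equal-width views."""
    if sheet.shape[1] % n_views != 0:
        raise ValueError("sheet width must be divisible by n_views")
    return np.split(sheet, n_views, axis=1)

def join_views(views):
    """Concatenate views side by side back into one sheet."""
    return np.concatenate(views, axis=1)

def per_view(sheet, fn, n_views=6):
    """Apply `fn` to each view separately and rebuild the sheet."""
    return join_views([fn(view) for view in split_views(sheet, n_views)])
```

Processing views separately avoids an upscaler bleeding content across view boundaries, at the cost of each view being treated without context from its neighbours.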

Maybe @mr-lab can share some more information about his research...

mr-lab commented 3 months ago

@mosvlad we need a decoder that can process higher resolutions (512x3,072); @thuwzy is probably working on that. For now we are working on an alternative: transfer the CRM results to a 3D blob representing the shape of the subject, then remodel that blob into a model by moving vertex positions until they match the target. There is still a long way to go before we see any good results. CRM is the only true 3D generator; time and time again it has proven to provide consistent multi-view shots, which no other project can do. We will continue to prepare for a larger decoder.

snowflakewang commented 1 month ago

@thuwzy Hello, I am interested in upscaling the resolution of RGBs to get high-resolution textured meshes. You mentioned that you are working on 512-level decoders. I am curious about the maximum resolution that GPUs (maybe A100/A800) can handle. Is 1024 an acceptable resolution?
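
As a back-of-the-envelope aid for the resolution question, the pixel count of a six-view sheet grows with the square of the per-view resolution. The sketch below only counts pixels, with a hypothetical helper `sheet_pixels`; it says nothing about the actual GPU memory a decoder would need, which depends on the architecture and batch size.

```python
def sheet_pixels(view_res, n_views=6):
    """Number of pixels in a sheet of `n_views` square views."""
    return view_res * view_res * n_views

base = sheet_pixels(256)
for res in (256, 512, 1024):
    pixels = sheet_pixels(res)
    print(f"{res}x{res} per view: {pixels} pixels, {pixels // base}x the 256 sheet")
```

So a 512 sheet has 4 times the pixels of the 256 sheet, and a 1024 sheet has 16 times.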