jtydhr88 / sd-webui-txt-img-to-3d-model

A custom extension for sd-webui that allows you to generate 3D models from text or an image, based on OpenAI Shap-E.
GNU Affero General Public License v3.0

feedback + questions + feature requests #9

Open AugmentedRealityCat opened 1 year ago

AugmentedRealityCat commented 1 year ago

I arranged a proper testing session of this extension earlier today and took notes, which I'm sharing here. This was tested using the latest DEV branch of the A1111 WebUI, running on Windows 10 with python: 3.10.6 and torch: 2.0.0+cu118 on an RTX 4090. For TXT-to-3dModel I used "Grand Piano" as a prompt. For the IMG-to-3dModel part, I used the PNG at the bottom of this post, which is a C4d render with an alpha channel for transparency.

More information required

Tooltips with more information about the use of each parameter would be very useful. They would appear when you hover the mouse over a parameter. The same information could be included in the Readme.

GUI

The Karras slider should be inactive (greyed out or removed) when the Karras checkbox is OFF.

With or without Karras?

Without Karras = 1024 fast steps, longer overall process (51s for piano image test), low-quality results
With Karras = x steps, shorter overall process (6s for piano image test @ 64 steps), better-quality results

Is there any good reason not to use Karras?

How many Karras steps?

Without Karras = currently hard-set to 1024 steps, very long process: piano image test in 51s
@64 = default, seems to work well: piano image test in 6s
@100 = maximum, more precise color and model, sometimes more holes: piano image test in 10s
@1 = minimum, very basic model: piano image test in less than 1s
@32 = not very detailed but OK, very fast: piano image test in less than 3s
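
To illustrate what the step count controls: assuming the Karras option refers to the noise schedule from Karras et al. (2022), the sampler visits one noise level per step, spaced between a maximum and a minimum sigma. The sketch below computes that spacing. The function name and the default values (`sigma_min=1e-3`, `sigma_max=160`, `rho=7`) are assumptions for illustration, not values read from this extension.

```python
def karras_sigmas(steps, sigma_min=1e-3, sigma_max=160.0, rho=7.0):
    """Return `steps` noise levels, decreasing from sigma_max to sigma_min.

    Hypothetical helper: the defaults are assumed, not taken from the extension.
    """
    if steps < 1:
        raise ValueError("steps must be at least 1")
    if steps == 1:
        # A single step can only start from the highest noise level.
        return [sigma_max]
    max_inv = sigma_max ** (1.0 / rho)
    min_inv = sigma_min ** (1.0 / rho)
    # Interpolate linearly in sigma^(1/rho) space, then raise back to the power rho.
    return [
        (max_inv + i / (steps - 1) * (min_inv - max_inv)) ** rho
        for i in range(steps)
    ]


if __name__ == "__main__":
    for n in (1, 32, 64, 100):
        sigmas = karras_sigmas(n)
        print(n, "steps: first", sigmas[0], "last", sigmas[-1])
```

With more steps, the same sigma range is covered in finer increments, which would be consistent with the more precise results seen at 100 steps compared to 32.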

Clip Denoised?

The Clip Denoised option often cuts holes into the model when activated. What is the best use case for it? I almost always get better results without it.

FP16?

FP16 occasionally adds bumps and overgrowth where there should not be any. Why should we check that option? Is it purely to help it run on smaller GPUs with less VRAM?

Bug: identical 3d model saved twice when Karras is not activated

If Karras is OFF, the system automatically does a batch of two, but both models are identical.

3d Viewer adjustments requested

Change the model's orientation: swapping the Y and Z axes works well for me in C4d.
Remove the specular highlights (and any environment reflection) from the material. They are distracting at best.
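
As a minimal sketch of the axis swap mentioned above, assuming the mesh is available as a list of `(x, y, z)` vertices and a list of triangle index tuples (the function name and the `flip_winding` flag are hypothetical). Mathematically, swapping two axes is a reflection, so the triangle winding order may need to be reversed to keep faces pointing outward. Whether that is needed depends on the handedness convention of the target application.

```python
def swap_y_z(vertices, faces, flip_winding=True):
    """Swap the Y and Z coordinates of every vertex.

    vertices: list of (x, y, z) tuples
    faces: list of (a, b, c) vertex-index triangles
    flip_winding: reverse triangle order to compensate for the reflection
    """
    swapped = [(x, z, y) for (x, y, z) in vertices]
    if flip_winding:
        # Exchanging two indices reverses the winding direction of a triangle.
        faces = [(a, c, b) for (a, b, c) in faces]
    else:
        faces = [tuple(face) for face in faces]
    return swapped, faces


if __name__ == "__main__":
    verts = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 3.0)]
    tris = [(0, 1, 2)]
    print(swap_y_z(verts, tris))
```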

3d model exporter

There should be information about the use of vertex colors instead of a texture map and UV coordinates.
Most people are not familiar with vertex colors, but they are supported by most 3d software, including Blender.
Maybe add a hint or two to help people visualize those colors in different software, and explain how to convert them to UV maps.
The basic unit for importing models at a proper scale seems to be 1 unit = 1 meter (100 cm) (TBC).
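
For readers unfamiliar with vertex colors, here is a minimal sketch of how they are stored in an ASCII PLY file: each vertex line carries its position followed by an RGB color, and there is no texture image or UV coordinate involved. The function name is hypothetical, and the sketch does not claim to match the exact file this extension writes.

```python
def to_ascii_ply(vertices, colors, faces):
    """Build an ASCII PLY string with one RGB color (0-255) per vertex."""
    if len(vertices) != len(colors):
        raise ValueError("need exactly one color per vertex")
    lines = [
        "ply",
        "format ascii 1.0",
        f"element vertex {len(vertices)}",
        "property float x",
        "property float y",
        "property float z",
        "property uchar red",
        "property uchar green",
        "property uchar blue",
        f"element face {len(faces)}",
        "property list uchar int vertex_indices",
        "end_header",
    ]
    # One line per vertex: position followed by its color.
    for (x, y, z), (r, g, b) in zip(vertices, colors):
        lines.append(f"{x} {y} {z} {r} {g} {b}")
    # One line per face: vertex count followed by the vertex indices.
    for face in faces:
        lines.append(f"{len(face)} " + " ".join(str(i) for i in face))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(to_ascii_ply(
        [(0, 0, 0), (1, 0, 0), (0, 1, 0)],
        [(255, 0, 0), (0, 255, 0), (0, 0, 255)],
        [(0, 1, 2)],
    ))
```

The renderer interpolates the three corner colors across each triangle, which is why the color detail is limited by mesh density rather than by a texture resolution.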

Feature Request - connect to 3d model loader + Canvas editor

Since the same developer also made this, I would love a link to https://github.com/jtydhr88/sd-3dmodel-loader
That's really the missing link to make this extension more convenient.
The key is to use the viewer to render 3d content as a bitmap to be sent to IMG2IMG or ControlNet as an input.
Even better would be the ability to send that to the amazing https://github.com/jtydhr88/sd-canvas-editor by some just as amazing programmer!

Feature Request - seed number

A new parameter that lets the user set a precise seed number, or set it to -1 for random, would be useful. Right now the results are always randomized, and it's not possible to get exactly the same 3d model twice.
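
A minimal sketch of the requested behavior, using only the Python standard library: `-1` picks a fresh random seed, any other value is used as given, and the same seed always yields the same starting noise. The names `resolve_seed`, `sample_noise` and `MAX_SEED` are hypothetical. In the extension itself, the resolved seed would presumably be passed to something like `torch.manual_seed` before sampling; whether that alone makes Shap-E output repeatable is an assumption.

```python
import random

MAX_SEED = 2**32 - 1  # assumed upper bound, chosen for illustration


def resolve_seed(seed):
    """Return the seed to use: a fresh random one for -1, else the given value."""
    if seed == -1:
        return random.randint(0, MAX_SEED)
    if not 0 <= seed <= MAX_SEED:
        raise ValueError(f"seed must be -1 or in [0, {MAX_SEED}]")
    return seed


def sample_noise(seed, count=4):
    """Stand-in for the sampler's initial noise: same seed, same values."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(count)]


if __name__ == "__main__":
    used = resolve_seed(-1)
    print("seed used:", used)  # report it so the result can be reproduced later
    print(sample_noise(used) == sample_noise(used))
```

Reporting the resolved seed alongside the saved model is what makes a random run reproducible afterwards.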

Feature Request - Shap-E Parameters

I also agree that more of the Shap-E parameters should be exposed. The Sigma min/max/churn are normally used to control the injection of noise in some models. I suppose controlling that might have an influence on getting reproducible results. There is also the guidance scale that you identified: is that similar to the CFG parameter for 2d image generation?
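
For context on the CFG comparison, here is a minimal sketch of how classifier-free guidance combines two model predictions in 2d image generation. It assumes, without confirmation from this thread, that the Shap-E guidance scale works the same way. The function name is hypothetical and plain lists stand in for tensors.

```python
def apply_guidance(cond, uncond, guidance_scale):
    """Classifier-free guidance: push the prediction away from the
    unconditional output, toward the conditional one.

    cond, uncond: model outputs with and without the prompt (lists of floats)
    guidance_scale: 0 gives the unconditional output, 1 the conditional one,
                    larger values exaggerate the difference between them
    """
    return [u + guidance_scale * (c - u) for c, u in zip(cond, uncond)]


if __name__ == "__main__":
    cond, uncond = [1.0, 2.0], [0.0, 1.0]
    for scale in (0.0, 1.0, 3.0):
        print(scale, apply_guidance(cond, uncond, scale))
```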

[Image: greypiano]

jtydhr88 commented 1 year ago

Thanks for such detailed feedback! I will go through them one by one!

AugmentedRealityCat commented 1 year ago

I'll add one more thing that might be useful for users around here. The best settings, for me, and for this test, were:

Karras=ON
Karras steps=64
Clip Denoised=OFF
FP16=OFF

AugmentedRealityCat commented 1 year ago

And here is a render showing some results from a single batch of Img-to-3dModel based on the PNG I posted at the top of this thread.

[Image: 1to10]