threestudio-project / threestudio

A unified framework for 3D content generation.
Apache License 2.0

Issue with NaN error during fine-tuning SHAP-E #184

Open · blueangel1313 opened 1 year ago

blueangel1313 commented 1 year ago

Hi ThreeStudio team,

We are currently facing an issue while fine-tuning SHAP-E for our fast text-to-3D model. We have been using the training codebase at https://github.com/crockwell/Cap3D/blob/main/text-to-3D/finetune_shapE.py along with the dataset from the same repo, but we consistently encounter a NaN error during training.

We have tried reducing the learning rate, but unfortunately that has not resolved the problem. Since you are experts in this field, we would greatly appreciate any insights or assistance you can provide to help us get past this NaN error.
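For reference, the guards we have been experimenting with around the training step look roughly like this (a minimal sketch; `model`, `optimizer`, and `batch` stand in for the corresponding objects in the Cap3D script):

```python
import torch

# Surface the first backward-pass operation that produces NaN/Inf.
torch.autograd.set_detect_anomaly(True)

def training_step(model, optimizer, batch):
    optimizer.zero_grad()
    loss = model(batch)  # placeholder for the actual loss computation

    # Skip the update entirely if the loss is already non-finite.
    if not torch.isfinite(loss):
        print("non-finite loss, skipping batch")
        return None

    loss.backward()

    # Clip gradients; the returned pre-clip norm is useful for spotting
    # the blow-up a few steps before the loss itself goes NaN.
    grad_norm = torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
    if not torch.isfinite(grad_norm):
        print("non-finite gradient norm, skipping update")
        optimizer.zero_grad()
        return None

    optimizer.step()
    return loss.item()
```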

Our goal is to create an exceptional open-source fast text-to-3D model by fine-tuning SHAP-E, and we believe that resolving this issue will enable us to achieve that.

Thank you for your time and support. If there is any additional information or details we can provide to facilitate the diagnosis and resolution of this problem, please let us know.

DSaurus commented 1 year ago

Hi, @blueangel1313. Would you mind briefly introducing what SHAP-E is and how you perform fine-tuning with ThreeStudio? I'm also curious whether the fine-tuning process is similar to current methods like DreamFusion or ProlificDreamer.

blueangel1313 commented 1 year ago

Thank you for your response @DSaurus. SHAP-E is a text-to-3D model trained by OpenAI; the model has been released here. Its standout feature is that it can generate a 3D model in a mere 1 minute and 30 seconds.
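For context, text-to-3D sampling with SHAP-E looks roughly like the following, adapted from the sample notebook in the openai/shap-e repo (exact argument names may differ between versions):

```python
import torch
from shap_e.diffusion.sample import sample_latents
from shap_e.diffusion.gaussian_diffusion import diffusion_from_config
from shap_e.models.download import load_model, load_config
from shap_e.util.notebooks import decode_latent_mesh

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# 'text300M' is the text-conditional latent diffusion model;
# 'transmitter' decodes sampled latents into implicit 3D representations.
xm = load_model('transmitter', device=device)
model = load_model('text300M', device=device)
diffusion = diffusion_from_config(load_config('diffusion'))

latents = sample_latents(
    batch_size=1,
    model=model,
    diffusion=diffusion,
    guidance_scale=15.0,
    model_kwargs=dict(texts=['a red chair']),
    progress=True,
    clip_denoised=True,
    use_fp16=True,
    use_karras=True,
    karras_steps=64,
    sigma_min=1e-3,
    sigma_max=160,
    s_churn=0,
)

# Export the sampled latent as a mesh.
mesh = decode_latent_mesh(xm, latents[0]).tri_mesh()
with open('chair.obj', 'w') as f:
    mesh.write_obj(f)
```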

Regarding your query about fine-tuning, another researcher has attempted to fine-tune this model, and you can find the details in the codebase linked above. From my understanding, their fine-tuning process is distinct from methods like DreamFusion or ProlificDreamer: rather than optimizing a 3D representation per prompt with score distillation, it directly fine-tunes the diffusion model on paired text-3D data.

That researcher reported that training took approximately 3 days on the full dataset and 1 day on the smaller human dataset. They used the AdamW optimizer and the CosineAnnealingLR scheduler with an initial learning rate of 1e-5 for fine-tuning SHAP-E. The batch sizes were set to 64 and 256 for SHAP-E and Point-E, respectively.
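In code, that optimization setup amounts to something like this (a minimal sketch, not the actual Cap3D code; the stand-in model and step count are made up):

```python
import torch
from torch.optim import AdamW
from torch.optim.lr_scheduler import CosineAnnealingLR

model = torch.nn.Linear(8, 8)  # stand-in for the SHAP-E model being fine-tuned
total_steps = 100_000          # placeholder for the real number of updates

optimizer = AdamW(model.parameters(), lr=1e-5)
# Cosine annealing decays the LR from 1e-5 toward zero over T_max steps.
scheduler = CosineAnnealingLR(optimizer, T_max=total_steps)

for step in range(total_steps):
    loss = model(torch.randn(64, 8)).pow(2).mean()  # dummy loss, batch size 64
    loss.backward()
    optimizer.step()
    scheduler.step()
    optimizer.zero_grad()
```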

However, they ran into the same issue: SHAP-E often produced NaN outputs during training, necessitating restarts from saved checkpoints. This could be one of the reasons why their fine-tuning did not yield significant improvements.
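That restart-from-checkpoint workaround can be automated along these lines (again just a sketch of the idea, not their actual code; the checkpoint path and save interval are made up):

```python
import torch

def save_checkpoint(model, optimizer, step, path="last_good.pt"):
    torch.save({"model": model.state_dict(),
                "optimizer": optimizer.state_dict(),
                "step": step}, path)

def restore_checkpoint(model, optimizer, path="last_good.pt"):
    ckpt = torch.load(path)
    model.load_state_dict(ckpt["model"])
    optimizer.load_state_dict(ckpt["optimizer"])
    return ckpt["step"]

def train(model, optimizer, batches, save_every=500):
    """Roll back to the last good checkpoint whenever the loss goes NaN."""
    step = 0
    save_checkpoint(model, optimizer, step)
    for batch in batches:
        optimizer.zero_grad()
        loss = model(batch).pow(2).mean()  # placeholder loss
        if not torch.isfinite(loss):
            step = restore_checkpoint(model, optimizer)
            continue
        loss.backward()
        optimizer.step()
        step += 1
        if step % save_every == 0:
            save_checkpoint(model, optimizer, step)
```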

I hope this provides some clarity on your questions. Don't hesitate to reach out if you need further information.