Open MELANCHOLY828 opened 1 week ago
It seems your error is fundamentally a runtime error raised here. Ignore the request error and find what caused the runtime error in your log.
The data format should be .pt files. Please refer to "Notes" in the README file!
Hello, I have provided a screenshot of the part of the code that reads video_latent, as well as the corresponding file path format. I would like to ask if the code should be reading orbit_frame.pt instead of video_latent.pt? This is because when I run for k in data.datasets:, it doesn't execute successfully.
You're right. The video_latent.pt files should be renamed to orbit_frame.pt. Sorry for my mistake; I will update the README now. Thanks!
Hello, we are currently trying to reproduce your fine-tuning work on the Objaverse dataset. The input image size is 576x576, and the dimension of video_latent.pt is [21, 4, 72, 72]. All other configurations are the same as those provided in your code. We used two A100 and two A6000 GPUs, but we still encountered OOM errors. The README mentions that you ran it on a single A6000. Could you please provide some troubleshooting suggestions?
That seems really strange. What's your batch size and dtype?
Also try running on a single GPU by setting CUDA_VISIBLE_DEVICES.
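For reference, restricting the run to a single GPU is just an environment variable; device index 0 below is an example, and the commented launch command assumes the repo's train_sv3d.py entry point:

```shell
# Expose only GPU 0 to PyTorch; the training script then sees one device.
export CUDA_VISIBLE_DEVICES=0
echo "Visible devices: $CUDA_VISIBLE_DEVICES"
# Then launch training as usual, e.g.:
# python train_sv3d.py
```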
Hello! Thank you very much for your response. I have a question regarding fine-tuning: when fine-tuning, does the latent input to the network always include latents from all views (21 in total) each time? I’m also not quite sure why there are multiple directories under the input path, such as 000-000, 000-001. Are the contents inside these directories completely identical? Looking forward to your reply.
Each latent should contain all 21 views, since SV3D generates 21 frames at once. This differs from other novel-view-synthesis models such as Zero123, which generate 1 frame at a time.
So each folder represents one datapoint (21 frames). The .pt file is the ground truth and the .png file is the input frame.
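Putting that together, the dataset layout might look like this (the root name, the input-image filename, and the folder names beyond the 000-000 / 000-001 pattern mentioned above are illustrative, not confirmed by the repo):

```
dataset_root/
├── 000-000/              # one datapoint
│   ├── orbit_frame.pt    # ground-truth latents for all 21 views
│   └── input.png         # input frame (illustrative filename)
├── 000-001/
│   ├── orbit_frame.pt
│   └── input.png
└── ...
```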
We used the same sv3d_p.yaml as you, with batch_size = 1, and the dtype of input x is float32.
Did you try fp16 or bf16? Also try to install accelerate!
Thank you very much for your prompt response!
I used the following code to load the VAE model.
vae = AutoencoderKLTemporalDecoder.from_pretrained("/data/yisi/mywork/SV3D-fine-tune/cheeckpoints/stable-video-diffusion-img2vid-xt/vae").to("cuda")
Here are the files in my directory.
This is the config.json
I'm not quite sure if it is using fp16. Could you take a look and see if there's anything wrong here?
Ah, I meant that I'm wondering whether you used FP16 or FP32 during training, not for generating the latents.
Did you detach the gradient before saving the latents after decoding? I think that's a possible cause.
During training, I use precision: '16-mixed', the same as sv3d_p.yaml.
I also use detach before saving the latents, like this:
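For reference, a minimal sketch of detach-before-save (the shape matches the [21, 4, 72, 72] latent mentioned above; the save_latent helper is illustrative, not the poster's actual code):

```python
import torch

# Illustrative helper (not the actual repo code): drop the autograd graph
# and move to CPU before serializing, so the saved .pt holds only tensor
# data, not the activations the graph would keep alive.
def save_latent(latent: torch.Tensor, path: str) -> None:
    torch.save(latent.detach().cpu(), path)

latent = torch.randn(21, 4, 72, 72, requires_grad=True)  # e.g. a VAE output
save_latent(latent, "orbit_frame.pt")

loaded = torch.load("orbit_frame.pt")
print(loaded.requires_grad)  # False: no graph was saved
```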
Then I have no more ideas about your results.. :( Please let me know if you solve the problem.
I've been following your workflow except for the data processing part. Could you share some of your processed latent data and images? It would really help me troubleshoot the issue.
My email : si.yi@smail.nju.edu.cn. Looking forward to hearing from you!
I really wish I could, but the server I did this project on is already dead, so I can't find the latents. Sorry :(
I'm really sorry to hear that. ≧ ﹏ ≦ During the debugging process, we found that although we set precision: '16-mixed' in the .yaml file, the weights in the network remained in float32 because both the input latent.pt and the pre-trained sv3d_p.safetensors have a dtype of float32. Could this be the reason for the OOM? Did you use float16 dtype for your data during training?
No, I used fp16 mixed-precision training, so it is natural that the model and input are processed in fp32. But I recommend trying fp16 rather than 16-mixed for debugging.
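To illustrate why fp16 roughly halves tensor memory, a quick check using the latent shape quoted earlier in the thread:

```python
import torch

# One 21-view latent at the size quoted above: [21, 4, 72, 72].
latent_fp32 = torch.zeros(21, 4, 72, 72, dtype=torch.float32)
latent_fp16 = latent_fp32.half()

# Bytes per element: 4 for float32, 2 for float16.
print(latent_fp32.element_size())  # 4
print(latent_fp16.element_size())  # 2

# Total bytes for this single tensor in fp32.
print(latent_fp32.numel() * latent_fp32.element_size())  # 1741824
```

With '16-mixed', master weights stay in fp32 by design, so the savings come mainly from fp16 activations; pure fp16 cuts the weights too, which is why it can be useful for isolating an OOM.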
Also, since you have two or more GPUs, try using DeepSpeed for model parallelism! :)
Hello, I would like to ask about an issue I encountered while running train_sv3d.py for pre-training; it seems to be related to network problems. Could you please advise on possible solutions?
Additionally, regarding the training data, what should the data format be? Would it be possible to provide a sample directory structure for reference? Thank you very much!