jjihwan / SV3D-fine-tune

Fine-tuning code for SV3D
MIT License
56 stars · 5 forks

Can you share the code to get the video_latent? #6

Open guiyuanyuanbao opened 1 month ago

guiyuanyuanbao commented 1 month ago

Thanks for your work!

While preparing my dataset, I have run into some challenges, especially in obtaining the video_latent representation. If possible, I would really appreciate sample code that converts a video into the expected video_latent!

jjihwan commented 1 week ago

Sorry for the late reply. Currently I don't have the code for constructing video latents, since that part was done by my coworker. But he said it is easy to write if you refer to the "Notes" in the README. If you have any questions, or if you complete the code, please let me know and share it with me :)

ys830 commented 1 week ago

> Thanks for your work! In my preparation of the dataset, I have encountered some challenges, especially in getting the video_latent representation. [...]

Hello, did you solve this problem? I have run into the same issue. Is the video_latent.pt generated from a single image or from a video? Was it generated using the code from Diffusers or from SV3D? I look forward to your response. Thanks~

guiyuanyuanbao commented 1 week ago

> Thanks for your work! In my preparation of the dataset, I have encountered some challenges, especially in getting the video_latent representation. [...]

> Hello, did you solve this problem? I also ran into the same issue. Is the video_latent.pt generated from a single image or from a video? Was it generated using the code from Diffusers or from SV3D? [...]

I have reviewed the training code. During the inference process, there is a part annotated as B*T, where the tensor shape is [21, ...] (I apologize for not remembering the remaining dimensions). In my understanding, and in conjunction with a closed issue (https://github.com/jjihwan/SV3D-fine-tune/issues/2#issue-2350909425), after a latent is generated for each single image, the latents from the 21 images are merged into a tensor of shape [21, ...]. I believe the problem discussed in that issue is also caused by this step. For various reasons I have not actually run this code, so I am not certain my deduction is entirely correct.
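For what it's worth, the [21, ...] leading dimension can be reproduced with a plain reshape. The dimensions below (576×576 frames, 21-view orbits) are assumptions about the SV3D setup, not values taken from this repo's code:

```python
import torch

# Assumed setup: B orbit videos of T=21 frames each at 576x576.
B, T, C, H, W = 1, 21, 3, 576, 576
video = torch.randn(B, T, C, H, W)

# The VAE encodes individual images, so the batch and time axes are
# flattened before encoding. This is where the B*T = 21 leading
# dimension comes from.
frames = video.reshape(B * T, C, H, W)  # [21, 3, 576, 576]

# After VAE encoding, each frame would become a 4-channel latent at 1/8
# resolution, i.e. [21, 4, 72, 72] for a single orbit video.
```

So the merged [21, ...] tensor is just the per-frame latents stacked along the flattened batch-time axis.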

zhaosheng-thu commented 1 week ago

Based on my understanding, SV3D data can be obtained by rendering from multiple views using Blender on Objaverse. If you want to fine-tune SV3D_p, you can omit collecting camera parameters during rendering. However, if you wish to fine-tune SV3D_u, storing the camera parameters is essential.

Regarding rendering, you can refer to the excellent Zero123 source code or check the rendering branch in my repository here (currently without a detailed README).

For generating the .pt file, you can refer to this link. However, I want to let you know that this is rough experimental code, so you might need to spend some time reading through it.

Additionally, if you're planning to fine-tune SV3D_u, my repository might offer useful insights on how to handle camera parameters.

My recommendation is to first complete the preparation of the .pt file before starting the training to reduce CUDA memory pressure, as otherwise, you may encounter OOM (Out of Memory) errors.
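A minimal sketch of that precompute-then-train workflow. Note that `precompute_latents`, `encode_fn`, and the `_latent.pt` naming are all assumptions for illustration, not this repo's actual conventions:

```python
import os
import torch

def precompute_latents(videos, out_dir, encode_fn):
    """Encode each video once, offline, and save the result as a .pt file,
    so the VAE never has to sit in GPU memory during training.

    videos: dict mapping a sample name to a frame tensor, e.g. [21, 3, H, W]
    encode_fn: any routine mapping frames to latents (e.g. a VAE encoder)
    """
    os.makedirs(out_dir, exist_ok=True)
    for name, frames in videos.items():
        with torch.no_grad():
            latents = encode_fn(frames)  # e.g. [21, 4, H/8, W/8]
        torch.save(latents, os.path.join(out_dir, f"{name}_latent.pt"))
```

During training you would then `torch.load` the saved latents instead of running the VAE forward pass, which is what relieves the CUDA memory pressure described above.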

Finally, I'd like to extend my thanks to the author of this repository, as I referenced some of their work while completing mine. Thank you for your work!

MELANCHOLY828 commented 6 days ago

> Based on my understanding, SV3D data can be obtained by rendering from multiple views using Blender on Objaverse. [...] My recommendation is to first complete the preparation of the .pt file before starting the training to reduce CUDA memory pressure [...]

Hello, I would like to ask some questions about data processing. What is the exact input data format required for fine-tuning with this code? Could you please provide a reference? Thank you very much!

jjihwan commented 6 days ago

Sorry for the late reply. I used diffusers and saved the output of this function.
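For anyone else looking for a starting point, here is a hedged sketch of the usual diffusers encode → latent_dist → scaling_factor pattern. The checkpoint name, the resolution, and the helper names (`frames_to_latents`, `load_svd_vae`) are my assumptions; this is not necessarily the exact function linked above, and I am assuming SV3D reuses SVD's VAE:

```python
import torch

def frames_to_latents(frames, vae):
    """frames: [T, 3, H, W] float tensor in [-1, 1]; returns scaled latents."""
    with torch.no_grad():
        dist = vae.encode(frames).latent_dist
        # .mode() gives the deterministic latent; .sample() would add noise
        return dist.mode() * vae.config.scaling_factor

def load_svd_vae():
    # Assumed checkpoint name; downloads weights on first use.
    from diffusers import AutoencoderKLTemporalDecoder
    return AutoencoderKLTemporalDecoder.from_pretrained(
        "stabilityai/stable-video-diffusion-img2vid-xt", subfolder="vae"
    )
```

Usage would then be something like `torch.save(frames_to_latents(frames, load_svd_vae()), "video_latent.pt")` for a `[21, 3, 576, 576]` stack of rendered frames.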

@zhaosheng-thu Thank you for your kind words! Could you please cite my repository in your great work? Thanks!!