GautamV234 opened 3 weeks ago
Hi there, waiting for your response. Thanks in advance.
Hi @taegyeong-lee,
Sincerely requesting you to update the codebase with explicit instructions for training all three models, as well as the inference pipeline. The current codebase shares no such details, and when we trained the models per our understanding of the paper and generated results, they were nowhere near the results showcased in your CVPR 2024 paper, which is astounding. As it stands, the codebase provides no insight into the work presented in the paper and does not reflect the promised technical advancement. I look forward to your clarification at the earliest.
Sorry for the late reply. We have released the training and preprocessing code for the three models. You can review the implementation details in the code. Please send me your generated video results via email and I will take a look at them (sending the code is also fine). Additionally, we will soon provide a checkpoint that integrates with Hugging Face's Diffusers.
If you have any questions about the code or need further clarification, please feel free to email me. I will go through the implementation details with you. Thanks.
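For anyone waiting on that checkpoint: once it lands on the Hugging Face Hub, loading it should look roughly like the standard Diffusers flow sketched below. This is a sketch against the public Stable Diffusion 1.5 base (which the authors say they use), not the authors' released pipeline; the single-image call and file name are assumptions.

```python
# Minimal sketch of loading a checkpoint through Diffusers.
# "runwayml/stable-diffusion-v1-5" is the public SD 1.5 base; the authors'
# integrated checkpoint (not yet released) would replace this repo id.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16,
).to("cuda")

# Same prompt discussed later in this thread.
image = pipe("A man is enjoying his boat ride.").images[0]
image.save("sample.png")
```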
Hi @taegyeong-lee,
Here is the document addressing all our concerns, with the detailed steps we took for model training and the inference code we wrote ourselves, given the lack of available inference code. I have also emailed you the details and shared the modified source code in case it is needed for replication. Looking forward to your response. Thanks.
Thanks, I will check your document. Also, note that we use Stable Diffusion 1.5, not Stable Diffusion 2.x. Could I check your generated samples? An RGB channel-order issue can occur.
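(For context: a common cause of such an "RGB issue" is a BGR/RGB channel-order mix-up when frames are decoded with OpenCV but consumed by PIL- or Diffusers-based tooling. A minimal sketch of checking for and fixing the swap, assuming OpenCV-decoded frames and a hypothetical file name `keyframe.png`:)

```python
# Minimal sketch: OpenCV decodes images as BGR, while PIL/Diffusers expect RGB.
# If generated frames look color-shifted, a channel-order swap is a likely cause.
import cv2
from PIL import Image

frame_bgr = cv2.imread("keyframe.png")                   # OpenCV loads as BGR
frame_rgb = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB)   # convert to RGB
Image.fromarray(frame_rgb).save("keyframe_rgb.png")      # safe for PIL-based tools
```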
Hi @taegyeong-lee, I have added them to the document under the inference-pipeline tab; here is an image from the document.
As can be seen, the key frames lack scene continuity (the colors change drastically in the second grid of key frames compared to the first), and the images are not of the same fidelity as those shown in the demo, even though we use the same prompt, i.e., "A man is enjoying his boat ride."
@taegyeong-lee I am also following your work and have not been able to reproduce the video generation results. There is no inference code, and following the training code does not reproduce the results shown in the paper. If you can answer and address the concerns raised in the document (https://docs.google.com/document/d/1o4eJmRUTBmtujVE19fWeN8gumcZ-ouVn0bEvbaaijH0/edit?usp=sharing), it will be useful for everyone.
Waiting for a swift response. Thank you.
Hi @taegyeong-lee,
Waiting for your response. Thank you.
Hi @taegyeong-lee,
Please let me know if you intend to reply to this. It has been over 3 weeks.
Hi there,
I found your project interesting, but I couldn't find a requirements.txt or any instructions to run the code (preprocessing, training, inference, metric computation, etc.). Can you please share these at the earliest? We want to reproduce your work for one of our applications. Thank you.
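Until an official requirements.txt is published, an environment along these lines is a plausible starting point for a Stable Diffusion 1.5 / Diffusers codebase. The package list is an assumption based only on the stack mentioned in this thread, not the authors' actual dependencies or version pins.

```text
# Hypothetical requirements.txt - an educated guess, not the authors' pins.
torch
torchvision
diffusers
transformers
accelerate
opencv-python
Pillow
numpy
```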