taegyeong-lee / Grid-Diffusion-Models-for-Text-to-Video-Generation

Official Code Repository for the paper "Grid Diffusion Models for Text-to-Video Generation", CVPR 2024

Explicit Explanation to run the Code #3

Open GautamV234 opened 3 weeks ago

GautamV234 commented 3 weeks ago

Hi There,

I found your project interesting, but I couldn't find a requirements.txt or any instructions to run the code (preprocessing, training, inference, metric computation, etc.). Could you please share these at the earliest? We want to reproduce your work for one of our applications. Thank you.

GautamV234 commented 2 weeks ago

Hi there, still waiting for your response. Thanks in advance.

GautamV234 commented 1 week ago

Hi @taegyeong-lee,

Sincerely requesting that you update the codebase with explicit training instructions for all three models, as well as the inference pipeline. The current codebase shares no such details, and when we trained the models according to our understanding of the paper and generated results, they were nowhere near those showcased in your CVPR 2024 paper, which is astounding. As it stands, the codebase provides no insight into the work presented in the paper and does not reflect the promised technical advancement. I look forward to your clarification at the earliest.

taegyeong-lee commented 1 week ago

Sorry for the late reply. We have released the training and preprocessing code for the three models. You can review the implementation details in the code. Please send me your generated video results via email and I will take a look at them (sending the code is also fine). Additionally, we will soon provide a checkpoint that integrates with Hugging Face's Diffusers.

If you have any questions about the code or need further clarification, please feel free to email me. I will review the implementation details of the code for you. Thanks.
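(Note for other readers setting up an environment in the meantime: the repository still ships no requirements.txt, so the package set below is only a guess at a minimal environment for a Stable-Diffusion/Diffusers-based pipeline, not an official list.)

```shell
# Guessed minimal environment; NOT an official requirements.txt.
# Versions left unpinned because the repo does not specify them.
pip install torch torchvision diffusers transformers accelerate
```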

GautamV234 commented 1 week ago

Hi @taegyeong-lee

Here is the document addressing all our concerns, with the detailed steps we took for model training and the code we wrote for inference (since no inference code was available). I have also emailed you the details and shared the modified source code in case it is needed for replication. Looking forward to your response. Thanks.

taegyeong-lee commented 1 week ago

Thanks, I will check your document. Also, note that we use Stable Diffusion 1.5, not Stable Diffusion 2.x. Can I check your generated samples? An RGB channel-order issue can occur.
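(For anyone else debugging this: the "RGB issue" mentioned above most likely refers to a BGR-vs-RGB channel-order mismatch, e.g. frames decoded with OpenCV, which returns BGR arrays, being fed to a pipeline that expects RGB. A minimal sketch of the fix; the function name is ours, not from this repo.)

```python
import numpy as np

def bgr_to_rgb(frame: np.ndarray) -> np.ndarray:
    """Swap the channel order of an H x W x 3 uint8 frame.

    OpenCV decodes images as BGR, while PIL and most diffusion
    pipelines expect RGB; mixing the two produces exactly the kind
    of drastic color shifts described in this thread.
    """
    return frame[..., ::-1].copy()

# A pure-red pixel stored as BGR (blue=0, green=0, red=255):
bgr = np.zeros((1, 1, 3), dtype=np.uint8)
bgr[0, 0] = [0, 0, 255]
rgb = bgr_to_rgb(bgr)
# rgb[0, 0] is now [255, 0, 0], i.e. red-first as expected.
```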

GautamV234 commented 1 week ago

Hi @taegyeong-lee, I have added them to the document under the inference-pipeline tab; here is an image from the document:

[image: generated key-frame grids]

As can be seen, the key frames lack scene continuity (colors change drastically in the second grid of key frames compared to the first), and the images are not of the same fidelity as those shown in the demo, even though we use the same prompt, i.e. "A man is enjoying his boat ride."
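(For context on the "grid" terminology: the paper represents a clip's key frames as a single grid image, which then has to be split back into individual frames. A self-contained sketch of that split; the 2x2 layout and function name here are illustrative assumptions, not the paper's exact configuration.)

```python
import numpy as np

def split_grid(grid: np.ndarray, rows: int, cols: int) -> list[np.ndarray]:
    """Split a (rows*h, cols*w, 3) grid image into rows*cols frames,
    ordered left-to-right, top-to-bottom."""
    h = grid.shape[0] // rows
    w = grid.shape[1] // cols
    return [grid[r * h:(r + 1) * h, c * w:(c + 1) * w]
            for r in range(rows) for c in range(cols)]

# Example: build a 2x2 grid from four 4x4 frames, each filled with
# its own index, then recover the frames in order.
frames = [np.full((4, 4, 3), i, dtype=np.uint8) for i in range(4)]
grid = np.vstack([np.hstack(frames[:2]), np.hstack(frames[2:])])
recovered = split_grid(grid, 2, 2)
# recovered[i] equals frames[i] for i in 0..3
```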

prakashchhipa commented 1 week ago

@taegyeong-lee I am also following your work and am not able to reproduce the video-generation results. There is no inference code, and following the training code does not get anywhere near the results shown in the paper. If you can answer the questions and address the concerns in the document (https://docs.google.com/document/d/1o4eJmRUTBmtujVE19fWeN8gumcZ-ouVn0bEvbaaijH0/edit?usp=sharing), it will be useful for everyone.

Waiting for your swift response. Thank you.

GautamV234 commented 1 week ago

Hi @taegyeong-lee ,

Waiting for your response. Thank you.

GautamV234 commented 4 days ago

Hi @taegyeong-lee,

Please let me know if you intend to reply to this; it has been over 3 weeks.