Huni0318 opened this issue 1 month ago
Even though I used the images from the paper, I am finding it difficult to reproduce the results shown in the paper.
Also, in the "no" stage, the model requires prev_p and a reference image (ref image). How should I provide these? Isn't the "no" stage intended for generating the first frame?
Please refer to our paper for the distinction between narrative text and descriptive text. Since descriptive text is better suited as a prompt for text-to-image models, we transformed the stories generated by GPT-4 into corresponding descriptive text, which was then fed into our model.
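Purely as an illustration (the exact prompt and API the authors used are not given in this thread), a minimal sketch of converting narrative story text into descriptive, prompt-style text with the OpenAI chat API could look like this; the model name and system prompt wording are assumptions:

```python
# Sketch: rewrite narrative story text as descriptive text suitable as a
# text-to-image prompt. Model name and prompt wording are assumptions,
# not the authors' exact setup.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def to_descriptive(narrative: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system",
             "content": "Rewrite the given story sentence as a concise visual "
                        "description of a single scene (characters, setting, "
                        "actions), suitable as a text-to-image prompt."},
            {"role": "user", "content": narrative},
        ],
    )
    return response.choices[0].message.content

# Example usage:
# print(to_descriptive("The little fox finally found his way home at dusk."))
```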
Additionally, when generating the first frame, there is no need to provide prev_p or a reference image. You can either modify the code accordingly, or pass in any image, as it will not be used as a contextual condition in the generation process.
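For the placeholder inputs, a minimal sketch might look like the following. The names prev_p, ref_image, and the "no" stage are taken from this thread; the inference call at the end is a hypothetical stand-in for the repo's actual entry point, so adapt it to the real script:

```python
# Sketch of preparing dummy inputs for the first-frame ("no") stage.
# Since prev_p and the reference image are not used as contextual conditions
# here, placeholders are sufficient.
from PIL import Image

def first_frame_placeholders(height: int = 512, width: int = 512):
    prev_p = []  # no previous prompts are needed for the first frame
    ref_image = Image.new("RGB", (width, height))  # black dummy image, unused
    return prev_p, ref_image

prev_p, ref_image = first_frame_placeholders()
# images = inference(prompt=descriptive_text, prev_p=prev_p,
#                    ref_image=ref_image, stage="no")  # hypothetical call
```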
Hello, thank you for providing the code and checkpoints. I would like to generate stories similar to those shown in the paper, but the results I am getting are as follows. Could you please share the exact inference settings? Here are my current settings.