haoningwu3639 / StoryGen

[CVPR 2024] Intelligent Grimm - Open-ended Visual Storytelling via Latent Diffusion Models
https://haoningwu3639.github.io/StoryGen_Webpage/
MIT License

How can I set up the environment? #31

Open ParkSungHin opened 1 month ago

ParkSungHin commented 1 month ago

Hello, I'm trying to build the environment from environment.yaml on Windows, and a lot of things fail to run. My GPU is an RTX 4070 Ti Super, so I suspect the required PyTorch and CUDA versions will differ. How should I approach this?

haoningwu3639 commented 1 month ago

If you run into problems with environment.yaml, I suggest installing the key dependencies I recommend: torch, accelerate, xformers, diffusers, and transformers. Since the diffusers library is updated and iterated quickly, and there can be obvious incompatibilities between earlier and later versions, it is best to install the versions I provide. If you can share more error information, I will be glad to give further guidance.
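For reference, once the key packages are installed you can sanity-check the setup with a minimal sketch like the one below (compare the printed versions against the pins in the repo's environment.yaml; the exact pins may differ from what prints here):

```python
# Quick sanity check of the key dependencies; compare the printed
# versions against the pins in environment.yaml.
import torch, accelerate, xformers, diffusers, transformers

print("torch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("accelerate:", accelerate.__version__)
print("xformers:", xformers.__version__)
print("diffusers:", diffusers.__version__)
print("transformers:", transformers.__version__)
```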

ParkSungHin commented 1 month ago

Thanks to this, I finished setting up the environment on Linux. Do I have to download all the files in metadata.json? I have been downloading the data, but it seems difficult to download all of it because of errors like the ones below:

ERROR: [youtube] QCJyJup0qcc: Private video. Sign in if you've been granted access to this video
ERROR: [youtube] n5p24NNdycc: Video unavailable. This video has been removed by the uploader
ERROR: [youtube] eo1TV_1KZsE: Video unavailable

(screenshot: download_videos)

Is it correct that the download proceeds like the above?

haoningwu3639 commented 1 month ago

Sorry for the late reply; I was on vacation last week. You don't necessarily need to download all the videos in metadata.json, because some of them may have been removed due to YouTube's restrictions. The YouTube videos do not account for a large proportion of our StorySalon dataset, so you can focus on the data from the open-source libraries. You can also search for suitable YouTube videos yourself and use the data processing pipeline we provide to expand the dataset further.
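If it helps, unavailable videos can simply be skipped rather than aborting the run. Below is a minimal sketch assuming yt-dlp is the downloader (the repo's actual download script may differ; the IDs shown are just the ones from the errors above):

```python
# Hypothetical sketch: download what is still available and skip
# private/removed videos, assuming yt-dlp. Video IDs would come
# from metadata.json.
import yt_dlp

video_ids = ["QCJyJup0qcc", "n5p24NNdycc", "eo1TV_1KZsE"]  # example IDs

opts = {
    "ignoreerrors": True,                  # skip unavailable videos instead of stopping
    "download_archive": "downloaded.txt",  # remember finished IDs across reruns
    "outtmpl": "videos/%(id)s.%(ext)s",
}
with yt_dlp.YoutubeDL(opts) as ydl:
    ydl.download([f"https://www.youtube.com/watch?v={v}" for v in video_ids])
```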

ParkSungHin commented 3 weeks ago

Thank you! I'm now in the middle of data processing. May I ask a couple of questions that came up along the way?

  1. In extract.py, is it intentional that only two pairs are extracted from the many storybook videos and VTT files? (screenshot: storygen6)

  2. For the yolov7.pt file used in human_ocr_mask.py, can I directly use the pre-trained model from the official YOLOv7 GitHub repository, which recognizes real people? Or should I fine-tune it to match the storybook illustrations?

Verg-Avesta commented 3 weeks ago

  1. Sorry, that code was added to test the script's accuracy. We will update the repository with a corrected version that removes it.
  2. Yes, you can directly use the pre-trained model.
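In case it helps, here is a minimal sketch of loading the pre-trained weights and keeping only person detections. This assumes the WongKinYiu/yolov7 repository exposes a `custom` torch.hub entrypoint (as YOLOv5 does); the input file name is illustrative:

```python
# Hypothetical sketch: load the official pre-trained yolov7.pt via
# torch.hub and keep only "person" boxes for the human-masking step.
import torch

model = torch.hub.load("WongKinYiu/yolov7", "custom", "yolov7.pt")
results = model("frame.jpg")        # hypothetical storybook frame
dets = results.xyxy[0]              # columns: x1, y1, x2, y2, conf, class
persons = dets[dets[:, 5] == 0]     # COCO class 0 is "person"
print(persons)
```
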
ParkSungHin commented 2 weeks ago

Thank you, and thanks to you, that problem is solved! However, the next step, inpaint.py, throws an error like the screenshot below. Can you tell me how to resolve it? (screenshot: sg_problem)

haoningwu3639 commented 2 weeks ago

Since the inpainting pipeline is borrowed entirely from the Stable Diffusion implementation, we did not include that part of the code in our repository. You can follow our README.md to download the related code and dependencies from https://github.com/CompVis/stable-diffusion.
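As a side note, if cloning the CompVis repository is inconvenient, a functionally similar inpainting step can be sketched with the diffusers library. This is an alternative, not the Stable Diffusion code the repo actually borrows, and the model ID and file names here are illustrative assumptions:

```python
# Alternative sketch using diffusers' inpainting pipeline; NOT the
# pipeline the StoryGen repo borrows from CompVis/stable-diffusion.
import torch
from diffusers import StableDiffusionInpaintPipeline
from PIL import Image

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting", torch_dtype=torch.float16
).to("cuda")

init_image = Image.open("page.png").convert("RGB")  # hypothetical storybook frame
mask = Image.open("mask.png").convert("RGB")        # white = region to inpaint

result = pipe(prompt="storybook illustration, clean background",
              image=init_image, mask_image=mask).images[0]
result.save("inpainted.png")
```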