hugobowne / first-multimodal-genAI-app


video and audio too sloth #1

Closed: emattia closed this issue 3 months ago

hugobowne commented 3 months ago

yeah, so the video generation in the Streamlit app takes several minutes

@deepfates we're currently using Deforum SD

do you know if there's a way to stream the video in so we can load it in chunks, or is there a text2video model on Replicate that generates more quickly?

for a bit more context, feel free to watch the video in the README or spin up a Codespace and play with the app (side note: in the workshop I'll be running through notebooks that lead up to the app)
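
to make the issue concrete, here's roughly the shape of the call we're making (a rough sketch, not the exact code in the app; the model string, input names, and Streamlit wiring are illustrative):

```python
# Rough sketch of the kind of blocking call the app makes today.
# The model reference and input field names are illustrative; depending on the
# client version you may need to pin an explicit version hash.
import replicate
import streamlit as st

prompt = st.text_input("Describe your video")

if st.button("Generate") and prompt:
    with st.spinner("Generating video... this can take several minutes"):
        # replicate.run blocks until the prediction finishes, so a cold model
        # plus a long clip means the whole Streamlit script sits here.
        output = replicate.run(
            "deforum/deforum_stable_diffusion",
            input={"animation_prompts": prompt, "max_frames": 100},  # assumed input names
        )
    st.video(output)
```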

deepfates commented 3 months ago

So there are two things going on here. One is that video generation is just compute-heavy and slow right now, especially in open source. But it should still generate in something like a minute rather than four minutes; if you're hitting that kind of delay, it's probably because the model is cold, so we're having to spin up a machine and load the whole model before generating.

I might recommend something like https://replicate.com/lucataco/hotshot-xl, which is more likely to be warm and has shorter outputs in general. Doing short clips as outputs will help reduce gen times even when the model is warm. We don't have any streaming video models at this time.
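
If it helps, here's a rough sketch of what the swap could look like with the Python client, kicking off the prediction and polling instead of blocking (the input field name and version lookup are assumptions, so check the model page for the real schema):

```python
# Rough sketch: start a hotshot-xl prediction and poll it, rather than
# blocking the UI on a single long call. Input names are assumptions.
import time
import replicate

def start_video(prompt: str):
    # Kick off the prediction without blocking so the app can keep updating.
    model = replicate.models.get("lucataco/hotshot-xl")
    return replicate.predictions.create(
        version=model.latest_version,
        input={"prompt": prompt},  # assumed input name; check the model schema
    )

def poll(prediction, interval: float = 2.0):
    # Poll until the prediction finishes; a cold start shows up as a long
    # "starting" phase before any processing happens.
    while prediction.status not in ("succeeded", "failed", "canceled"):
        time.sleep(interval)
        prediction.reload()
    return prediction.output

if __name__ == "__main__":
    pred = start_video("a sloth surfing a wave, cinematic")
    print("status:", pred.status)
    print("output:", poll(pred))
```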

emattia commented 3 months ago

btw, these times don't seem too slow in the context of Runway's claims about their newest video model, Gen-3 Alpha, whose quoted generation time is ~50% of the Deforum SD time we're actually observing with Replicate:

[screenshot: Runway's quoted generation time for Gen-3 Alpha]

hugobowne commented 3 months ago

@deepfates this was incredibly helpful, and we did exactly that here: https://github.com/hugobowne/first-multimodal-genAI-app/pull/6