pj4533 / Pokora

A 100% Local AI Video Creation Platform using Stable Diffusion in a Native SwiftUI Interface

Adding generative effect type #86

Closed pj4533 closed 1 year ago

pj4533 commented 1 year ago

This implements #59

The idea is to use image2image, but rather than basing it on the underlying video frame (calling this a 'direct' effect), a 'generative' effect is based on the previously processed frame (the frame at the previous index). In my CLI I used a slower frame rate and ffmpeg to interpolate to get a nice look, so I'm not sure exactly how I'll finish this, but the groundwork is there.
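
Roughly, the frame selection would work like this (a minimal sketch; the type and function names are illustrative, not the actual Pokora code):

```swift
import CoreGraphics

// Illustrative only: 'direct' seeds img2img from the source video frame,
// 'generative' seeds it from the previously generated frame.
enum EffectKind {
    case direct
    case generative
}

func initImage(at index: Int,
               kind: EffectKind,
               videoFrames: [CGImage],
               generatedFrames: [CGImage?]) -> CGImage {
    switch kind {
    case .direct:
        return videoFrames[index]
    case .generative:
        // The first frame of a generative effect has no prior output,
        // so fall back to the underlying video frame.
        guard index > 0, let previous = generatedFrames[index - 1] else {
            return videoFrames[index]
        }
        return previous
    }
}
```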

Todo:

pj4533 commented 1 year ago

I am currently blocked on getting the output I really want.

In my old CLI code I used ffmpeg and its motion interpolation. I would render fewer frames with Stable Diffusion, tell ffmpeg that the source PNGs were a lower frame rate, and tell the motion interpolation that I wanted a higher frame rate. ffmpeg would then interpolate the intermediate frames, giving a smooth output where the Stable Diffusion frames blend nicely from one into another.

I am not sure how to do this best in Swift code though. I was hoping for a Core Image API that would do it, but have found nothing as of yet. I guess my best bet for keeping things simple would be to wrap ffmpeg somehow?

I could tell generative effects to render fewer frames, since all that really matters is the starting frame used for the generative effect. Then I just need to fill up the rest of the array of frames for that given effect.

Then I take those PNGs, already on disk, and write Swift code that generates an mp4 just as I did before. Then extract the frames from that mp4 and put those URLs in the effect? That might work, but it is messy.

NOTE: can ffmpeg output PNGs directly, rather than the mp4?
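
For reference, a minimal sketch of what wrapping the CLI could look like, assuming a system ffmpeg binary is available on the machine (which would not be the case inside a sandboxed app) and that the rendered frames are numbered PNGs. ffmpeg's image2 muxer can also write a numbered PNG sequence as output, so the mp4 round trip may be avoidable:

```swift
import Foundation

// Hypothetical: interpolate a low-frame-rate PNG sequence up to a higher frame rate
// using ffmpeg's minterpolate filter, writing PNGs directly instead of an mp4.
// The ffmpeg path, file naming, and frame rates are placeholders.
func interpolateFrames(inputDir: URL, outputDir: URL, sourceFPS: Int, targetFPS: Int) throws {
    let process = Process()
    process.executableURL = URL(fileURLWithPath: "/opt/homebrew/bin/ffmpeg")
    process.arguments = [
        "-framerate", "\(sourceFPS)",                        // treat the PNGs as a low-fps sequence
        "-i", inputDir.appendingPathComponent("frame_%05d.png").path,
        "-vf", "minterpolate=fps=\(targetFPS)",              // motion-interpolate the in-between frames
        outputDir.appendingPathComponent("out_%05d.png").path
    ]
    try process.run()
    process.waitUntilExit()
}
```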

pj4533 commented 1 year ago

The real root of the "problem" is that Stable Diffusion image2image needs a high enough strength value (enough steps of processing) or the resulting image is too noisy.

However, in my experimentation, it also needs different seed values or it just ends up as noise. (Not sure why this is exactly; perhaps I need to learn more about this bit.)

So the combination of a fairly high strength and a different seed leads to images that are not similar enough to form a coherent animation at high frame rates.

The above method gets around this by generating fewer frames (i.e. a lower frame rate) and then interpolating the intermediate frames, leading to a smoother video output, but still with the generative AI madness that I desire. Lol.
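
As a concrete illustration of that trade-off, a sketch of the bookkeeping it implies (the frame rates and seed handling here are made up for illustration, not how Pokora currently behaves):

```swift
// Illustrative numbers only: render Stable Diffusion frames at a third of the
// timeline rate, each with its own seed, and let motion interpolation fill
// the two skipped frames in every group of three.
let timelineFPS = 30
let diffusionFPS = 10
let totalFrames = 90                      // 3 seconds of output
let step = timelineFPS / diffusionFPS     // every 3rd timeline frame gets a diffusion pass

for index in stride(from: 0, to: totalFrames, by: step) {
    let seed = UInt32.random(in: UInt32.min ... UInt32.max)
    // img2img would run here, seeded from the previously rendered frame with a
    // fairly high strength; the actual call is omitted since the API is not shown above.
    print("render diffusion frame \(index) with seed \(seed)")
}
```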

pj4533 commented 1 year ago

Other thoughts:

pj4533 commented 1 year ago

This might be the way.

https://github.com/arthenica/ffmpeg-kit
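
If this works out, the interpolation step could presumably run in-process along these lines (a rough sketch based on the usage shown in the FFmpegKit README; the command string and file naming are placeholders, untested here):

```swift
import ffmpegkit

// Hypothetical: run the same minterpolate step as the old CLI, but through the
// bundled FFmpegKit library instead of a system ffmpeg binary.
let command = "-framerate 10 -i frame_%05d.png -vf minterpolate=fps=30 out_%05d.png"
let session = FFmpegKit.execute(command)
if ReturnCode.isSuccess(session?.getReturnCode()) {
    print("interpolation finished")
} else {
    print("interpolation failed")
}
```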

pj4533 commented 1 year ago

Continuing experiments; not 100% sure on the interpolation route. Figuring it out. Some other findings:

pj4533 commented 1 year ago

After more experimentation, I think the ffmpeg interpolation route is useful, but an optimization. With the right rotate/zoom values and a good prompt, I can get interesting output at full frame rate. It's flickery, but that's not always a bad thing. Going to finish up this PR, then create a new ticket to implement FFmpegKit.
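
For context, the rotate/zoom here means a small per-frame transform applied to the previous output before it is fed back through img2img, so the feedback loop drifts instead of sitting still. A rough Core Image sketch of the idea (the function name and default values are illustrative):

```swift
import CoreImage
import CoreGraphics

// Hypothetical: nudge the previous frame with a slight zoom and rotation about its
// center; the transformed image becomes the init image for the next img2img pass.
func rotateAndZoom(_ image: CIImage, zoom: CGFloat = 1.01, degrees: CGFloat = 0.5) -> CIImage {
    let center = CGPoint(x: image.extent.midX, y: image.extent.midY)
    var transform = CGAffineTransform(translationX: center.x, y: center.y)
    transform = transform.rotated(by: degrees * .pi / 180)
    transform = transform.scaledBy(x: zoom, y: zoom)
    transform = transform.translatedBy(x: -center.x, y: -center.y)
    return image.transformed(by: transform).cropped(to: image.extent)
}
```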

pj4533 commented 1 year ago

First test render went well. Immediate to dos:

pj4533 commented 1 year ago

Check the -1 generative piece against the effect before it... does it handle that correctly?

pj4533 commented 1 year ago

Did a test render using the generative effect, and it works well enough to merge this PR. Will need to add some more issues to cover items not included here.