huggingface / diffusers

🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch and FLAX.
https://huggingface.co/docs/diffusers
Apache License 2.0

Integration of ImageBind and StableUnCLIPImg2ImgPipeline for audio2image generation #3444

Status: Open · Zeqiang-Lai opened this issue 1 year ago

Zeqiang-Lai commented 1 year ago

Model/Pipeline/Scheduler description

For anyone who needs it, here is a simple demo illustrating how to integrate ImageBind and StableUnCLIPImg2ImgPipeline for audio2image generation.

Open source status

Provide useful links for the implementation

See also: https://github.com/Zeqiang-Lai/Anything2Image

import imagebind
import torch
from diffusers import StableUnCLIPImg2ImgPipeline

# construct models
device = "cuda:0" if torch.cuda.is_available() else "cpu"
pipe = StableUnCLIPImg2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1-unclip", torch_dtype=torch.float16, variation="fp16"
)
pipe = pipe.to(device)

model = imagebind.imagebind_huge(pretrained=True)
model.eval()
model.to(device)

# generate image
with torch.no_grad():
    audio_paths=["assets/wav/bird_audio.wav"]
    embeddings = model.forward({
        imagebind.ModalityType.AUDIO: imagebind.load_and_transform_audio_data(audio_paths, device),
    })
    embeddings = embeddings[imagebind.ModalityType.AUDIO]
    images = pipe(image_embeds=embeddings.half()).images
    images[0].save("bird_audio.png")
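
Since ImageBind maps all supported modalities into one shared embedding space (and imagebind_huge produces embeddings with the same dimensionality as the CLIP image embeddings StableUnCLIP is conditioned on), the same pattern extends beyond audio. Below is a minimal sketch of an audio+text variant, continuing from the setup above; it assumes the wrapper also exposes ModalityType.TEXT and load_and_transform_text (mirroring the upstream ImageBind data utilities), and the plain average of the two embeddings is only an illustrative fusion choice:

# sketch: condition on an audio clip and a text prompt together
with torch.no_grad():
    embeddings = model.forward({
        imagebind.ModalityType.AUDIO: imagebind.load_and_transform_audio_data(
            ["assets/wav/bird_audio.wav"], device
        ),
        # assumption: load_and_transform_text is exposed like the audio helper above
        imagebind.ModalityType.TEXT: imagebind.load_and_transform_text(
            ["a painting of a bird"], device
        ),
    })
    # 50/50 average of the two modality embeddings (arbitrary fusion choice)
    fused = (embeddings[imagebind.ModalityType.AUDIO]
             + embeddings[imagebind.ModalityType.TEXT]) / 2
    images = pipe(image_embeds=fused.half()).images
    images[0].save("bird_audio_text.png")
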
patrickvonplaten commented 1 year ago

Looks cool!

sayakpaul commented 1 year ago

@patrickvonplaten I believe the code example provided here is already a good one, no? It might make more sense to add a doc instead of a pipeline or community example, IMO.

@Zeqiang-Lai would you be interested in doing that? Happy to help :)

Zeqiang-Lai commented 1 year ago

Oh, sure. I would be happy to help with that, but where should I start?

sayakpaul commented 4 days ago

I think this could be added to the examples/research_projects directory. Sorry for the late reply.
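
For reference, entries under examples/research_projects are typically a small self-contained folder with a README and the script(s). A hypothetical layout for this example (folder and file names below are placeholders, not existing paths) could be:

examples/research_projects/imagebind_audio2image/
├── README.md        # what the demo does, installation notes, links to ImageBind and Anything2Image
└── audio2image.py   # the script from this issue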