guoyww / AnimateDiff

Official implementation of AnimateDiff.
https://animatediff.github.io

Support for the latest diffusers version is needed! #77

Open ymzlygw opened 1 year ago

ymzlygw commented 1 year ago

Hi, thanks for your good work! However, the pinned diffusers version (0.11) is too old; the latest diffusers is now 0.18. The old version causes problems: converting a model.ckpt to the diffusers format and then running inference raises an architecture error like:

    ValueError: unknown mid_block_type : UNetMidBlock2DCrossAttn
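For context, the error surfaces when loading the converted weights through AnimateDiff's from_pretrained_2d. A minimal sketch of the failing call (the checkpoint path is illustrative):

    from animatediff.models.unet import UNet3DConditionModel

    # A UNet converted with a recent diffusers release carries
    # "mid_block_type": "UNetMidBlock2DCrossAttn" in its config.json,
    # which the 3D UNet here does not recognize.
    unet = UNet3DConditionModel.from_pretrained_2d(
        "./converted_model",  # illustrative path to the converted checkpoint
        subfolder="unet",
    )
    # -> ValueError: unknown mid_block_type : UNetMidBlock2DCrossAttn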
czk32611 commented 1 year ago

I think you can add config["mid_block_type"] = "UNetMidBlock3DCrossAttn" after https://github.com/guoyww/AnimateDiff/blob/e2590df10123c11d25c7145ac239902e89e2061c/animatediff/models/unet.py#L468 to allow you to load new models.

This issue happens because older model configs do not include mid_block_type, so the default value in animatediff/models/unet.py is used. Newer model configs, however, set mid_block_type to UNetMidBlock2DCrossAttn, which the 3D UNet does not recognize.
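A minimal sketch of that patch, assuming the config is read from config.json inside from_pretrained_2d (the exact surrounding lines may differ from the commit linked above):

    import json
    import os

    config_file = os.path.join(pretrained_model_path, "config.json")
    with open(config_file, "r") as f:
        config = json.load(f)

    # Override the 2D mid-block type written by newer diffusers
    # versions with the 3D variant this repo implements.
    config["mid_block_type"] = "UNetMidBlock3DCrossAttn"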

LaLaLailalai commented 1 year ago

Hi @ymzlygw, I would like to ask about your experience using diffusers v0.18 with this project. Were you able to use that version successfully?

I am currently trying to use the project with diffusers v0.17, but I have run into several issues with mismatched functions and classes. I was wondering if you have encountered similar issues, and if so, whether you have any advice on how to resolve them.

oxysoft commented 1 year ago

I was able to use it with the latest version of diffusers. I deleted a bunch of the LoRA code in this repository, got rid of the xformers code in VersatileDiffusion, and changed a few calls to head_to_batch_dim and batch_to_head_dim. Too many attention functions were being created, so diffusers refactored them into attention processors. I couldn't figure out how to refactor AnimateDiff to use these attention processors, however, so I copy-pasted the old code below and added it to VersatileDiffusion.

    def _attention(self, query, key, value, attention_mask=None):
        # Optionally upcast to float32 so the scores and softmax are
        # computed in higher precision.
        if self.upcast_attention:
            query = query.float()
            key = key.float()

        # attention_scores = alpha * (query @ key^T); with beta=0, baddbmm
        # ignores the uninitialized `empty` tensor passed as its first input.
        attention_scores = torch.baddbmm(
            torch.empty(query.shape[0], query.shape[1], key.shape[1], dtype=query.dtype, device=query.device),
            query,
            key.transpose(-1, -2),
            beta=0,
            alpha=self.scale,
        )

        if attention_mask is not None:
            attention_scores = attention_scores + attention_mask

        if self.upcast_softmax:
            attention_scores = attention_scores.float()

        attention_probs = attention_scores.softmax(dim=-1)

        # cast back to the original dtype
        attention_probs = attention_probs.to(value.dtype)

        # compute attention output
        hidden_states = torch.bmm(attention_probs, value)

        # reshape hidden_states
        hidden_states = self.batch_to_head_dim(hidden_states)
        return hidden_states

    def _sliced_attention(self, query, key, value, sequence_length, dim, attention_mask):
        # Same computation as _attention, but processed in slices along the
        # batch*heads dimension to bound peak memory usage.
        batch_size_attention = query.shape[0]
        hidden_states = torch.zeros(
            (batch_size_attention, sequence_length, dim // self.heads), device=query.device, dtype=query.dtype
        )
        # Fall back to a single slice covering the whole batch if no slice
        # size was configured.
        slice_size = self._slice_size if self._slice_size is not None else hidden_states.shape[0]
        for i in range(hidden_states.shape[0] // slice_size):
            start_idx = i * slice_size
            end_idx = (i + 1) * slice_size

            query_slice = query[start_idx:end_idx]
            key_slice = key[start_idx:end_idx]

            if self.upcast_attention:
                query_slice = query_slice.float()
                key_slice = key_slice.float()

            attn_slice = torch.baddbmm(
                torch.empty(slice_size, query.shape[1], key.shape[1], dtype=query_slice.dtype, device=query.device),
                query_slice,
                key_slice.transpose(-1, -2),
                beta=0,
                alpha=self.scale,
            )

            if attention_mask is not None:
                attn_slice = attn_slice + attention_mask[start_idx:end_idx]

            if self.upcast_softmax:
                attn_slice = attn_slice.float()

            attn_slice = attn_slice.softmax(dim=-1)

            # cast back to the original dtype
            attn_slice = attn_slice.to(value.dtype)
            attn_slice = torch.bmm(attn_slice, value[start_idx:end_idx])

            hidden_states[start_idx:end_idx] = attn_slice

        # reshape hidden_states
        hidden_states = self.batch_to_head_dim(hidden_states)
        return hidden_states

This should be enough to get you up and running.
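For reference, the renames mentioned above look roughly like this (a sketch; the exact call sites in animatediff/models/attention.py may differ):

    # Old method names on the attention class (pre-refactor diffusers):
    query = self.reshape_heads_to_batch_dim(query)
    hidden_states = self.reshape_batch_dim_to_heads(hidden_states)

    # New method names after the attention-processor refactor:
    query = self.head_to_batch_dim(query)
    hidden_states = self.batch_to_head_dim(hidden_states)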

ykk648 commented 1 year ago

Updated to diffusers 0.20.1. Note that I have restructured the code, and it is not designed for beginners 🙂 https://github.com/ykk648/AnimateDiff