guoyww / AnimateDiff

Official implementation of AnimateDiff.
https://animatediff.github.io
Apache License 2.0
10.26k stars 842 forks source link

Can you change the uploaded v3 motion model to have a blank 'animatediff_v3' key (or something akin to that), so it could be automatically distinguished in code in other implementations? #239

Open Kosinkadink opened 9 months ago

Kosinkadink commented 9 months ago

The architecture between v1 and v3 motion models appears to be identical, but v3 models use groupnorm inflation in the config while v1 does not. Since there is no way to distinguish between the two, the only solid way right now to tell the difference is to 1) look at the filename, which is finnicky because if a user renames it, it would not longer be be marked as v3, and 2) the filesize is slightly larger, but that might just be an accident and cause issues later on if I take this as a constant.

It would be a great help if there was a dummy key in the motion model, like 'animatediff_v3' that would just be a tensor of length one with a 0.0 or something, just so that the key can be located and used. In ControlNet, ControlLora use this sort of dummy key to be easily distinguished for outside applications.

Thanks! If your published model could implement this for the v3 models, that would be very helpful.

Kosinkadink commented 9 months ago

Actually, looks like the v3 models reuse the 32 PE length from the v2 model, which is different from the 24 PE of the v1 model. And since v3 does not have midblocks, there IS a way to tell them apart by using this information. The change above is not necessary to distinguish between v3/current models, but it would be a great help if moving forward the motion models had some sort of identifying key inside.

guoyww commented 9 months ago

Hi @Kosinkadink , thanks for your suggestion! I have added a dummy key animatediff_config to each checkpoint (the uploading to hf is still in progress). It includes the model name and some necessary configuration, which now looks like this:

state_dict = torch.load("v3_sd15_mm.ckpt")
state_dict["animatediff_config"] = {
    'name': 'v3_sd15_mm',
    'inference_config': {
        'unet_additional_kwargs': {
            ...
        },
        'noise_scheduler_kwargs': {
            ...
        }
    }
}

Let me know if this solution works:)

Kosinkadink commented 9 months ago

Hey @guoyww , thanks for taking a look, however, I should have provided more info initially - having a dictionary key breaks safe loading, which is weight_only loading. Basically, since tensor models are so widely distributed, community code attempts to load only weights from checkpoints, as if they were safetensors, to avoid arbitrary code execution if a malicious actor tried to distribute malware or other nefarious code. Since the animatediff_config is not a tensor, torch's weights_only load fails - a normal dictionary is not one of the supported types for that. The quick fix on my end would be to disable weight_only loading for motion modules, but I'd like to avoid that if possible to prevent any bad models that might be created to take advantage of the unsafe loading. It would be mighty convenient to use the dictionary provided, but I think it would be for the best if we try to keep support weight_only loading.

To work with weight only loading (and to allow finetunes to release .safetensor versions of models if they'd like without losing the identifying info), the value of the key added would also need to be a tensor/int/float, and unfortunately nothing but a tensor/int/float. And the key itself would need to include the identifying info. Or, there could be multiple identifying keys with tensor values that have info in the form of a tensor, or a torch-friendly int or float.

For example, the version identification could be something like state_dict["animatediff_v3"] = <some tensor value or an integer/float, even if it's empty/contains just one float/integer>. Even though this does not include any additional info, just being able to check for the animatediff_v3 or some other key name that can uniquely identify the model so that it could be matched with a specific behavior in code would be more than enough!

The code to load a checkpoint with weights_only is torch.load(ckpt, map_location=device, weights_only=True), and requires a sufficiently updated version of pytorch to support weights_only=True. Trying to load the model through that command would be a good way to know if the datatypes included would work for weights_only loading. Torch determines if a type is good for safe unpickling by using _get_allowed_globals() from torch/_weights_only_unpickler.py.: image

Other values being stored might be helpful, but honestly the minimum that would be needed to help identify models would be to have an easily parsable key with some common prefix/suffix (like animatediff_) with a unique portion attached to the end (like v3) so that community code can assign the appropriate behavior unique for that model. So for a future SDXL model, the key could be called animatediff_sdxl_v1, for example. While technically weights_only loading supports other types such as OrderedDict, as long as the values of that OrderedDict are also weights_only compatible, I think we should keep it simple so that .safetensors versions of models can also be used fine.

Kosinkadink commented 9 months ago

If you rollback the changes of the huggingface models to before the dictionary was added, that will fix the issue for people downloading the models currently while you implement the weights_only-compatible value (or, we don't worry about the identifying keys until next release). SparseCtrl models and the provided lora would also need that dictionary removed to support weights_only loading.

Kosinkadink commented 9 months ago

Also, it would be helpful if there can be .safetensors versions of the models published! (But the priority rn is to rollback the dictionary addition to the models)

Kosinkadink commented 9 months ago

Hey, could you please roll back the model changes on huggingface to remove the dictionaries from all of them? They are not compatible with safe loading in all community implementations of both AnimateDiff and StableDiffusion, and every day since this change I need to point people to the initial release version.

I don't mean to pester, but these models have been in a broken state longer than they have been in a working one - you dont even have to upload new models to fix it, if you just roll back to the initial v3 commit on huggingface it'll fix it!

guoyww commented 9 months ago

Hey, could you please roll back the model changes on huggingface to remove the dictionaries from all of them? They are not compatible with safe loading in all community implementations of both AnimateDiff and StableDiffusion, and every day since this change I need to point people to the initial release version.

I don't mean to pester, but these models have been in a broken state longer than they have been in a working one - you dont even have to upload new models to fix it, if you just roll back to the initial v3 commit on huggingface it'll fix it!

Hi @Kosinkadink , I just rolled back the huggingface models to the original ones without the dummy key. Sorry for the delay cause I was busy with other issues. Maybe later we could come up with another solution for model identification.