RuntimeError: mat1 and mat2 shapes cannot be multiplied (4x1024 and 768x320)

I modified the paths in the configuration file to point to my local directories (UBC Fashion Video dataset) and started the training process. However, an error occurred during the process.
/home/user/miniconda3/envs/animateanyone-unofficial/lib/python3.8/site-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension: libtorch_cuda_cu.so: cannot open shared object file: No such file or directory
  warn(f"Failed to load image Python extension: {e}")
### Train Info: train stage 1: image pretrain ###
Some weights of the model checkpoint were not used when initializing ReferenceNet: 
 ['conv_norm_out.bias, conv_norm_out.weight, conv_out.bias, conv_out.weight, up_blocks.3.attentions.2.proj_out.bias, up_blocks.3.attentions.2.proj_out.weight, up_blocks.3.attentions.2.transformer_blocks.0.attn1.to_k.weight, up_blocks.3.attentions.2.transformer_blocks.0.attn1.to_out.0.bias, up_blocks.3.attentions.2.transformer_blocks.0.attn1.to_out.0.weight, up_blocks.3.attentions.2.transformer_blocks.0.attn1.to_q.weight, up_blocks.3.attentions.2.transformer_blocks.0.attn1.to_v.weight, up_blocks.3.attentions.2.transformer_blocks.0.attn2.to_k.weight, up_blocks.3.attentions.2.transformer_blocks.0.attn2.to_out.0.bias, up_blocks.3.attentions.2.transformer_blocks.0.attn2.to_out.0.weight, up_blocks.3.attentions.2.transformer_blocks.0.attn2.to_q.weight, up_blocks.3.attentions.2.transformer_blocks.0.attn2.to_v.weight, up_blocks.3.attentions.2.transformer_blocks.0.ff.net.0.proj.bias, up_blocks.3.attentions.2.transformer_blocks.0.ff.net.0.proj.weight, up_blocks.3.attentions.2.transformer_blocks.0.ff.net.2.bias, up_blocks.3.attentions.2.transformer_blocks.0.ff.net.2.weight, up_blocks.3.attentions.2.transformer_blocks.0.norm2.bias, up_blocks.3.attentions.2.transformer_blocks.0.norm2.weight, up_blocks.3.attentions.2.transformer_blocks.0.norm3.bias, up_blocks.3.attentions.2.transformer_blocks.0.norm3.weight']
12/20/2023 01:40:44 - INFO - root - ***** Running training *****
12/20/2023 01:40:44 - INFO - root -   Num examples = 500
12/20/2023 01:40:44 - INFO - root -   Num Epochs = 480
12/20/2023 01:40:44 - INFO - root -   Instantaneous batch size per device = 4
12/20/2023 01:40:44 - INFO - root -   Total train batch size (w. parallel, distributed & accumulation) = 4
12/20/2023 01:40:44 - INFO - root -   Gradient Accumulation steps = 1
12/20/2023 01:40:44 - INFO - root -   Total optimization steps = 60000

  0%|          | 0/60000 [00:00<?, ?it/s]
Steps:   0%|          | 0/60000 [00:00<?, ?it/s]Traceback (most recent call last):
  File "train.py", line 629, in <module>
    main(name=name, launcher=args.launcher, use_wandb=args.wandb, **config)
  File "train.py", line 492, in main
    referencenet(latents_ref_img, ref_timesteps, encoder_hidden_states)
  File "/home/user/miniconda3/envs/animateanyone-unofficial/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/user/miniconda3/envs/animateanyone-unofficial/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/user/miniconda3/envs/animateanyone-unofficial/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 1519, in forward
    else self._run_ddp_forward(*inputs, **kwargs)
  File "/home/user/miniconda3/envs/animateanyone-unofficial/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 1355, in _run_ddp_forward
    return self.module(*inputs, **kwargs)  # type: ignore[index]
  File "/home/user/miniconda3/envs/animateanyone-unofficial/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/user/miniconda3/envs/animateanyone-unofficial/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/user/AnimateAnyone-unofficial/models/ReferenceNet.py", line 1005, in forward
    sample, res_samples = downsample_block(
  File "/home/user/miniconda3/envs/animateanyone-unofficial/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/user/miniconda3/envs/animateanyone-unofficial/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/user/miniconda3/envs/animateanyone-unofficial/lib/python3.8/site-packages/diffusers/models/unet_2d_blocks.py", line 1086, in forward
    hidden_states = attn(
  File "/home/user/miniconda3/envs/animateanyone-unofficial/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/user/miniconda3/envs/animateanyone-unofficial/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/user/miniconda3/envs/animateanyone-unofficial/lib/python3.8/site-packages/diffusers/models/transformer_2d.py", line 315, in forward
    hidden_states = block(
  File "/home/user/miniconda3/envs/animateanyone-unofficial/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/user/miniconda3/envs/animateanyone-unofficial/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/user/AnimateAnyone-unofficial/models/ReferenceNet_attention.py", line 199, in hacked_basic_transformer_inner_forward
    attn_output = self.attn2(
  File "/home/user/miniconda3/envs/animateanyone-unofficial/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/user/miniconda3/envs/animateanyone-unofficial/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/user/miniconda3/envs/animateanyone-unofficial/lib/python3.8/site-packages/diffusers/models/attention_processor.py", line 417, in forward
    return self.processor(
  File "/home/user/miniconda3/envs/animateanyone-unofficial/lib/python3.8/site-packages/diffusers/models/attention_processor.py", line 1023, in __call__
    key = attn.to_k(encoder_hidden_states, scale=scale)
  File "/home/user/miniconda3/envs/animateanyone-unofficial/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/user/miniconda3/envs/animateanyone-unofficial/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/user/miniconda3/envs/animateanyone-unofficial/lib/python3.8/site-packages/diffusers/models/lora.py", line 224, in forward
    out = super().forward(hidden_states)
  File "/home/user/miniconda3/envs/animateanyone-unofficial/lib/python3.8/site-packages/torch/nn/modules/linear.py", line 114, in forward
    return F.linear(input, self.weight, self.bias)
RuntimeError: mat1 and mat2 shapes cannot be multiplied (4x1024 and 768x320)

Steps:   0%|          | 0/60000 [00:05<?, ?it/s]
[2023-12-20 01:40:55,416] torch.distributed.elastic.multiprocessing.api: [ERROR] failed (exitcode: 1) local_rank: 0 (pid: 412807) of binary: /home/user/miniconda3/envs/animateanyone-unofficial/bin/python
Traceback (most recent call last):
  File "/home/user/miniconda3/envs/animateanyone-unofficial/bin/torchrun", line 8, in <module>
    sys.exit(main())
  File "/home/user/miniconda3/envs/animateanyone-unofficial/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper
    return f(*args, **kwargs)
  File "/home/user/miniconda3/envs/animateanyone-unofficial/lib/python3.8/site-packages/torch/distributed/run.py", line 806, in main
    run(args)
  File "/home/user/miniconda3/envs/animateanyone-unofficial/lib/python3.8/site-packages/torch/distributed/run.py", line 797, in run
    elastic_launch(
  File "/home/user/miniconda3/envs/animateanyone-unofficial/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 134, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/home/user/miniconda3/envs/animateanyone-unofficial/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 264, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError: 
============================================================
train.py FAILED
------------------------------------------------------------
Failures:
  <NO_OTHER_FAILURES>
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2023-12-20_01:40:55
  host      : gpuserver
  rank      : 0 (local_rank: 0)
  exitcode  : 1 (pid: 412807)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================
guoqincode / Open-AnimateAnyone

RuntimeError: mat1 and mat2 shapes cannot be multiplied (4x1024 and 768x320) #36