tyshiwo1 / DiM-DiffusionMamba

The official implementation of DiM: Diffusion Mamba for Efficient High-Resolution Image Synthesis

RuntimeError: Error(s) in loading state_dict for Mamba2DModel: size mismatch for additional_embed: copying a param with shape torch.Size([1, 1026, 1536]) from checkpoint, the shape in current model is torch.Size([1, 258, 1536]). #9

Closed: lihao-doc closed this issue 2 weeks ago

lihao-doc commented 2 weeks ago

```
[2024-06-27 08:23:37,448] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[WARNING] async_io requires the dev libaio .so object and headers but these were not found.
[WARNING] async_io: please install the libaio-dev package with apt
[WARNING] If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
[WARNING] Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH
[WARNING] NVIDIA Inference is only supported on Ampere and newer architectures
[WARNING] sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.1
[WARNING] using untested triton version (2.1.0), only 1.0.0 is known to be compatible
I0627 08:23:38.489170 136925989832512 eval_ldm_discrete.py:140] Process 0 using device: cuda
Counting ImageNet files from assets/datasets/ImageNet
Finish counting ImageNet files
Missing train samples: 1280444 < 1281167
1000 classes
cnt[:10]: tensor([1300., 1300., 1300., 1300., 1300., 1300., 1300., 1300., 1300., 1300.])
frac[:10]: [tensor(0.0010), tensor(0.0010), tensor(0.0010), tensor(0.0010), tensor(0.0010), tensor(0.0010), tensor(0.0010), tensor(0.0010), tensor(0.0010), tensor(0.0010)]
prepare the dataset for classifier free guidance with p_uncond=0.1
2024-06-27 08:23:41,511 - _cpp_lib.py - WARNING[XFORMERS]: xFormers can't load C++/CUDA extensions. xFormers was built for:
    PyTorch 2.3.0+cu121 with CUDA 1201 (you have 2.1.1)
    Python 3.9.19 (you have 3.9.19)
  Please reinstall xformers (see https://github.com/facebookresearch/xformers#installing-xformers)
  Memory-efficient attention, SwiGLU, sparse and more won't be available.
  Set XFORMERS_MORE_DETAILS=1 for more details
2024-06-27 08:23:56,201 - eval_ldm_discrete.py - load nnet from workdir/imagenet256_H_DiM/default/ckpts/425000.ckpt/nnet_ema.pth
Traceback (most recent call last):
  File "/home/lihao/DiM-DiffusionMamba/./eval_ldm_discrete.py", line 341, in <module>
    app.run(main)
  File "/home/lihao/anaconda3/envs/mamba-attn/lib/python3.9/site-packages/absl/app.py", line 308, in run
    _run_main(main, args)
  File "/home/lihao/anaconda3/envs/mamba-attn/lib/python3.9/site-packages/absl/app.py", line 254, in _run_main
    sys.exit(main(argv))
  File "/home/lihao/DiM-DiffusionMamba/./eval_ldm_discrete.py", line 337, in main
    evaluate(config)
  File "/home/lihao/DiM-DiffusionMamba/./eval_ldm_discrete.py", line 156, in evaluate
    accelerator.unwrap_model(nnet).load_state_dict(torch.load(config.nnet_path, map...
  File "/home/lihao/anaconda3/envs/mamba-attn/lib/python3.9/site-packages/torch/nn/modules/module.py", line 2152, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for Mamba2DModel: size mismatch for additional_embed: copying a param with shape torch.Size([1, 1026, 1536]) from checkpoint, the shape in current model is torch.Size([1, 258, 1536]).
```

tyshiwo1 commented 2 weeks ago

It seems that you loaded the weights of a model trained at $256 \times 256$ with the $512 \times 512$ config. Can you share the config you used?

Since you get the message load nnet from workdir/imagenet256_H_DiM/default/ckpts/425000.ckpt/nnet_ema.pth, have you downloaded the checkpoint with the correct resolution ($256 \times 256$)?
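For reference, the two shapes in the error are exactly what those resolutions imply. A minimal sketch of the token arithmetic, assuming an 8x VAE downsampling, patch_size=2, and 2 extra tokens (the last based on `is_extra_tokens=True` in the config posted below):

```python
# Minimal sketch: length of additional_embed = spatial tokens + extra tokens.
# Assumptions: 8x VAE downsampling, patch_size=2, 2 extra tokens.
def additional_embed_len(resolution, vae_downsample=8, patch_size=2, n_extra=2):
    latent = resolution // vae_downsample          # 256 -> 32, 512 -> 64
    return (latent // patch_size) ** 2 + n_extra   # 16**2 + 2, 32**2 + 2

print(additional_embed_len(256))  # 258  -> the current (256x256) model
print(additional_embed_len(512))  # 1026 -> the loaded checkpoint
```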

lihao-doc commented 2 weeks ago

imagenet256_H_DiM.py:

```python
import ml_collections


def d(**kwargs):
    """Helper of creating a config dict."""
    return ml_collections.ConfigDict(initial_dictionary=kwargs)


def get_config():
    config = ml_collections.ConfigDict()

    config.seed = 1234
    config.pred = 'noise_pred'
    config.z_shape = (4, 32, 32)

    config.autoencoder = d(
        pretrained_path='assets/stable-diffusion/autoencoder_kl_ema.pth'
    )

    # config.gradient_accumulation_steps = 2  # 1
    config.max_grad_norm = 1.0

    config.train = d(
        n_steps=750000,  # 300000
        batch_size=768,
        mode='cond',
        log_interval=10,
        eval_interval=5000,
        save_interval=25000,  # 50000
    )

    config.optimizer = d(
        name='adamw',
        lr=0.0002,
        weight_decay=0.03,
        betas=(0.99, 0.99),
        eps=1e-15,
    )

    config.lr_scheduler = d(
        name='customized',
        warmup_steps=5000,
    )

    learned_sigma = False
    latent_size = 32
    in_channels = 4  # 3
    config.nnet = d(
        name='Mamba_DiT_H_2',
        attention_head_dim=1536 // 1, num_attention_heads=1, num_layers=49,
        in_channels=in_channels,
        num_embeds_ada_norm=1000,
        sample_size=latent_size,
        activation_fn="gelu-approximate",
        attention_bias=True,
        norm_elementwise_affine=False,
        norm_type="ada_norm_single",  # "layer_norm"
        out_channels=in_channels * 2 if learned_sigma else in_channels,
        patch_size=2,
        mamba_d_state=16,
        mamba_d_conv=3,
        mamba_expand=2,
        use_bidirectional_rnn=False,
        mamba_type='enc',
        nested_order=0,
        is_uconnect=True,
        no_ff=True,
        use_conv1d=True,
        is_extra_tokens=True,
        rms=True,
        use_pad_token=True,
        use_a4m_adapter=True,
        drop_path_rate=0.0,
        encoder_start_blk_id=1,
        kv_as_one_token_idx=-1,
        num_2d_enc_dec_layers=6,
        pad_token_schedules=['dec_split', 'lateral'],
        is_absorb=False,
        use_adapter_modules=True,
        sequence_schedule='dilated',
        sub_sequence_schedule=['reverse_single', 'layerwise_cross'],
        pos_encoding_type='learnable',
        scan_pattern_len=4 - 1,
        is_align_exchange_q_kv=False,
        is_random_patterns=False,
    )
    config.gradient_checkpointing = False

    config.dataset = d(
        name='imagenet',
        path='assets/datasets/ImageNet',
        resolution=256,
        cfg=True,
        p_uncond=0.1,
    )

    config.sample = d(
        sample_steps=50,
        n_samples=50000,
        mini_batch_size=25,  # the decoder is large
        algorithm='dpm_solver',
        cfg=True,
        scale=0.4,
        path=''
    )

    return config
```

I downloaded the checkpoint from https://drive.google.com/drive/folders/1TTEXKKhnJcEV9jeZbZYlXjiPyV87ZhE0?usp=sharing
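A quick way to confirm which resolution a downloaded checkpoint corresponds to is to inspect the parameter named in the error before building the model. A minimal sketch, assuming the `.pth` file holds a plain state dict, as the traceback suggests:

```python
import torch

# Load the checkpoint on CPU and check the shape of 'additional_embed',
# the parameter named in the size-mismatch error.
sd = torch.load('workdir/imagenet256_H_DiM/default/ckpts/425000.ckpt/nnet_ema.pth',
                map_location='cpu')
print(sd['additional_embed'].shape)
# torch.Size([1, 258, 1536])  -> a 256x256 checkpoint
# torch.Size([1, 1026, 1536]) -> a 512x512 checkpoint
```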

lihao-doc commented 2 weeks ago

ImageNet 64x64: Put the standard ImageNet dataset (which contains the train and val directories) under assets/datasets/ImageNet.
ImageNet 256x256 and ImageNet 512x512: Extract ImageNet features according to scripts/extract_imagenet_feature.py.

Currently, I have downloaded the ImageNet dataset and placed it according to the prescribed path, but I have not processed it yet. Is it necessary to preprocess the dataset into a 256x256 format? Or does the program automatically handle the dataset formatting?

tyshiwo1 commented 2 weeks ago

There is no need to preprocess datasets whose images are smaller than $256 \times 256$. Although this requires additional training time and GPU memory, it should not be too much. For images larger than $512 \times 512$, you can preprocess the dataset like this, which saves a lot of training cost.
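For context, the preprocessing step just encodes each image to VAE latents once and caches them, so training does not rerun the VAE encoder on every step. A generic sketch of the idea, using the diffusers VAE rather than the repo's own `scripts/extract_imagenet_feature.py` and autoencoder weights (paths and file layout here are illustrative assumptions):

```python
import os
import numpy as np
import torch
from PIL import Image
from diffusers import AutoencoderKL

# Illustrative sketch only -- NOT the repo's extraction script.
vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-ema").cuda().eval()

@torch.no_grad()
def encode_to_latent(image_path):
    img = Image.open(image_path).convert("RGB").resize((256, 256))
    x = torch.from_numpy(np.asarray(img)).float().div(127.5).sub(1.0)  # scale to [-1, 1]
    x = x.permute(2, 0, 1).unsqueeze(0).cuda()                         # (1, 3, 256, 256)
    return vae.encode(x).latent_dist.sample().cpu().numpy()            # (1, 4, 32, 32)

os.makedirs("assets/datasets/imagenet256_features", exist_ok=True)
np.save("assets/datasets/imagenet256_features/example.npy",
        encode_to_latent("assets/datasets/ImageNet/train/n01440764/n01440764_18.JPEG"))
```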

tyshiwo1 commented 2 weeks ago

The path of the image samples in our ImageNet dataset looks like `assets/datasets/ImageNet/train/n07747607/n07747607_61484.JPEG`.
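If you want to verify the layout quickly, here is a minimal sketch (the root path comes from the config above; the expected count comes from the eval log):

```python
from pathlib import Path

# Sanity-check the train/<synset>/<image>.JPEG layout.
root = Path('assets/datasets/ImageNet/train')
classes = sorted(p for p in root.iterdir() if p.is_dir())
print(len(classes), 'classes')           # the eval log reports 1000 classes
print(next(classes[0].glob('*.JPEG')))   # e.g. .../n01440764/n01440764_18.JPEG
```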

tyshiwo1 commented 2 weeks ago

I'm sorry, I accidentally hit the edit button on your reply with the uploaded config. 😂

After reading it, I think the config you provided is correct. However, your checkpoint should not contain an `additional_embed` with shape torch.Size([1, 1026, 1536]). Have you really downloaded the correct checkpoint? You may try to load nnet.pth to check whether the evaluation can be performed successfully (using this checkpoint for evaluation would give a worse FID).

Have you loaded the checkpoint correctly? Since others have succeeded, it is probably not a problem on my side: https://github.com/tyshiwo1/DiM-DiffusionMamba/issues/8#issuecomment-2182073046

lihao-doc commented 2 weeks ago

```
$ CUDA_VISIBLE_DEVICES="0" python ./eval_ldm_discrete.py --config=configs/imagenet256_H_DiM.py --nnet_path='workdir/imagenet256_H_DiM/default/ckpts/425000.ckpt/nnet.pth'
[2024-06-28 10:25:06,226] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)
(... same DeepSpeed and xFormers warnings as in the first log ...)
I0628 10:25:07.460301 130068879759168 eval_ldm_discrete.py:140] Process 0 using device: cuda
Counting ImageNet files from assets/datasets/ImageNet
Finish counting ImageNet files
1000 classes
prepare the dataset for classifier free guidance with p_uncond=0.1
2024-06-28 10:25:26,171 - eval_ldm_discrete.py - load nnet from workdir/imagenet256_H_DiM/default/ckpts/425000.ckpt/nnet.pth
Traceback (most recent call last):
  (... identical call stack as above, ending in load_state_dict at eval_ldm_discrete.py:156 ...)
RuntimeError: Error(s) in loading state_dict for Mamba2DModel: size mismatch for additional_embed: copying a param with shape torch.Size([1, 1026, 1536]) from checkpoint, the shape in current model is torch.Size([1, 258, 1536]).
```

Loading the nnet.pth still fails. Are you sure the model you uploaded is correct? I've noticed that the filenames for the 256-resolution and 512-resolution models are identical. The configuration provided by the other individual suggests they might have been using a model they trained themselves. Currently, I need to load the model that you trained.

lihao-doc commented 2 weeks ago

Could you please send me the trained model for 256 resolution?

tyshiwo1 commented 2 weeks ago

OK, I will upload my best 256 model later.

lihao-doc commented 2 weeks ago

Once you've uploaded it, could you please provide a link, or privately send a copy to my email address HaiLi086@163.com? I am highly interested in your work and would greatly appreciate it!

tyshiwo1 commented 2 weeks ago

Thank you for your appreciation!

I will upload it to this repo and update this row of the results table:

ImageNet 256x256 (Huge/2) | 2.21 | 625K | 768
-- | -- | -- | --

lihao-doc commented 2 weeks ago

I previously downloaded it from here: ImageNet 256x256 (Huge/2) | 2.40 | 425K | 768

tyshiwo1 commented 2 weeks ago

If you have not prepared your dataset yet, you can modify the dataset section of your config to:

```python
config.dataset = d(
    name='imagenet256_features',
    path='assets/datasets/imagenet256_features',
    cfg=True,
    p_uncond=0.1,
)
```

This setting requires NO prepared dataset for evaluation.

tyshiwo1 commented 2 weeks ago

I previously downloaded it from here: ImageNet 256x256 (Huge/2) | 2.40 | 425K | 768

I know. I will give you a new link.

lihao-doc commented 2 weeks ago

How do I prepare the dataset? I'm unable to properly run the script file scripts/extract_imagenet_feature.py.

```
$ python scripts/extract_imagenet_feature.py
usage: extract_imagenet_feature.py [-h] path
extract_imagenet_feature.py: error: the following arguments are required: path
```

My dataset path is: /home/lihao/DiM-DiffusionMamba/assets/datasets/ImageNet/train/n01440764/n01440764_18.JPEG. The images in my dataset have been downloaded but not processed further. How come there is an imagenet256_features folder?

tyshiwo1 commented 2 weeks ago

Here is the best 256 model: https://drive.google.com/drive/folders/1ETllUm8Dpd8-vDHefQEXEWF9whdbyhL5?usp=sharing

You can place the new checkpoint at the path `./workdir/imagenet256_H_mambaenc_pad_cross_conv_skip1_2scan_vaeema_ada_4scan/default/ckpts/625000.ckpt/`.

Then, execute this (I just tested it, and it works well):

```
accelerate launch --multi_gpu --gpu_ids 0,1 --main_process_port 20039 --num_processes 2 --mixed_precision bf16 ./eval_ldm_discrete.py --config=configs/imagenet256_H_DiM.py --nnet_path='workdir/imagenet256_H_mambaenc_pad_cross_conv_skip1_2scan_vaeema_ada_4scan/default/ckpts/625000.ckpt/nnet_ema_256_625k.pth'
```

tyshiwo1 commented 2 weeks ago

How do I prepare the dataset? I'm unable to properly run the script file scripts/extract_imagenet_feature.py.

```
$ python scripts/extract_imagenet_feature.py
usage: extract_imagenet_feature.py [-h] path
extract_imagenet_feature.py: error: the following arguments are required: path
```

My dataset path is: /home/lihao/DiM-DiffusionMamba/assets/datasets/ImageNet/train/n01440764/n01440764_18.JPEG. The images in my dataset have been downloaded but not processed further. How come there is an imagenet256_features folder?

First, I do not use latent extraction for the 256x256 features in the configs of this open-source code. Second, `extract_imagenet_feature.py: error: the following arguments are required: path` means you need to pass a path, e.g. `python scripts/extract_imagenet_feature.py /home/lihao/DiM-DiffusionMamba/assets/datasets/ImageNet`.

lihao-doc commented 2 weeks ago

Thank you for your meticulous guidance; I have resolved all of my issues.

tyshiwo1 commented 2 weeks ago

Thank you for your meticulous guidance; I have resolved all of my issues.

OK, I will close the issue.