When I run the command from the examples:
python3 main.py --O --image_path $DATA_DIR/rgba.png --learned_embeds_path $DATA_DIR/learned_embeds.bin --text "A high-resolution DSLR image of a $TOKEN" --pretrained_model_name_or_path "runwayml/stable-diffusion-v1-5"
it always fails with the following output:
'lr': 0.001,
'lr_warmup': False,
'max_ray_batch': 4096,
'max_steps': 512,
'min_lr': 1e-06,
'min_near': 0.1,
'negative': '',
'noise_real_camera': 0.001,
'noise_real_camera_annealing': True,
'num_rays': 4096,
'num_steps': 64,
'optim': 'adamw',
'pose_angle': 75,
'pretrained_model_image_size': 512,
'pretrained_model_name_or_path': 'runwayml/stable-diffusion-v1-5',
'radius_range': (1.0, 1.5),
'radius_rot': 1.8,
'real_every': 1,
'real_iters': 0,
'replace_synthetic_camera_every': 10,
'replace_synthetic_camera_noise': 0.02,
'run_name': 'default',
'save_mesh': False,
'save_test_name': 'df_test',
'seed': 101,
'suppress_face': None,
'test': False,
'test_on_real_data': False,
'text': 'A high-resolution DSLR image of a _cake2',
'uniform_sphere_rate': 0.5,
'update_extra_interval': 16,
'upsample_steps': 32,
'wandb': False,
'warm_iters': 2000,
'workspace': 'outputs/default/2023-05-16--12-57-00--seed-101'}
Grid encoder level 0 has resolution 16 and params 4920
Grid encoder level 1 has resolution 22 and params 12168
Grid encoder level 2 has resolution 30 and params 29792
Grid encoder level 3 has resolution 40 and params 65536
Grid encoder level 4 has resolution 55 and params 65536
Grid encoder level 5 has resolution 74 and params 65536
Grid encoder level 6 has resolution 100 and params 65536
Grid encoder level 7 has resolution 135 and params 65536
Grid encoder level 8 has resolution 183 and params 65536
Grid encoder level 9 has resolution 248 and params 65536
Grid encoder level 10 has resolution 336 and params 65536
Grid encoder level 11 has resolution 455 and params 65536
Grid encoder level 12 has resolution 617 and params 65536
Grid encoder level 13 has resolution 836 and params 65536
Grid encoder level 14 has resolution 1134 and params 65536
Grid encoder level 15 has resolution 1536 and params 65536
NeRFNetwork(
(encoder): GridEncoder: input_dim=3 num_levels=16 level_dim=2 resolution=16 -> 1536 per_level_scale=1.3557 params=(898848, 2) gridtype=tiled align_corners=False interpolation=linear
(sigma_net): MLP(
(net): ModuleList(
(0): Linear(in_features=32, out_features=64, bias=True)
(1): Linear(in_features=64, out_features=64, bias=True)
(2): Linear(in_features=64, out_features=4, bias=True)
)
)
(encoder_bg): FreqEncoder: input_dim=3 degree=6 output_dim=39
(bg_net): MLP(
(net): ModuleList(
(0): Linear(in_features=39, out_features=64, bias=True)
(1): Linear(in_features=64, out_features=3, bias=True)
)
)
)
/home/hhn/.local/lib/python3.8/site-packages/diffusers/configuration_utils.py:135: FutureWarning: Accessing config attribute unet directly via 'StableDiffusionModel' object attribute is deprecated. Please access 'unet' over 'StableDiffusionModel's config object instead, e.g. 'scheduler.config.unet'.
deprecate("direct config name access", "1.0.0", deprecation_message, standard_warn=False)
/home/hhn/.local/lib/python3.8/site-packages/diffusers/configuration_utils.py:135: FutureWarning: Accessing config attribute text_encoder directly via 'StableDiffusionModel' object attribute is deprecated. Please access 'text_encoder' over 'StableDiffusionModel's config object instead, e.g. 'scheduler.config.text_encoder'.
deprecate("direct config name access", "1.0.0", deprecation_message, standard_warn=False)
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /home/hhn/realfusion/main.py:164 in <module> │
│ │
│ 161 │
│ 162 │
│ 163 if __name__ == '__main__': │
│ ❱ 164 │ main() │
│ 165 │
│ │
│ /home/hhn/realfusion/main.py:103 in main │
│ │
│ 100 │ │ stable_diffusion_model = StableDiffusionModel.from_pretrained(opt.pretrained_mod │
│ 101 │ │ # import pdb;pdb.set_trace() │
│ 102 │ │ if opt.learned_embeds_path is not None: # add textual inversion tokens to model │
│ ❱ 103 │ │ │ add_tokens_to_model_from_path( │
│ 104 │ │ │ │ opt.learned_embeds_path, stable_diffusion_model.text_encoder, stable_dif │
│ 105 │ │ │ ) │
│ 106 │ │ guidance = StableDiffusion(stable_diffusion_model=stable_diffusion_model, device │
│ │
│ /home/hhn/realfusion/sd/utils.py:40 in add_tokens_to_model_from_path │
│ │
│ 37 │ │ tokenizer: CLIPTokenizer, override_token: Optional[Union[str, dict]] = None) -> │
│ 38 │ r"""Loads tokens from a file and adds them to the tokenizer and text encoder of a mo │
│ 39 │ learned_embeds: Mapping[str, Tensor] = torch.load(learned_embeds_path, map_location= │
│ ❱ 40 │ add_tokens_to_model(learned_embeds, text_encoder, tokenizer, override_token) │
│ 41 │
│ │
│ /home/hhn/realfusion/sd/utils.py:15 in add_tokens_to_model │
│ │
│ 12 │ # Loop over learned embeddings │
│ 13 │ new_tokens = [] │
│ 14 │ for token, embedding in learned_embeds.items(): │
│ ❱ 15 │ │ embedding = embedding.to(text_encoder.get_input_embeddings().weight.dtype) │
│ 16 │ │ if override_token is not None: │
│ 17 │ │ │ token = override_token if isinstance(override_token, str) else override_toke │
│ 18 │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
AttributeError: 'tuple' object has no attribute 'get_input_embeddings'
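The FutureWarnings just before the traceback hint at what may be going on: with the installed diffusers version, attribute lookups such as `text_encoder` on the pipeline appear to fall back to its config, and a config entry is a `(library, class_name)` tuple rather than the loaded module. A minimal stdlib-only sketch of that failure mode (the config dict below is hypothetical, just mimicking what the lookup seems to return):

```python
# Sketch of the failure mode: if attribute access falls back to the
# pipeline config, "text_encoder" resolves to a (library, class_name)
# tuple instead of the CLIPTextModel module itself.
config = {"text_encoder": ("transformers", "CLIPTextModel")}  # hypothetical entry

text_encoder = config["text_encoder"]  # a tuple, not a model
try:
    text_encoder.get_input_embeddings()
except AttributeError as e:
    print(e)  # → 'tuple' object has no attribute 'get_input_embeddings'
```

If this is the cause, the error is a diffusers version-compatibility problem in the repo's `StableDiffusionModel` wrapper rather than something in the command line.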
Steps to Reproduce
As the examples show, the command is:
export TOKEN="_cake2" # set this according to your textual inversion placeholder_token or use the trick below
export DATA_DIR=$PWD/examples/natural-images/cake_2
python main.py --O \
  --image_path $DATA_DIR/rgba.png \
  --learned_embeds_path $DATA_DIR/learned_embeds.bin \
  --text "A high-resolution DSLR image of a $TOKEN" \
  --pretrained_model_name_or_path "runwayml/stable-diffusion-v1-5"
Expected Behavior
The command should start training as in the examples. Am I missing some key step?
Environment
Ubuntu 18.04, torch 2.0.0, CUDA 12.0
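Since both the deprecation warnings and the failing attribute lookup come from diffusers, the exact diffusers and transformers versions are probably relevant alongside the torch/CUDA info above. A quick, stdlib-only way to print them:

```python
# Print installed versions of the packages involved in the traceback.
import importlib.metadata as metadata  # standard library, Python 3.8+

for pkg in ("diffusers", "transformers", "torch"):
    try:
        print(pkg, metadata.version(pkg))
    except metadata.PackageNotFoundError:
        print(pkg, "not installed")
```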