neeek2303 / EMOPortraits

Official implementation of EMOPortraits: Emotion-enhanced Multimodal One-shot Head Avatars

Thank you for your open-source work. Due to GPU resource limitations, I am retraining the model at 256 resolution. Have you trained at 256 before? If so, could you share your configuration? #21

Open xiaoxiaolai opened 3 weeks ago

johndpope commented 1 week ago

Found these. It seems it's trained at 256 out of the box, which is surprising given that the MegaPortraits paper detailed 512 and the quality of EMO's outputs is so high.

        parser.add_argument('--idt_image_size', default=256, type=int)
        parser.add_argument('--exp_image_size', default=256, type=int)

Also, in the trainer there is:


        parser.add_argument('--image_size', default=256, type=int)
        parser.add_argument('--image_additional_size', default=None, type=int)
        parser.add_argument('--image_additional_size_d', default=None, type=int)
        parser.add_argument('--aug_warp_size', default=256, type=int)
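
To make the default-versus-override behaviour concrete, here is a minimal sketch. It assumes a stand-alone parser that mirrors only the six flags quoted above; `build_parser` is a hypothetical helper for illustration, not a function from the repository, which defines many more arguments across several modules.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser holding only the size flags quoted in this thread."""
    parser = argparse.ArgumentParser()
    parser.add_argument('--idt_image_size', default=256, type=int)
    parser.add_argument('--exp_image_size', default=256, type=int)
    parser.add_argument('--image_size', default=256, type=int)
    parser.add_argument('--image_additional_size', default=None, type=int)
    parser.add_argument('--image_additional_size_d', default=None, type=int)
    parser.add_argument('--aug_warp_size', default=256, type=int)
    return parser


parser = build_parser()

# With no flags on the command line, every size falls back to its default.
defaults = parser.parse_args([])

# Passing flags explicitly overrides only those flags; the rest keep defaults.
overridden = parser.parse_args(['--image_size', '512', '--aug_warp_size', '512'])

print(defaults.image_size, defaults.aug_warp_size)
print(overridden.image_size, overridden.aug_warp_size, overridden.idt_image_size)
```

So a run launched without size flags would use 256 everywhere, while a launch script can still raise individual sizes without touching the code.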

UPDATE - in the experiment folder there's this, which does use image_size 512:

        python3 -m torch.distributed.launch --master_port 15588 --nproc_per_node=8 ../train.py \
            --experiment_name Retrain_with_17_V1_New_rand_MM_SEC_4_drop_02_stm_10_CV_05_1_1 \
            --dataset_name voxceleb2hq_pairs --dataset_name_test voxceleb2hq_pairs \
            --num_gpus 8 --batch_size 2 --max_epochs 400 \
            --image_size 512 --aug_warp_size 512 \
            --vgg19_num_scales 4 --dis_num_scales 2 \
            --gen_shd_max_iters 400000 --dis_shd_max_iters 400000 \
            --logging_freq 10 --visuals_freq 200 \
            --vgg19_weight 18 --gaze_weight 10 --vgg19_face 10 --perc_face_pars 0 --face_resnet 0 \
            --feature_matching_weight 40 --resnet18_fv_mix 35 --vgg19_fv_mix 0.0 \
            --norm_layer_type gn --use_seg True \
            --pull_exp 1 --push_exp 1 --stm 10 --contrastive_exp 2 --contrastive_idt 0.0 \
            --test_batch_size 4 --train_epoch_len 15000 --test_epoch_len 2000 \
            --dis_num_blocks 4 --gen_opt_type adamw --dis_opt_type adamw --dis_beta1 0.5 --gen_beta1 0.5 \
            --lpe_face_backbone resnet18 --dec_pred_seg False --use_back False \
            --use_stylegan_d False --dis_stylegan_lr 0.0002 --use_ws True --separate_idt False --r1 2.0 \
            --mix_losses_start 1 --contr_losses_start 1 --stylegan_weight 1.0 \
            --use_masked_aug False --num_b_negs 1 \
            --dec_max_channels 512 --dec_channel_mult 2 --enc_channel_mult 4 \
            --gen_dummy_input_size 8 --gen_latent_texture_channels 96 --latent_volume_channels 96 --source_volume_num_blocks 3 \
            --custom_test True --augment_geometric_train False --random_theta True --green True \
            --old_mix_pose False --use_mix_mask True \
            --w_eyes_loss_l1 500 --w_mouth_loss_l1 500 --w_ears_loss_l1 500 --normalize_losses True \
            --use_tensor False --use_amp False \
            --lpe_output_channels_expression 128 --use_ibug_mask False \
            --checkpoint_freq 10 --print_norms True --print_model False \
            --im_dec_num_lrs_per_resolution 2 --im_dec_ch_div_factor 1.5 --dec_num_blocks 6 --dec_use_adanorm False \
            --emb_v_exp False --save_exp_vectors True --dec_no_detach_frec 1 --sec_dataset_every 4 --predict_target_canon_vol True \
            --volumes_l1 0.5 --vol_loss_epoch 1 --vol_loss_grad 1 --dec_key_emb orig_d --detach_lat_vol -1 \
            --aug_color_coef 10 --exp_dropout 0.2 --separate_stm True --bs_resnet18_fv_mix 2 --use_sec_dataset True
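
For anyone who wants to try 256, one possible starting point is to reuse this command and swap only the two size flags. The sketch below does that on a shortened excerpt of the command. `override_flags` is a hypothetical helper written for this comment, not part of the repository, and the thread does not say whether other flags (for example `--vgg19_num_scales` or the decoder settings) also need to change at 256, so treat the result as untested.

```python
import shlex


def override_flags(command: str, overrides: dict) -> str:
    """Return `command` with the values of the given flags replaced.

    Handles both `--flag value` and `--flag=value` forms; every other
    token is passed through unchanged. Hypothetical helper for illustration.
    """
    tokens = shlex.split(command)
    out = []
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        name, sep, _ = tok.partition('=')
        if sep and name in overrides:
            # `--flag=value` form: rebuild the token with the new value.
            out.append(f"{name}={overrides[name]}")
            i += 1
        elif tok in overrides and i + 1 < len(tokens):
            # `--flag value` form: keep the flag, replace the next token.
            out.extend([tok, str(overrides[tok])])
            i += 2
        else:
            out.append(tok)
            i += 1
    return shlex.join(out)


# Shortened excerpt of the launch command quoted above.
original = (
    "python3 -m torch.distributed.launch --master_port 15588 "
    "--nproc_per_node=8 ../train.py --num_gpus 8 --batch_size 2 "
    "--image_size 512 --aug_warp_size 512"
)

# Assumption: only the two size flags are changed for a 256 run.
at_256 = override_flags(original, {"--image_size": 256, "--aug_warp_size": 256})
print(at_256)
```

The same helper could lower `--nproc_per_node` and `--num_gpus` for a smaller machine, though how batch size and iteration counts should scale in that case is also not covered here.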