Hi,
When following the code step by step, I get an error when running the evolutionary search. The error is: "_th_addmm_out not supported on CPUType for Half"
Do you know what could be causing this and how to fix it? I am currently running this on my i5 CPU. Does the config file need any change to avoid using the GPU when only the CPU is being tested?
Help with this would be highly appreciated. Thanks.
Hi ihish52,
Thanks for your question! Could you provide more details about your command and which line raised the error?
Thanks for the quick reply. Attached is the config file I am using to perform the evolutionary search on my i5 CPU (barely any change from your example). There are no NVIDIA drivers installed, so I do not think the GPU is affecting this.
Below is the output when I run evo_search.py:
python3 evo_search.py --configs=configs/wmt14.en-de/supertransformer/space0.yml --evo-configs=configs/wmt14.en-de/evo_search/wmt14ende_i5.yml
Namespace(activation_dropout=0.0, activation_fn='relu', adam_betas='(0.9, 0.98)', adam_eps=1e-08, adaptive_input=False, adaptive_softmax_cutoff=None, adaptive_softmax_dropout=0, arch='transformersuper_wmt_en_de', attention_dropout=0.1, beam=5, best_checkpoint_metric='loss', bucket_cap_mb=25, ckpt_path='./latency_dataset/predictors/wmt14ende_cpu_i5.pt', clip_norm=0.0, configs='configs/wmt14.en-de/supertransformer/space0.yml', cpu=False, criterion='label_smoothed_cross_entropy', crossover_size=50, curriculum=0, data='data/binary/wmt16_en_de', dataset_impl=None, ddp_backend='no_c10d', decoder_arbitrary_ende_attn_all_subtransformer=None, decoder_arbitrary_ende_attn_choice=[-1, 1, 2], decoder_attention_heads=8, decoder_embed_choice=[640, 512], decoder_embed_dim=640, decoder_embed_dim_subtransformer=None, decoder_embed_path=None, decoder_ende_attention_heads_all_subtransformer=None, decoder_ende_attention_heads_choice=[8, 4], decoder_ffn_embed_dim=3072, decoder_ffn_embed_dim_all_subtransformer=None, decoder_ffn_embed_dim_choice=[3072, 2048, 1024], decoder_input_dim=640, decoder_layer_num_choice=[6, 5, 4, 3, 2, 1], decoder_layers=6, decoder_learned_pos=False, decoder_normalize_before=False, decoder_output_dim=640, decoder_self_attention_heads_all_subtransformer=None, decoder_self_attention_heads_choice=[8, 4], device_id=0, disable_validation=False, distributed_backend='nccl', distributed_init_method=None, distributed_no_spawn=False, distributed_port=-1, distributed_rank=0, distributed_world_size=1, diverse_beam_groups=-1, diverse_beam_strength=0.5, dropout=0.3, encoder_attention_heads=8, encoder_embed_choice=[640, 512], encoder_embed_dim=640, encoder_embed_dim_subtransformer=None, encoder_embed_path=None, encoder_ffn_embed_dim=3072, encoder_ffn_embed_dim_all_subtransformer=None, encoder_ffn_embed_dim_choice=[3072, 2048, 1024], encoder_layer_num_choice=[6], encoder_layers=6, encoder_learned_pos=False, encoder_normalize_before=False, encoder_self_attention_heads_all_subtransformer=None, encoder_self_attention_heads_choice=[8, 4], evo_configs='configs/wmt14.en-de/evo_search/wmt14ende_i5.yml', evo_iter=30, feature_norm=[640.0, 6.0, 2048.0, 6.0, 640.0, 6.0, 2048.0, 6.0, 6.0, 2.0], find_unused_parameters=False, fix_batches_to_gpus=False, fp16=True, fp16_init_scale=128, fp16_scale_tolerance=0.0, fp16_scale_window=None, get_attn=False, keep_interval_updates=-1, keep_last_epochs=20, label_smoothing=0.1, lat_norm=700.0, latency_constraint=6000.0, lazy_load=False, left_pad_source='True', left_pad_target='False', lenpen=1, log_format=None, log_interval=1000, lr=[1e-07], lr_period_updates=-1, lr_scheduler='cosine', lr_shrink=1.0, match_source_len=False, max_epoch=0, max_len_a=0, max_len_b=200, max_lr=0.001, max_sentences=None, max_sentences_valid=None, max_source_positions=1024, max_target_positions=1024, max_tokens=4096, max_tokens_valid=4096, max_update=40000, maximize_best_checkpoint_metric=False, memory_efficient_fp16=False, min_len=1, min_loss_scale=0.0001, min_lr=-1, model_overrides='{}', mutation_prob=0.3, mutation_size=50, nbest=1, no_beamable_mm=False, no_early_stop=False, no_epoch_checkpoints=False, no_last_checkpoints=False, no_progress_bar=False, no_repeat_ngram_size=0, no_save=False, no_save_optimizer_state=False, no_token_positional_embeddings=False, num_workers=10, optimizer='adam', optimizer_overrides='{}', parent_size=25, path=None, pdb=False, population_size=125, prefix_size=0, print_alignment=False, profile_latency=False, qkv_dim=512, quiet=False, raw_text=False, remove_bpe=None, 
replace_unk=None, required_batch_size_multiple=8, reset_dataloader=False, reset_lr_scheduler=False, reset_meters=False, reset_optimizer=False, restore_file='./downloaded_models/HAT_wmt14ende_super_space0.pt', results_path=None, sacrebleu=False, sampling=False, sampling_topk=-1, sampling_topp=-1.0, save_dir='checkpoints/wmt14.en-de/supertransformer/space0', save_interval=10, save_interval_updates=0, score_reference=False, seed=1, sentence_avg=False, share_all_embeddings=True, share_decoder_input_output_embed=False, skip_invalid_size_inputs_valid_test=False, source_lang=None, t_mult=1, target_lang=None, task='translation', tbmf_wrapper=False, temperature=1.0, tensorboard_logdir='checkpoints/wmt14.en-de/supertransformer/space0/tensorboard', threshold_loss_scale=None, train_subset='train', unkpen=0, unnormalized=False, update_freq=[16], upsample_primary=1, use_bmuf=False, user_dir=None, valid_cnt_max=1000000000.0, valid_subset='valid', validate_interval=10, vocab_original_scaling=False, warmup_init_lr=1e-07, warmup_updates=10000, weight_decay=0.0, write_config_path='configs/wmt14.en-de/subtransformer/wmt14ende_i5.yml')
| [en] dictionary: 32768 types
| [de] dictionary: 32768 types
| loaded 3000 examples from: data/binary/wmt16_en_de/valid.en-de.en
| loaded 3000 examples from: data/binary/wmt16_en_de/valid.en-de.de
| data/binary/wmt16_en_de valid en-de 3000 examples
| Fallback to xavier initializer
TransformerSuperModel(
(encoder): TransformerEncoder(
(embed_tokens): EmbeddingSuper(32768, 640, padding_idx=1)
(embed_positions): SinusoidalPositionalEmbedding()
(layers): ModuleList(
(0): TransformerEncoderLayer(
(self_attn): MultiheadAttentionSuper num_heads:8 qkv_dim:512
(out_proj): LinearSuper(in_features=512, out_features=640, bias=True)
)
(self_attn_layer_norm): LayerNormSuper((640,), eps=1e-05, elementwise_affine=True)
(fc1): LinearSuper(in_features=640, out_features=3072, bias=True)
(fc2): LinearSuper(in_features=3072, out_features=640, bias=True)
(final_layer_norm): LayerNormSuper((640,), eps=1e-05, elementwise_affine=True)
)
(1): TransformerEncoderLayer(
(self_attn): MultiheadAttentionSuper num_heads:8 qkv_dim:512
(out_proj): LinearSuper(in_features=512, out_features=640, bias=True)
)
(self_attn_layer_norm): LayerNormSuper((640,), eps=1e-05, elementwise_affine=True)
(fc1): LinearSuper(in_features=640, out_features=3072, bias=True)
(fc2): LinearSuper(in_features=3072, out_features=640, bias=True)
(final_layer_norm): LayerNormSuper((640,), eps=1e-05, elementwise_affine=True)
)
(2): TransformerEncoderLayer(
(self_attn): MultiheadAttentionSuper num_heads:8 qkv_dim:512
(out_proj): LinearSuper(in_features=512, out_features=640, bias=True)
)
(self_attn_layer_norm): LayerNormSuper((640,), eps=1e-05, elementwise_affine=True)
(fc1): LinearSuper(in_features=640, out_features=3072, bias=True)
(fc2): LinearSuper(in_features=3072, out_features=640, bias=True)
(final_layer_norm): LayerNormSuper((640,), eps=1e-05, elementwise_affine=True)
)
(3): TransformerEncoderLayer(
(self_attn): MultiheadAttentionSuper num_heads:8 qkv_dim:512
(out_proj): LinearSuper(in_features=512, out_features=640, bias=True)
)
(self_attn_layer_norm): LayerNormSuper((640,), eps=1e-05, elementwise_affine=True)
(fc1): LinearSuper(in_features=640, out_features=3072, bias=True)
(fc2): LinearSuper(in_features=3072, out_features=640, bias=True)
(final_layer_norm): LayerNormSuper((640,), eps=1e-05, elementwise_affine=True)
)
(4): TransformerEncoderLayer(
(self_attn): MultiheadAttentionSuper num_heads:8 qkv_dim:512
(out_proj): LinearSuper(in_features=512, out_features=640, bias=True)
)
(self_attn_layer_norm): LayerNormSuper((640,), eps=1e-05, elementwise_affine=True)
(fc1): LinearSuper(in_features=640, out_features=3072, bias=True)
(fc2): LinearSuper(in_features=3072, out_features=640, bias=True)
(final_layer_norm): LayerNormSuper((640,), eps=1e-05, elementwise_affine=True)
)
(5): TransformerEncoderLayer(
(self_attn): MultiheadAttentionSuper num_heads:8 qkv_dim:512
(out_proj): LinearSuper(in_features=512, out_features=640, bias=True)
)
(self_attn_layer_norm): LayerNormSuper((640,), eps=1e-05, elementwise_affine=True)
(fc1): LinearSuper(in_features=640, out_features=3072, bias=True)
(fc2): LinearSuper(in_features=3072, out_features=640, bias=True)
(final_layer_norm): LayerNormSuper((640,), eps=1e-05, elementwise_affine=True)
)
)
)
(decoder): TransformerDecoder(
(embed_tokens): EmbeddingSuper(32768, 640, padding_idx=1)
(embed_positions): SinusoidalPositionalEmbedding()
(layers): ModuleList(
(0): TransformerDecoderLayer(
(self_attn): MultiheadAttentionSuper num_heads:8 qkv_dim:512
(out_proj): LinearSuper(in_features=512, out_features=640, bias=True)
)
(self_attn_layer_norm): LayerNormSuper((640,), eps=1e-05, elementwise_affine=True)
(encoder_attn): MultiheadAttentionSuper num_heads:8 qkv_dim:512
(out_proj): LinearSuper(in_features=512, out_features=640, bias=True)
)
(encoder_attn_layer_norm): LayerNormSuper((640,), eps=1e-05, elementwise_affine=True)
(fc1): LinearSuper(in_features=640, out_features=3072, bias=True)
(fc2): LinearSuper(in_features=3072, out_features=640, bias=True)
(final_layer_norm): LayerNormSuper((640,), eps=1e-05, elementwise_affine=True)
)
(1): TransformerDecoderLayer(
(self_attn): MultiheadAttentionSuper num_heads:8 qkv_dim:512
(out_proj): LinearSuper(in_features=512, out_features=640, bias=True)
)
(self_attn_layer_norm): LayerNormSuper((640,), eps=1e-05, elementwise_affine=True)
(encoder_attn): MultiheadAttentionSuper num_heads:8 qkv_dim:512
(out_proj): LinearSuper(in_features=512, out_features=640, bias=True)
)
(encoder_attn_layer_norm): LayerNormSuper((640,), eps=1e-05, elementwise_affine=True)
(fc1): LinearSuper(in_features=640, out_features=3072, bias=True)
(fc2): LinearSuper(in_features=3072, out_features=640, bias=True)
(final_layer_norm): LayerNormSuper((640,), eps=1e-05, elementwise_affine=True)
)
(2): TransformerDecoderLayer(
(self_attn): MultiheadAttentionSuper num_heads:8 qkv_dim:512
(out_proj): LinearSuper(in_features=512, out_features=640, bias=True)
)
(self_attn_layer_norm): LayerNormSuper((640,), eps=1e-05, elementwise_affine=True)
(encoder_attn): MultiheadAttentionSuper num_heads:8 qkv_dim:512
(out_proj): LinearSuper(in_features=512, out_features=640, bias=True)
)
(encoder_attn_layer_norm): LayerNormSuper((640,), eps=1e-05, elementwise_affine=True)
(fc1): LinearSuper(in_features=640, out_features=3072, bias=True)
(fc2): LinearSuper(in_features=3072, out_features=640, bias=True)
(final_layer_norm): LayerNormSuper((640,), eps=1e-05, elementwise_affine=True)
)
(3): TransformerDecoderLayer(
(self_attn): MultiheadAttentionSuper num_heads:8 qkv_dim:512
(out_proj): LinearSuper(in_features=512, out_features=640, bias=True)
)
(self_attn_layer_norm): LayerNormSuper((640,), eps=1e-05, elementwise_affine=True)
(encoder_attn): MultiheadAttentionSuper num_heads:8 qkv_dim:512
(out_proj): LinearSuper(in_features=512, out_features=640, bias=True)
)
(encoder_attn_layer_norm): LayerNormSuper((640,), eps=1e-05, elementwise_affine=True)
(fc1): LinearSuper(in_features=640, out_features=3072, bias=True)
(fc2): LinearSuper(in_features=3072, out_features=640, bias=True)
(final_layer_norm): LayerNormSuper((640,), eps=1e-05, elementwise_affine=True)
)
(4): TransformerDecoderLayer(
(self_attn): MultiheadAttentionSuper num_heads:8 qkv_dim:512
(out_proj): LinearSuper(in_features=512, out_features=640, bias=True)
)
(self_attn_layer_norm): LayerNormSuper((640,), eps=1e-05, elementwise_affine=True)
(encoder_attn): MultiheadAttentionSuper num_heads:8 qkv_dim:512
(out_proj): LinearSuper(in_features=512, out_features=640, bias=True)
)
(encoder_attn_layer_norm): LayerNormSuper((640,), eps=1e-05, elementwise_affine=True)
(fc1): LinearSuper(in_features=640, out_features=3072, bias=True)
(fc2): LinearSuper(in_features=3072, out_features=640, bias=True)
(final_layer_norm): LayerNormSuper((640,), eps=1e-05, elementwise_affine=True)
)
(5): TransformerDecoderLayer(
(self_attn): MultiheadAttentionSuper num_heads:8 qkv_dim:512
(out_proj): LinearSuper(in_features=512, out_features=640, bias=True)
)
(self_attn_layer_norm): LayerNormSuper((640,), eps=1e-05, elementwise_affine=True)
(encoder_attn): MultiheadAttentionSuper num_heads:8 qkv_dim:512
(out_proj): LinearSuper(in_features=512, out_features=640, bias=True)
)
(encoder_attn_layer_norm): LayerNormSuper((640,), eps=1e-05, elementwise_affine=True)
(fc1): LinearSuper(in_features=640, out_features=3072, bias=True)
(fc2): LinearSuper(in_features=3072, out_features=640, bias=True)
(final_layer_norm): LayerNormSuper((640,), eps=1e-05, elementwise_affine=True)
)
)
)
)
| loaded checkpoint ./downloaded_models/HAT_wmt14ende_super_space0.pt (epoch 136 @ 0 updates)
| loading train data for epoch 136
| loaded 3000 examples from: data/binary/wmt16_en_de/valid.en-de.en
| loaded 3000 examples from: data/binary/wmt16_en_de/valid.en-de.de
| data/binary/wmt16_en_de valid en-de 3000 examples
| Start Iteration 0:
Traceback (most recent call last):
File "evo_search.py", line 106, in
Still not sure what caused this. I made a new environment and installed all dependencies again, with versions exactly as stated in the requirements, and it worked. Closing this issue. Thanks for the response.
Hi ihish52,
Sorry for my late reply; I have been very busy over the past several weeks. The reason for the error is that some float16 operations are not supported by PyTorch on CPU, so I fixed it by using fp32 when performing the evolutionary search on CPU. (commit)
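For reference, here is a minimal sketch of the failure mode and of the fp32-on-CPU workaround; the layer sizes and the `use_fp16` flag are illustrative and not the exact code in evo_search.py:

```python
import torch

# On CPU, older PyTorch builds have no float16 kernel for addmm (the matrix
# multiply that nn.Linear calls internally), so running a half-precision
# layer on CPU raises
# "RuntimeError: _th_addmm_out not supported on CPUType for Half".
layer = torch.nn.Linear(640, 640)
x = torch.randn(2, 640)

use_fp16 = torch.cuda.is_available()  # only use half precision when a GPU is present
if use_fp16:
    layer = layer.half().cuda()
    x = x.half().cuda()
# On CPU, the model and input stay in fp32, which avoids the unsupported-op error.
y = layer(x)
```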
Thanks for your contribution!
Best, Hanrui
Hi Hishan,
I will close the issue for now. Feel free to reopen if you have any further questions!
Best, Hanrui