Closed nicolabertoldi closed 5 years ago
I suspect one of the hyper-parameters for your LM training is wrong.
@teslacool
Do you have any idea whether my settings (see below) are somehow wrong?
Namespace(adaptive_input=False, adaptive_input_cutoff=None, adaptive_input_factor=4, adaptive_softmax_cutoff=None, adaptive_softmax_dropout=0, adaptive_softmax_factor=4, arch='transformer_lm', attention_dropout=0.0, bucket_cap_mb=25, char_embedder_highway_layers=2, character_embedding_dim=4, character_embeddings=False, character_filters='[(1, 64), (2, 128), (3, 192), (4, 256), (5, 256), (6, 256), (7, 256)]', clip_norm=25, cpu=False, criterion='cross_entropy', curriculum=0, data='/data/workspace/SoftContextualDataAugmentation/experiments/data_generated_sl', ddp_backend='c10d', decoder_attention_heads=8, decoder_embed_dim=512, decoder_ffn_embed_dim=2048, decoder_input_dim=512, decoder_layers=6, decoder_learned_pos=False, decoder_normalize_before=True, decoder_output_dim=512, device_id=0, distributed_backend='nccl', distributed_init_method=None, distributed_port=-1, distributed_rank=0, distributed_world_size=1, dropout=0.1, fix_batches_to_gpus=False, fp16=False, fp16_init_scale=128, fp16_scale_tolerance=0.0, fp16_scale_window=None, future_target=False, keep_interval_updates=-1, keep_last_epochs=-1, lazy_load=False, log_format=None, log_interval=1000, lr=[0.25], lr_scheduler='reduce_lr_on_plateau', lr_shrink=0.1, max_epoch=0, max_sentences=None, max_sentences_valid=None, max_tokens=6000, max_update=0, memory_efficient_fp16=False, min_loss_scale=0.0001, min_lr=1e-05, momentum=0.99, no_epoch_checkpoints=False, no_progress_bar=False, no_save=False, no_token_positional_embeddings=False, num_workers=0, optimizer='nag', optimizer_overrides='{}', output_dictionary_size=-1, past_target=False, raw_text=False, relu_dropout=0.0, required_batch_size_multiple=8, reset_lr_scheduler=False, reset_optimizer=False, restore_file='checkpoint_last.pt', sample_break_mode=None, save_dir='/data/workspace/SoftContextualDataAugmentation/experiments/lm_sl', save_interval=1, save_interval_updates=0, seed=1, self_target=False, sentence_avg=False, share_decoder_input_output_embed=False, 
skip_invalid_size_inputs_valid_test=False, task='language_modeling', tensorboard_logdir='', threshold_loss_scale=None, tie_adaptive_proj=False, tie_adaptive_weights=False, tokens_per_sample=1024, train_subset='train', update_freq=[1], user_dir=None, valid_subset='valid', validate_interval=1, weight_decay=0.0)
| dictionary: 32456 types
| /data/workspace/SoftContextualDataAugmentation/experiments/data_generated_sl train 4567 examples
| /data/workspace/SoftContextualDataAugmentation/experiments/data_generated_sl valid 47 examples
Above is the last part of the log from one of my language-model training runs.
Why does the ppl report an inf value?
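For context on where the inf comes from: fairseq reports cross-entropy loss in base-2 (bits) and derives perplexity as 2 ** loss, so once training diverges and the loss grows past what a float64 exponent can hold, the reported ppl becomes inf. A minimal sketch of that behavior (the function name `perplexity` is mine, not fairseq's):

```python
import math

def perplexity(loss_base2):
    # fairseq-style: ppl = 2 ** loss, with loss measured in bits per token
    try:
        return 2.0 ** loss_base2
    except OverflowError:
        # exponent too large for a float64 -> report inf, as seen in the log
        return float("inf")

# a healthy loss gives a finite ppl
print(perplexity(7.5))     # ~181.0
# a diverged loss overflows float64 and shows up as inf
print(math.isinf(perplexity(1500.0)))  # True
```

So an inf ppl usually means the loss itself blew up; with lr=0.25 under nag on a small corpus (4567 training examples), divergence early in training would not be surprising, and lowering the learning rate or adding warmup is worth trying.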