neulab / awesome-align

A neural word aligner based on multilingual BERT
https://arxiv.org/abs/2101.08231
BSD 3-Clause "New" or "Revised" License

Trying to train on an existing model #47

Open ghost opened 2 years ago

ghost commented 2 years ago

Hi! This is a really great tool and it's been fun to use. I am trying to train the model 'bert-base-multilingual-uncased' on a tokenized dataset in the correct format. But every time I run the script, it loads the data file and the weights and then promptly stops, reporting that some weights of the pre-trained model weren't initialised. This is the message I get:

07/21/2022 13:48:50 - WARNING - main - Process rank: -1, device: cpu, n_gpu: 0, distributed training: False, 16-bits training: False
07/21/2022 13:48:50 - INFO - awesome_align.configuration_utils - loading configuration file https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-multilingual-cased-config.json from cache at /Users/devi/.cache/torch/awesome-align/45629519f3117b89d89fd9c740073d8e4c1f0a70f9842476185100a8afe715d1.65df3cef028a0c91a7b059e4c404a975ebe6843c71267b67019c0e9cfa8a88f0
07/21/2022 13:48:50 - INFO - awesome_align.configuration_utils - Model config BertConfig { "architectures": [ "BertForMaskedLM" ], "attention_probs_dropout_prob": 0.1, "bos_token_id": null, "directionality": "bidi", "do_sample": false, "eos_token_ids": null, "finetuning_task": null, "hidden_act": "gelu", "hidden_dropout_prob": 0.1, "hidden_size": 768, "id2label": { "0": "LABEL_0", "1": "LABEL_1" }, "initializer_range": 0.02, "intermediate_size": 3072, "is_decoder": false, "label2id": { "LABEL_0": 0, "LABEL_1": 1 }, "layer_norm_eps": 1e-12, "length_penalty": 1.0, "max_length": 20, "max_position_embeddings": 512, "model_type": "bert", "num_attention_heads": 12, "num_beams": 1, "num_hidden_layers": 12, "num_labels": 2, "num_return_sequences": 1, "output_attentions": false, "output_hidden_states": false, "output_past": true, "pad_token_id": 0, "pooler_fc_size": 768, "pooler_num_attention_heads": 12, "pooler_num_fc_layers": 3, "pooler_size_per_head": 128, "pooler_type": "first_token_transform", "repetition_penalty": 1.0, "temperature": 1.0, "top_k": 50, "top_p": 1.0, "torchscript": false, "type_vocab_size": 2, "use_bfloat16": false, "vocab_size": 119547 }

07/21/2022 13:48:51 - INFO - awesome_align.tokenization_utils - loading file https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-multilingual-cased-vocab.txt from cache at /Users/devi/.cache/torch/awesome-align/96435fa287fbf7e469185f1062386e05a075cadbf6838b74da22bf64b080bc32.99bcd55fc66f4f3360bc49ba472b940b8dcf223ea6a345deb969d607ca900729
07/21/2022 13:48:52 - INFO - awesome_align.modeling_utils - loading weights file https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-multilingual-cased-pytorch_model.bin from cache at /Users/devi/.cache/torch/awesome-align/5b5b80054cd2c95a946a8e0ce0b93f56326dff9fbda6a6c3e02de3c91c918342.7131dcb754361639a7d5526985f880879c9bfd144b65a0bf50590bddb7de9059
07/21/2022 13:48:56 - INFO - awesome_align.modeling_utils - Weights of BertForMaskedLM not initialized from pretrained model: ['cls.predictions.decoder.bias', 'psi_cls.bias', 'psi_cls.transform.weight', 'psi_cls.transform.bias', 'psi_cls.decoder.weight', 'psi_cls.decoder.bias']
07/21/2022 13:48:56 - INFO - awesome_align.modeling_utils - Weights from pretrained model not used in BertForMaskedLM: ['bert.pooler.dense.weight', 'bert.pooler.dense.bias', 'cls.seq_relationship.weight', 'cls.seq_relationship.bias']
07/21/2022 13:48:56 - INFO - main - Training/evaluation parameters Namespace(train_data_file='de-en_tmx_align.txt', output_dir='align/train_model', train_mlm=False, train_tlm=False, train_tlm_full=False, train_so=False, train_psi=False, train_co=False, train_gold_file=None, eval_gold_file=None, ignore_possible_alignments=False, gold_one_index=False, cache_data=False, align_layer=8, extraction='softmax', softmax_threshold=0.001, eval_data_file='examples/deen_param_test', should_continue=False, model_name_or_path='bert-base-multilingual-cased', mlm_probability=0.15, config_name=None, tokenizer_name=None, cache_dir=None, block_size=512, do_train=False, do_eval=False, per_gpu_train_batch_size=2, per_gpu_eval_batch_size=2, gradient_accumulation_steps=4, learning_rate=2e-05, weight_decay=0.0, adam_epsilon=1e-08, max_grad_norm=1.0, num_train_epochs=1.0, max_steps=-1, warmup_steps=0, logging_steps=500, save_steps=500, save_total_limit=None, no_cuda=False, overwrite_output_dir=False, overwrite_cache=False, seed=42, fp16=False, fp16_opt_level='O1', local_rank=-1, n_gpu=0, device=device(type='cpu'))

Please help with a solution or if I'm doing something wrong! Thanks
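For reference, a quick sanity check of the training file is sketched below. It assumes awesome-align's expected input format of one tokenized sentence pair per line, with source and target separated by ` ||| `; the filename is taken from the log above and is otherwise only illustrative.

```python
# Minimal sketch: check that every line of the training file contains
# exactly one "source ||| target" pair with non-empty, pre-tokenized sides.
# The filename is the one from the log above; adjust as needed.

def check_parallel_file(path):
    malformed = 0
    with open(path, encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            parts = line.rstrip("\n").split(" ||| ")
            if len(parts) != 2 or not parts[0].strip() or not parts[1].strip():
                malformed += 1
                print(f"line {lineno} looks malformed: {line.rstrip()!r}")
    print(f"{malformed} malformed line(s) found")

if __name__ == "__main__":
    check_parallel_file("de-en_tmx_align.txt")
```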

zdou0830 commented 2 years ago

Hi, what is your training command, and how large is your training data? It is possible that your data is too large, in which case you can just subsample a portion of it.
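A minimal way to do that subsampling is sketched below, assuming the training file has one sentence pair per line; the filenames and the 10% keep ratio are placeholders.

```python
# Sketch: randomly keep a fraction of the sentence pairs so training stays
# manageable. Input/output filenames and the keep ratio are placeholders.
import random

random.seed(42)
keep_ratio = 0.1  # keep roughly 10% of the lines

with open("de-en_tmx_align.txt", encoding="utf-8") as src, \
     open("de-en_tmx_align.sub.txt", "w", encoding="utf-8") as out:
    for line in src:
        if random.random() < keep_ratio:
            out.write(line)
```

The subsampled file can then be passed to the training script in place of the full corpus.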