rahulmool opened this issue 1 year ago
Can you check whether all_ict_samples-trans100.jsonl contains any data, or is it empty?
Yes, it does contain data.
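This is roughly how I checked it (a minimal sketch, assuming the file is plain JSONL with one JSON object per line):

```python
import json

# Count the ICT samples in the file and peek at the first one.
path = "all_ict_samples-trans100.jsonl"
with open(path) as f:
    samples = [json.loads(line) for line in f if line.strip()]

print(len(samples))        # non-zero, so the file is not empty
print(samples[0].keys())   # fields of the first sample
```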
Can you check whether this data variable ends up as an empty list? https://github.com/khuangaf/CONCRETE/blob/master/CORA/mDPR/run_xict.py#L94
Yes, for validation the data variable is empty.
This is the exact output:
init using bert-base-multilingual-uncased loading weights file https://cdn.huggingface.co/bert-base-multilingual-uncased-pytorch_model.bin from cache at /home/22cs60r72/.cache/torch/transformers/b72dd13aa8437628227c4928499f2476a84af4ab7ed75eb1365c5ae9fdbd7638.54b4dad9cc3db9aa8448458b782d11ab07c80dedb951906fd2f684a00ecdb1ee All model checkpoint weights were used when initializing HFBertEncoder.
All the weights of HFBertEncoder were initialized from the model checkpoint at bert-base-multilingual-uncased. If your task is similar to the task the model of the ckeckpoint was trained on, you can already use HFBertEncoder for predictions without further training. loading configuration file https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-multilingual-uncased-config.json from cache at /home/22cs60r72/.cache/torch/transformers/33b56ce0f312e47e4d77a57791a4fc6233ae4a560dd2bdd186107058294e58ab.fcb1786f49c279f0e0f158c9972b9bd9f6c0edb5d893dcb9b530d714d86f0edc Model config BertConfig { "architectures": [ "BertForMaskedLM" ], "attention_probs_dropout_prob": 0.1, "directionality": "bidi", "gradient_checkpointing": false, "hidden_act": "gelu", "hidden_dropout_prob": 0.1, "hidden_size": 768, "initializer_range": 0.02, "intermediate_size": 3072, "layer_norm_eps": 1e-12, "max_position_embeddings": 512, "model_type": "bert", "num_attention_heads": 12, "num_hidden_layers": 12, "pad_token_id": 0, "pooler_fc_size": 768, "pooler_num_attention_heads": 12, "pooler_num_fc_layers": 3, "pooler_size_per_head": 128, "pooler_type": "first_token_transform", "type_vocab_size": 2, "vocab_size": 105879 }
init using bert-base-multilingual-uncased loading weights file https://cdn.huggingface.co/bert-base-multilingual-uncased-pytorch_model.bin from cache at /home/22cs60r72/.cache/torch/transformers/b72dd13aa8437628227c4928499f2476a84af4ab7ed75eb1365c5ae9fdbd7638.54b4dad9cc3db9aa8448458b782d11ab07c80dedb951906fd2f684a00ecdb1ee All model checkpoint weights were used when initializing HFBertEncoder.
All the weights of HFBertEncoder were initialized from the model checkpoint at bert-base-multilingual-uncased.
If your task is similar to the task the model of the ckeckpoint was trained on, you can already use HFBertEncoder for predictions without further training.
loading file https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-multilingual-uncased-vocab.txt from cache at /home/22cs60r72/.cache/torch/transformers/bb773818882b0524dc53a1b31a2cc95bc489f000e7e19773ba07846011a6c711.535306b226c42cebebbc0dabc83b92ab11260e9919e21e2ab0beb301f267b4c7
Reading file ../../data/bbc_passages/all_ict_samples.jsonl_1
Aggregated data size: 12250
Reading file ../../data/bbc_passages/all_ict_samples.jsonl_2
Aggregated data size: 24500
Reading file ../../data/bbc_passages/all_ict_samples.jsonl_0
Aggregated data size: 36750
Total cleaned data size: 36750
Total iterations per epoch=9188
Total updates=367520
Eval step = 9188
Training
Epoch 0
Epoch: 0: Step: 1/9188, loss=7.311617, lr=0.000000
Train batch 100
Avg. loss per last 100 batches: 5.303617
Epoch: 0: Step: 101/9188, loss=0.816039, lr=0.000007
Train batch 200
Avg. loss per last 100 batches: 1.082258
Epoch: 0: Step: 201/9188, loss=1.185385, lr=0.000013
Train batch 300
Avg. loss per last 100 batches: 1.143644
Epoch: 0: Step: 301/9188, loss=2.163596, lr=0.000020
Train batch 400
Avg. loss per last 100 batches: 1.020407
Epoch: 0: Step: 401/9188, loss=0.395899, lr=0.000020
Train batch 500
Avg. loss per last 100 batches: 0.950857
Epoch: 0: Step: 501/9188, loss=1.575688, lr=0.000020
Train batch 600
Avg. loss per last 100 batches: 0.833716
Epoch: 0: Step: 601/9188, loss=0.066809, lr=0.000020
Train batch 700
Avg. loss per last 100 batches: 0.844092
Epoch: 0: Step: 701/9188, loss=2.120656, lr=0.000020
Train batch 800
Avg. loss per last 100 batches: 0.758489
Epoch: 0: Step: 801/9188, loss=0.041512, lr=0.000020
Train batch 900
Avg. loss per last 100 batches: 0.760439
Epoch: 0: Step: 901/9188, loss=0.701504, lr=0.000020
Train batch 1000
Avg. loss per last 100 batches: 0.725864
Epoch: 0: Step: 1001/9188, loss=0.466521, lr=0.000020
Train batch 1100
Avg. loss per last 100 batches: 0.820713
Epoch: 0: Step: 1101/9188, loss=1.519754, lr=0.000020
Train batch 1200
Avg. loss per last 100 batches: 0.722633
Epoch: 0: Step: 1201/9188, loss=0.267421, lr=0.000020
Train batch 1300
Avg. loss per last 100 batches: 0.803082
Epoch: 0: Step: 1301/9188, loss=0.025564, lr=0.000020
Train batch 1400
Avg. loss per last 100 batches: 0.688916
Epoch: 0: Step: 1401/9188, loss=0.268893, lr=0.000020
Train batch 1500
Avg. loss per last 100 batches: 0.694311
Epoch: 0: Step: 1501/9188, loss=0.423493, lr=0.000020
Train batch 1600
Avg. loss per last 100 batches: 0.733490
Epoch: 0: Step: 1601/9188, loss=0.326050, lr=0.000020
Train batch 1700
Avg. loss per last 100 batches: 0.777383
Epoch: 0: Step: 1701/9188, loss=1.815521, lr=0.000020
Train batch 1800
Avg. loss per last 100 batches: 0.654307
Epoch: 0: Step: 1801/9188, loss=0.704480, lr=0.000020
Train batch 1900
Avg. loss per last 100 batches: 0.791680
Epoch: 0: Step: 1901/9188, loss=0.410724, lr=0.000020
Train batch 2000
Avg. loss per last 100 batches: 0.658655
Epoch: 0: Step: 2001/9188, loss=0.005747, lr=0.000020
Train batch 2100
Avg. loss per last 100 batches: 0.762728
Epoch: 0: Step: 2101/9188, loss=0.768077, lr=0.000020
Train batch 2200
Avg. loss per last 100 batches: 0.724533
Epoch: 0: Step: 2201/9188, loss=0.725896, lr=0.000020
Train batch 2300
Avg. loss per last 100 batches: 0.682972
Epoch: 0: Step: 2301/9188, loss=1.073155, lr=0.000020
Train batch 2400
Avg. loss per last 100 batches: 0.648425
Epoch: 0: Step: 2401/9188, loss=0.473070, lr=0.000020
Train batch 2500
Avg. loss per last 100 batches: 0.625523
Epoch: 0: Step: 2501/9188, loss=0.043014, lr=0.000020
Train batch 2600
Avg. loss per last 100 batches: 0.701965
Epoch: 0: Step: 2601/9188, loss=0.006406, lr=0.000020
Train batch 2700
Avg. loss per last 100 batches: 0.710023
Epoch: 0: Step: 2701/9188, loss=1.481423, lr=0.000020
Train batch 2800
Avg. loss per last 100 batches: 0.562529
Epoch: 0: Step: 2801/9188, loss=0.711672, lr=0.000020
Train batch 2900
Avg. loss per last 100 batches: 0.823689
Epoch: 0: Step: 2901/9188, loss=1.403012, lr=0.000020
Train batch 3000
Avg. loss per last 100 batches: 0.713877
Epoch: 0: Step: 3001/9188, loss=1.028094, lr=0.000020
Train batch 3100
Avg. loss per last 100 batches: 0.655354
Epoch: 0: Step: 3101/9188, loss=0.650727, lr=0.000020
Train batch 3200
Avg. loss per last 100 batches: 0.707570
Epoch: 0: Step: 3201/9188, loss=0.115641, lr=0.000020
Train batch 3300
Avg. loss per last 100 batches: 0.521763
Epoch: 0: Step: 3301/9188, loss=0.057539, lr=0.000020
Train batch 3400
Avg. loss per last 100 batches: 0.611837
Epoch: 0: Step: 3401/9188, loss=0.220680, lr=0.000020
Train batch 3500
Avg. loss per last 100 batches: 0.687215
Epoch: 0: Step: 3501/9188, loss=0.117760, lr=0.000020
Train batch 3600
Avg. loss per last 100 batches: 0.612891
Epoch: 0: Step: 3601/9188, loss=1.465150, lr=0.000020
Train batch 3700
Avg. loss per last 100 batches: 0.850417
Epoch: 0: Step: 3701/9188, loss=0.035678, lr=0.000020
Train batch 3800
Avg. loss per last 100 batches: 0.789871
Epoch: 0: Step: 3801/9188, loss=0.646053, lr=0.000020
Train batch 3900
Avg. loss per last 100 batches: 0.752498
Epoch: 0: Step: 3901/9188, loss=0.282335, lr=0.000020
Train batch 4000
Avg. loss per last 100 batches: 0.567328
Epoch: 0: Step: 4001/9188, loss=0.012028, lr=0.000020
Train batch 4100
Avg. loss per last 100 batches: 0.548741
Epoch: 0: Step: 4101/9188, loss=1.539706, lr=0.000020
Train batch 4200
Avg. loss per last 100 batches: 0.734413
Epoch: 0: Step: 4201/9188, loss=0.000198, lr=0.000020
Train batch 4300
Avg. loss per last 100 batches: 0.548030
Epoch: 0: Step: 4301/9188, loss=2.856502, lr=0.000020
Train batch 4400
Avg. loss per last 100 batches: 0.707106
Epoch: 0: Step: 4401/9188, loss=0.536959, lr=0.000020
Train batch 4500
Avg. loss per last 100 batches: 0.601878
Epoch: 0: Step: 4501/9188, loss=0.015160, lr=0.000020
Train batch 4600
Avg. loss per last 100 batches: 0.766000
Epoch: 0: Step: 4601/9188, loss=0.040841, lr=0.000020
Train batch 4700
Avg. loss per last 100 batches: 0.767216
Epoch: 0: Step: 4701/9188, loss=1.521197, lr=0.000020
Train batch 4800
Avg. loss per last 100 batches: 0.615036
Epoch: 0: Step: 4801/9188, loss=1.145796, lr=0.000020
Train batch 4900
Avg. loss per last 100 batches: 0.671538
Epoch: 0: Step: 4901/9188, loss=2.099149, lr=0.000020
Train batch 5000
Avg. loss per last 100 batches: 0.632023
Epoch: 0: Step: 5001/9188, loss=0.254401, lr=0.000020
Train batch 5100
Avg. loss per last 100 batches: 0.654933
Epoch: 0: Step: 5101/9188, loss=0.479718, lr=0.000020
Train batch 5200
Avg. loss per last 100 batches: 0.542308
Epoch: 0: Step: 5201/9188, loss=0.670710, lr=0.000020
Train batch 5300
Avg. loss per last 100 batches: 0.565748
Epoch: 0: Step: 5301/9188, loss=0.003618, lr=0.000020
Train batch 5400
Avg. loss per last 100 batches: 0.620327
Epoch: 0: Step: 5401/9188, loss=1.348403, lr=0.000020
Train batch 5500
Avg. loss per last 100 batches: 0.600770
Epoch: 0: Step: 5501/9188, loss=0.152233, lr=0.000020
Train batch 5600
Avg. loss per last 100 batches: 0.494991
Epoch: 0: Step: 5601/9188, loss=0.047969, lr=0.000020
Train batch 5700
Avg. loss per last 100 batches: 0.647839
Epoch: 0: Step: 5701/9188, loss=0.137598, lr=0.000020
Train batch 5800
Avg. loss per last 100 batches: 0.566338
Epoch: 0: Step: 5801/9188, loss=0.314973, lr=0.000020
Train batch 5900
Avg. loss per last 100 batches: 0.639788
Epoch: 0: Step: 5901/9188, loss=0.009664, lr=0.000020
Train batch 6000
Avg. loss per last 100 batches: 0.509846
Epoch: 0: Step: 6001/9188, loss=0.239767, lr=0.000020
Train batch 6100
Avg. loss per last 100 batches: 0.629419
Epoch: 0: Step: 6101/9188, loss=0.000416, lr=0.000020
Train batch 6200
Avg. loss per last 100 batches: 0.539567
Epoch: 0: Step: 6201/9188, loss=0.014686, lr=0.000020
Train batch 6300
Avg. loss per last 100 batches: 0.730408
Epoch: 0: Step: 6301/9188, loss=0.434280, lr=0.000020
Train batch 6400
Avg. loss per last 100 batches: 0.466699
Epoch: 0: Step: 6401/9188, loss=0.750414, lr=0.000020
Train batch 6500
Avg. loss per last 100 batches: 0.699479
Epoch: 0: Step: 6501/9188, loss=1.802849, lr=0.000020
Train batch 6600
Avg. loss per last 100 batches: 0.700786
Epoch: 0: Step: 6601/9188, loss=0.286703, lr=0.000020
Train batch 6700
Avg. loss per last 100 batches: 0.851789
Epoch: 0: Step: 6701/9188, loss=0.336051, lr=0.000020
Train batch 6800
Avg. loss per last 100 batches: 0.531913
Epoch: 0: Step: 6801/9188, loss=0.021930, lr=0.000020
Train batch 6900
Avg. loss per last 100 batches: 0.556301
Epoch: 0: Step: 6901/9188, loss=0.018971, lr=0.000020
Train batch 7000
Avg. loss per last 100 batches: 0.567307
Epoch: 0: Step: 7001/9188, loss=2.772715, lr=0.000020
Train batch 7100
Avg. loss per last 100 batches: 0.660842
Epoch: 0: Step: 7101/9188, loss=0.124819, lr=0.000020
Train batch 7200
Avg. loss per last 100 batches: 0.487308
Epoch: 0: Step: 7201/9188, loss=1.259964, lr=0.000020
Train batch 7300
Avg. loss per last 100 batches: 0.732755
Epoch: 0: Step: 7301/9188, loss=0.739729, lr=0.000020
Train batch 7400
Avg. loss per last 100 batches: 0.629111
Epoch: 0: Step: 7401/9188, loss=1.745674, lr=0.000020
Train batch 7500
Avg. loss per last 100 batches: 0.488628
Epoch: 0: Step: 7501/9188, loss=0.006047, lr=0.000020
Train batch 7600
Avg. loss per last 100 batches: 0.533195
Epoch: 0: Step: 7601/9188, loss=0.000862, lr=0.000020
Train batch 7700
Avg. loss per last 100 batches: 0.575115
Epoch: 0: Step: 7701/9188, loss=2.283909, lr=0.000020
Train batch 7800
Avg. loss per last 100 batches: 0.538137
Epoch: 0: Step: 7801/9188, loss=0.102370, lr=0.000020
Train batch 7900
Avg. loss per last 100 batches: 0.648084
Epoch: 0: Step: 7901/9188, loss=0.504715, lr=0.000020
Train batch 8000
Avg. loss per last 100 batches: 0.672259
Epoch: 0: Step: 8001/9188, loss=0.274606, lr=0.000020
Train batch 8100
Avg. loss per last 100 batches: 0.593986
Epoch: 0: Step: 8101/9188, loss=0.006901, lr=0.000020
Train batch 8200
Avg. loss per last 100 batches: 0.586650
Epoch: 0: Step: 8201/9188, loss=3.564882, lr=0.000020
Train batch 8300
Avg. loss per last 100 batches: 0.718681
Epoch: 0: Step: 8301/9188, loss=0.956982, lr=0.000020
Train batch 8400
Avg. loss per last 100 batches: 0.715578
Epoch: 0: Step: 8401/9188, loss=1.803287, lr=0.000020
Train batch 8500
Avg. loss per last 100 batches: 0.649944
Epoch: 0: Step: 8501/9188, loss=0.170287, lr=0.000020
Train batch 8600
Avg. loss per last 100 batches: 0.497341
Epoch: 0: Step: 8601/9188, loss=0.182119, lr=0.000020
Train batch 8700
Avg. loss per last 100 batches: 0.561773
Epoch: 0: Step: 8701/9188, loss=0.013547, lr=0.000020
Train batch 8800
Avg. loss per last 100 batches: 0.665971
Epoch: 0: Step: 8801/9188, loss=0.314558, lr=0.000020
Train batch 8900
Avg. loss per last 100 batches: 0.533789
Epoch: 0: Step: 8901/9188, loss=0.389280, lr=0.000020
Train batch 9000
Avg. loss per last 100 batches: 0.620023
Epoch: 0: Step: 9001/9188, loss=0.113274, lr=0.000020
Train batch 9100
Avg. loss per last 100 batches: 0.567672
Epoch: 0: Step: 9101/9188, loss=0.535228, lr=0.000020
Validation: Epoch: 0 Step: 9188/9188
NLL validation ...
Total cleaned data size: 0
0.0
Traceback (most recent call last):
File "run_xict.py", line 602, in
It looks like data is an empty list because the positive_ctxs field is empty. Most likely something went wrong when you ran create_ict_samples.py. Can you check whether this line was run properly (i.e., the positive_ctxs field should be assigned a non-empty list) for the dev data?
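Something like this quick check should tell you (a minimal sketch, assuming the dev file is plain JSONL with one sample per line and a positive_ctxs field, as in the training samples):

```python
import json

# Count how many dev samples have an empty (or missing) positive_ctxs field.
dev_path = "../../data/bbc_passages/all_ict_samples-trans100.jsonl"  # the file you pass as --dev_file
empty, total = 0, 0
with open(dev_path) as f:
    for line in f:
        if not line.strip():
            continue
        sample = json.loads(line)
        total += 1
        if not sample.get("positive_ctxs"):
            empty += 1

print(f"{empty} of {total} samples have an empty positive_ctxs")
```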
In run_xict.sh there is this command:

python -m torch.distributed.launch \
    --nproc_per_node 1 run_xict.py \
    --max_grad_norm 2.0 \
    --encoder_model_type hf_bert \
    --pretrained_model_cfg bert-base-multilingual-uncased \
    --seed 12345 --sequence_length 256 \
    --warmup_steps 300 --batch_size 4 --do_lower_case \
    --train_file "../../data/bbc_passages/all_ictsamples.jsonl[0,1,2]" \
    --dev_file ../../data/bbc_passages/all_ict_samples.jsonl_dev \
    --output_dir xict_outputs \
    --checkpoint_file_name xICT_biencoder.pt \
    --learning_rate 2e-05 --num_train_epochs 40 \
    --dev_batch_size 6 --val_av_rank_start_epoch 30

but I don't know where I can find all_ict_samples.jsonl_dev.
Instead of that file I am using all_ict_samples-trans100.jsonl, but it gives me the error described in https://github.com/khuangaf/CONCRETE/issues/4#issue-1650402840.