rahulmool opened this issue 1 year ago
Can you check whether all_ict_samples-trans100.jsonl contains any data, or is it empty?
Yes, it does contain data.
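This is roughly how I checked it (a minimal sketch, assuming the file is plain JSONL with one JSON object per line):

```python
import json

# Count the ICT samples in the file and peek at the first one.
path = "all_ict_samples-trans100.jsonl"
with open(path) as f:
    samples = [json.loads(line) for line in f if line.strip()]

print(len(samples))        # non-zero, so the file is not empty
print(samples[0].keys())   # fields of the first sample
```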
Can you check whether this data variable ends up as an empty list? https://github.com/khuangaf/CONCRETE/blob/master/CORA/mDPR/run_xict.py#L94
Yes, for validation the data variable is empty.
This is the exact output:
init using bert-base-multilingual-uncased loading weights file https://cdn.huggingface.co/bert-base-multilingual-uncased-pytorch_model.bin from cache at /home/22cs60r72/.cache/torch/transformers/b72dd13aa8437628227c4928499f2476a84af4ab7ed75eb1365c5ae9fdbd7638.54b4dad9cc3db9aa8448458b782d11ab07c80dedb951906fd2f684a00ecdb1ee All model checkpoint weights were used when initializing HFBertEncoder.
All the weights of HFBertEncoder were initialized from the model checkpoint at bert-base-multilingual-uncased. If your task is similar to the task the model of the ckeckpoint was trained on, you can already use HFBertEncoder for predictions without further training. loading configuration file https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-multilingual-uncased-config.json from cache at /home/22cs60r72/.cache/torch/transformers/33b56ce0f312e47e4d77a57791a4fc6233ae4a560dd2bdd186107058294e58ab.fcb1786f49c279f0e0f158c9972b9bd9f6c0edb5d893dcb9b530d714d86f0edc Model config BertConfig { "architectures": [ "BertForMaskedLM" ], "attention_probs_dropout_prob": 0.1, "directionality": "bidi", "gradient_checkpointing": false, "hidden_act": "gelu", "hidden_dropout_prob": 0.1, "hidden_size": 768, "initializer_range": 0.02, "intermediate_size": 3072, "layer_norm_eps": 1e-12, "max_position_embeddings": 512, "model_type": "bert", "num_attention_heads": 12, "num_hidden_layers": 12, "pad_token_id": 0, "pooler_fc_size": 768, "pooler_num_attention_heads": 12, "pooler_num_fc_layers": 3, "pooler_size_per_head": 128, "pooler_type": "first_token_transform", "type_vocab_size": 2, "vocab_size": 105879 }
init using bert-base-multilingual-uncased loading weights file https://cdn.huggingface.co/bert-base-multilingual-uncased-pytorch_model.bin from cache at /home/22cs60r72/.cache/torch/transformers/b72dd13aa8437628227c4928499f2476a84af4ab7ed75eb1365c5ae9fdbd7638.54b4dad9cc3db9aa8448458b782d11ab07c80dedb951906fd2f684a00ecdb1ee All model checkpoint weights were used when initializing HFBertEncoder.
All the weights of HFBertEncoder were initialized from the model checkpoint at bert-base-multilingual-uncased.
If your task is similar to the task the model of the ckeckpoint was trained on, you can already use HFBertEncoder for predictions without further training.
loading file https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-multilingual-uncased-vocab.txt from cache at /home/22cs60r72/.cache/torch/transformers/bb773818882b0524dc53a1b31a2cc95bc489f000e7e19773ba07846011a6c711.535306b226c42cebebbc0dabc83b92ab11260e9919e21e2ab0beb301f267b4c7
Reading file ../../data/bbc_passages/all_ict_samples.jsonl_1
Aggregated data size: 12250
Reading file ../../data/bbc_passages/all_ict_samples.jsonl_2
Aggregated data size: 24500
Reading file ../../data/bbc_passages/all_ict_samples.jsonl_0
Aggregated data size: 36750
Total cleaned data size: 36750
Total iterations per epoch=9188
Total updates=367520
Eval step = 9188
Training
Epoch 0
Epoch: 0: Step: 1/9188, loss=7.311617, lr=0.000000
Train batch 100
Avg. loss per last 100 batches: 5.303617
Epoch: 0: Step: 101/9188, loss=0.816039, lr=0.000007
Train batch 200
Avg. loss per last 100 batches: 1.082258
Epoch: 0: Step: 201/9188, loss=1.185385, lr=0.000013
Train batch 300
Avg. loss per last 100 batches: 1.143644
Epoch: 0: Step: 301/9188, loss=2.163596, lr=0.000020
Train batch 400
Avg. loss per last 100 batches: 1.020407
Epoch: 0: Step: 401/9188, loss=0.395899, lr=0.000020
Train batch 500
Avg. loss per last 100 batches: 0.950857
Epoch: 0: Step: 501/9188, loss=1.575688, lr=0.000020
Train batch 600
Avg. loss per last 100 batches: 0.833716
Epoch: 0: Step: 601/9188, loss=0.066809, lr=0.000020
Train batch 700
Avg. loss per last 100 batches: 0.844092
Epoch: 0: Step: 701/9188, loss=2.120656, lr=0.000020
Train batch 800
Avg. loss per last 100 batches: 0.758489
Epoch: 0: Step: 801/9188, loss=0.041512, lr=0.000020
Train batch 900
Avg. loss per last 100 batches: 0.760439
Epoch: 0: Step: 901/9188, loss=0.701504, lr=0.000020
Train batch 1000
Avg. loss per last 100 batches: 0.725864
Epoch: 0: Step: 1001/9188, loss=0.466521, lr=0.000020
Train batch 1100
Avg. loss per last 100 batches: 0.820713
Epoch: 0: Step: 1101/9188, loss=1.519754, lr=0.000020
Train batch 1200
Avg. loss per last 100 batches: 0.722633
Epoch: 0: Step: 1201/9188, loss=0.267421, lr=0.000020
Train batch 1300
Avg. loss per last 100 batches: 0.803082
Epoch: 0: Step: 1301/9188, loss=0.025564, lr=0.000020
Train batch 1400
Avg. loss per last 100 batches: 0.688916
Epoch: 0: Step: 1401/9188, loss=0.268893, lr=0.000020
Train batch 1500
Avg. loss per last 100 batches: 0.694311
Epoch: 0: Step: 1501/9188, loss=0.423493, lr=0.000020
Train batch 1600
Avg. loss per last 100 batches: 0.733490
Epoch: 0: Step: 1601/9188, loss=0.326050, lr=0.000020
Train batch 1700
Avg. loss per last 100 batches: 0.777383
Epoch: 0: Step: 1701/9188, loss=1.815521, lr=0.000020
Train batch 1800
Avg. loss per last 100 batches: 0.654307
Epoch: 0: Step: 1801/9188, loss=0.704480, lr=0.000020
Train batch 1900
Avg. loss per last 100 batches: 0.791680
Epoch: 0: Step: 1901/9188, loss=0.410724, lr=0.000020
Train batch 2000
Avg. loss per last 100 batches: 0.658655
Epoch: 0: Step: 2001/9188, loss=0.005747, lr=0.000020
Train batch 2100
Avg. loss per last 100 batches: 0.762728
Epoch: 0: Step: 2101/9188, loss=0.768077, lr=0.000020
Train batch 2200
Avg. loss per last 100 batches: 0.724533
Epoch: 0: Step: 2201/9188, loss=0.725896, lr=0.000020
Train batch 2300
Avg. loss per last 100 batches: 0.682972
Epoch: 0: Step: 2301/9188, loss=1.073155, lr=0.000020
Train batch 2400
Avg. loss per last 100 batches: 0.648425
Epoch: 0: Step: 2401/9188, loss=0.473070, lr=0.000020
Train batch 2500
Avg. loss per last 100 batches: 0.625523
Epoch: 0: Step: 2501/9188, loss=0.043014, lr=0.000020
Train batch 2600
Avg. loss per last 100 batches: 0.701965
Epoch: 0: Step: 2601/9188, loss=0.006406, lr=0.000020
Train batch 2700
Avg. loss per last 100 batches: 0.710023
Epoch: 0: Step: 2701/9188, loss=1.481423, lr=0.000020
Train batch 2800
Avg. loss per last 100 batches: 0.562529
Epoch: 0: Step: 2801/9188, loss=0.711672, lr=0.000020
Train batch 2900
Avg. loss per last 100 batches: 0.823689
Epoch: 0: Step: 2901/9188, loss=1.403012, lr=0.000020
Train batch 3000
Avg. loss per last 100 batches: 0.713877
Epoch: 0: Step: 3001/9188, loss=1.028094, lr=0.000020
Train batch 3100
Avg. loss per last 100 batches: 0.655354
Epoch: 0: Step: 3101/9188, loss=0.650727, lr=0.000020
Train batch 3200
Avg. loss per last 100 batches: 0.707570
Epoch: 0: Step: 3201/9188, loss=0.115641, lr=0.000020
Train batch 3300
Avg. loss per last 100 batches: 0.521763
Epoch: 0: Step: 3301/9188, loss=0.057539, lr=0.000020
Train batch 3400
Avg. loss per last 100 batches: 0.611837
Epoch: 0: Step: 3401/9188, loss=0.220680, lr=0.000020
Train batch 3500
Avg. loss per last 100 batches: 0.687215
Epoch: 0: Step: 3501/9188, loss=0.117760, lr=0.000020
Train batch 3600
Avg. loss per last 100 batches: 0.612891
Epoch: 0: Step: 3601/9188, loss=1.465150, lr=0.000020
Train batch 3700
Avg. loss per last 100 batches: 0.850417
Epoch: 0: Step: 3701/9188, loss=0.035678, lr=0.000020
Train batch 3800
Avg. loss per last 100 batches: 0.789871
Epoch: 0: Step: 3801/9188, loss=0.646053, lr=0.000020
Train batch 3900
Avg. loss per last 100 batches: 0.752498
Epoch: 0: Step: 3901/9188, loss=0.282335, lr=0.000020
Train batch 4000
Avg. loss per last 100 batches: 0.567328
Epoch: 0: Step: 4001/9188, loss=0.012028, lr=0.000020
Train batch 4100
Avg. loss per last 100 batches: 0.548741
Epoch: 0: Step: 4101/9188, loss=1.539706, lr=0.000020
Train batch 4200
Avg. loss per last 100 batches: 0.734413
Epoch: 0: Step: 4201/9188, loss=0.000198, lr=0.000020
Train batch 4300
Avg. loss per last 100 batches: 0.548030
Epoch: 0: Step: 4301/9188, loss=2.856502, lr=0.000020
Train batch 4400
Avg. loss per last 100 batches: 0.707106
Epoch: 0: Step: 4401/9188, loss=0.536959, lr=0.000020
Train batch 4500
Avg. loss per last 100 batches: 0.601878
Epoch: 0: Step: 4501/9188, loss=0.015160, lr=0.000020
Train batch 4600
Avg. loss per last 100 batches: 0.766000
Epoch: 0: Step: 4601/9188, loss=0.040841, lr=0.000020
Train batch 4700
Avg. loss per last 100 batches: 0.767216
Epoch: 0: Step: 4701/9188, loss=1.521197, lr=0.000020
Train batch 4800
Avg. loss per last 100 batches: 0.615036
Epoch: 0: Step: 4801/9188, loss=1.145796, lr=0.000020
Train batch 4900
Avg. loss per last 100 batches: 0.671538
Epoch: 0: Step: 4901/9188, loss=2.099149, lr=0.000020
Train batch 5000
Avg. loss per last 100 batches: 0.632023
Epoch: 0: Step: 5001/9188, loss=0.254401, lr=0.000020
Train batch 5100
Avg. loss per last 100 batches: 0.654933
Epoch: 0: Step: 5101/9188, loss=0.479718, lr=0.000020
Train batch 5200
Avg. loss per last 100 batches: 0.542308
Epoch: 0: Step: 5201/9188, loss=0.670710, lr=0.000020
Train batch 5300
Avg. loss per last 100 batches: 0.565748
Epoch: 0: Step: 5301/9188, loss=0.003618, lr=0.000020
Train batch 5400
Avg. loss per last 100 batches: 0.620327
Epoch: 0: Step: 5401/9188, loss=1.348403, lr=0.000020
Train batch 5500
Avg. loss per last 100 batches: 0.600770
Epoch: 0: Step: 5501/9188, loss=0.152233, lr=0.000020
Train batch 5600
Avg. loss per last 100 batches: 0.494991
Epoch: 0: Step: 5601/9188, loss=0.047969, lr=0.000020
Train batch 5700
Avg. loss per last 100 batches: 0.647839
Epoch: 0: Step: 5701/9188, loss=0.137598, lr=0.000020
Train batch 5800
Avg. loss per last 100 batches: 0.566338
Epoch: 0: Step: 5801/9188, loss=0.314973, lr=0.000020
Train batch 5900
Avg. loss per last 100 batches: 0.639788
Epoch: 0: Step: 5901/9188, loss=0.009664, lr=0.000020
Train batch 6000
Avg. loss per last 100 batches: 0.509846
Epoch: 0: Step: 6001/9188, loss=0.239767, lr=0.000020
Train batch 6100
Avg. loss per last 100 batches: 0.629419
Epoch: 0: Step: 6101/9188, loss=0.000416, lr=0.000020
Train batch 6200
Avg. loss per last 100 batches: 0.539567
Epoch: 0: Step: 6201/9188, loss=0.014686, lr=0.000020
Train batch 6300
Avg. loss per last 100 batches: 0.730408
Epoch: 0: Step: 6301/9188, loss=0.434280, lr=0.000020
Train batch 6400
Avg. loss per last 100 batches: 0.466699
Epoch: 0: Step: 6401/9188, loss=0.750414, lr=0.000020
Train batch 6500
Avg. loss per last 100 batches: 0.699479
Epoch: 0: Step: 6501/9188, loss=1.802849, lr=0.000020
Train batch 6600
Avg. loss per last 100 batches: 0.700786
Epoch: 0: Step: 6601/9188, loss=0.286703, lr=0.000020
Train batch 6700
Avg. loss per last 100 batches: 0.851789
Epoch: 0: Step: 6701/9188, loss=0.336051, lr=0.000020
Train batch 6800
Avg. loss per last 100 batches: 0.531913
Epoch: 0: Step: 6801/9188, loss=0.021930, lr=0.000020
Train batch 6900
Avg. loss per last 100 batches: 0.556301
Epoch: 0: Step: 6901/9188, loss=0.018971, lr=0.000020
Train batch 7000
Avg. loss per last 100 batches: 0.567307
Epoch: 0: Step: 7001/9188, loss=2.772715, lr=0.000020
Train batch 7100
Avg. loss per last 100 batches: 0.660842
Epoch: 0: Step: 7101/9188, loss=0.124819, lr=0.000020
Train batch 7200
Avg. loss per last 100 batches: 0.487308
Epoch: 0: Step: 7201/9188, loss=1.259964, lr=0.000020
Train batch 7300
Avg. loss per last 100 batches: 0.732755
Epoch: 0: Step: 7301/9188, loss=0.739729, lr=0.000020
Train batch 7400
Avg. loss per last 100 batches: 0.629111
Epoch: 0: Step: 7401/9188, loss=1.745674, lr=0.000020
Train batch 7500
Avg. loss per last 100 batches: 0.488628
Epoch: 0: Step: 7501/9188, loss=0.006047, lr=0.000020
Train batch 7600
Avg. loss per last 100 batches: 0.533195
Epoch: 0: Step: 7601/9188, loss=0.000862, lr=0.000020
Train batch 7700
Avg. loss per last 100 batches: 0.575115
Epoch: 0: Step: 7701/9188, loss=2.283909, lr=0.000020
Train batch 7800
Avg. loss per last 100 batches: 0.538137
Epoch: 0: Step: 7801/9188, loss=0.102370, lr=0.000020
Train batch 7900
Avg. loss per last 100 batches: 0.648084
Epoch: 0: Step: 7901/9188, loss=0.504715, lr=0.000020
Train batch 8000
Avg. loss per last 100 batches: 0.672259
Epoch: 0: Step: 8001/9188, loss=0.274606, lr=0.000020
Train batch 8100
Avg. loss per last 100 batches: 0.593986
Epoch: 0: Step: 8101/9188, loss=0.006901, lr=0.000020
Train batch 8200
Avg. loss per last 100 batches: 0.586650
Epoch: 0: Step: 8201/9188, loss=3.564882, lr=0.000020
Train batch 8300
Avg. loss per last 100 batches: 0.718681
Epoch: 0: Step: 8301/9188, loss=0.956982, lr=0.000020
Train batch 8400
Avg. loss per last 100 batches: 0.715578
Epoch: 0: Step: 8401/9188, loss=1.803287, lr=0.000020
Train batch 8500
Avg. loss per last 100 batches: 0.649944
Epoch: 0: Step: 8501/9188, loss=0.170287, lr=0.000020
Train batch 8600
Avg. loss per last 100 batches: 0.497341
Epoch: 0: Step: 8601/9188, loss=0.182119, lr=0.000020
Train batch 8700
Avg. loss per last 100 batches: 0.561773
Epoch: 0: Step: 8701/9188, loss=0.013547, lr=0.000020
Train batch 8800
Avg. loss per last 100 batches: 0.665971
Epoch: 0: Step: 8801/9188, loss=0.314558, lr=0.000020
Train batch 8900
Avg. loss per last 100 batches: 0.533789
Epoch: 0: Step: 8901/9188, loss=0.389280, lr=0.000020
Train batch 9000
Avg. loss per last 100 batches: 0.620023
Epoch: 0: Step: 9001/9188, loss=0.113274, lr=0.000020
Train batch 9100
Avg. loss per last 100 batches: 0.567672
Epoch: 0: Step: 9101/9188, loss=0.535228, lr=0.000020
Validation: Epoch: 0 Step: 9188/9188
NLL validation ...
Total cleaned data size: 0
0.0
Traceback (most recent call last):
File "run_xict.py", line 602, in
It looks like data is an empty list because the positive_ctxs field is empty. Most likely something went wrong when you ran create_ict_samples.py. Can you check whether this line was run properly (i.e., the positive_ctxs field should be assigned a non-empty list) for the dev data?
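Something like this quick check should tell you (a minimal sketch, assuming the dev file is plain JSONL with one sample per line and a positive_ctxs field, as in the training samples):

```python
import json

# Count how many dev samples have an empty (or missing) positive_ctxs field.
dev_path = "../../data/bbc_passages/all_ict_samples-trans100.jsonl"  # the file you pass as --dev_file
empty, total = 0, 0
with open(dev_path) as f:
    for line in f:
        if not line.strip():
            continue
        sample = json.loads(line)
        total += 1
        if not sample.get("positive_ctxs"):
            empty += 1

print(f"{empty} of {total} samples have an empty positive_ctxs")
```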
In run_xict.sh there is this command:

python -m torch.distributed.launch \
    --nproc_per_node 1 run_xict.py \
    --max_grad_norm 2.0 \
    --encoder_model_type hf_bert \
    --pretrained_model_cfg bert-base-multilingual-uncased \
    --seed 12345 --sequence_length 256 \
    --warmup_steps 300 --batch_size 4 --do_lower_case \
    --train_file "../../data/bbc_passages/all_ictsamples.jsonl[0,1,2]" \
    --dev_file ../../data/bbc_passages/all_ict_samples.jsonl_dev \
    --output_dir xict_outputs \
    --checkpoint_file_name xICT_biencoder.pt \
    --learning_rate 2e-05 --num_train_epochs 40 \
    --dev_batch_size 6 --val_av_rank_start_epoch 30

but I don't know where I can find all_ict_samples.jsonl_dev.
Instead of that file I am using all_ict_samples-trans100.jsonl, but it gives me the error described in https://github.com/khuangaf/CONCRETE/issues/4#issue-1650402840.