princeton-nlp / SimCSE

[EMNLP 2021] SimCSE: Simple Contrastive Learning of Sentence Embeddings https://arxiv.org/abs/2104.08821
MIT License
3.33k stars, 505 forks

Running supervised SimCSE fails with KeyError: 'input_ids' #220

Closed Rachel-Yeah-Lee closed 1 year ago

Rachel-Yeah-Lee commented 1 year ago

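Hello, and thank you for providing the source code. I am training SimCSE in supervised mode. After removing the hard_negative field from the data (leaving only sent0 and sent1 pairs), I get the error below. Did I misconfigure something? I would appreciate any advice. Thank you!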

Traceback (most recent call last):
  File "C:\Users\user\Desktop\KeKeLab\SimCSE_with_paraphrasing_DG\SimCSE\train.py", line 623, in <module>
    main()
  File "C:\Users\user\Desktop\KeKeLab\SimCSE_with_paraphrasing_DG\SimCSE\train.py", line 587, in main
    train_result = trainer.train(model_path=model_path)
  File "C:\Users\user\Desktop\KeKeLab\SimCSE_with_paraphrasing_DG\SimCSE\simcse\trainers.py", line 450, in train
    for step, inputs in enumerate(epoch_iterator):
  File "C:\Users\user\anaconda3\lib\site-packages\torch\utils\data\dataloader.py", line 435, in __next__
    data = self._next_data()
  File "C:\Users\user\anaconda3\lib\site-packages\torch\utils\data\dataloader.py", line 475, in _next_data
    data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
  File "C:\Users\user\anaconda3\lib\site-packages\torch\utils\data\_utils\fetch.py", line 47, in fetch
    return self.collate_fn(data)
  File "C:\Users\user\Desktop\KeKeLab\SimCSE_with_paraphrasing_DG\SimCSE\train.py", line 506, in __call__
    num_sent = len(features[0]['input_ids'])
KeyError: 'input_ids'
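
The last frame of the traceback shows the collate function reading features[0]['input_ids'], so the first item in the batch reached the collator without an 'input_ids' key. The following is a minimal diagnostic sketch, not part of SimCSE: the helper name describe_missing_key and both sample batches are hypothetical, and it assumes only that each feature is a plain dict. It reports which keys are actually present, which can help tell whether the tokenization step produced the expected columns.

```python
def describe_missing_key(features, required_key="input_ids"):
    """Return a readable diagnosis if the first feature lacks `required_key`, else None.

    Hypothetical helper for debugging; it is not part of the SimCSE code base.
    """
    if not features:
        return "empty batch: no features were passed to the collator"
    first = features[0]
    if required_key in first:
        return None
    return f"missing {required_key!r}; keys present: {sorted(first.keys())}"


# Hypothetical batch that skipped tokenization: only raw text columns remain.
raw_batch = [{"sent0": "A man is playing a guitar.", "sent1": "A person plays music."}]

# Hypothetical batch after tokenization: one token-id list per sentence in the pair.
tokenized_batch = [{"input_ids": [[101, 1037, 102], [101, 1037, 102]]}]

print(describe_missing_key(raw_batch))
print(describe_missing_key(tokenized_batch))
```

Calling a check like this on the batch just before the failing line would show whether the features still carry the raw sent0/sent1 columns or some other set of keys.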