Running training_sup_text_matching_model_mydata.py fails with "You have to specify either input_ids or inputs_embeds"

1006076811 commented 1 year ago

Running the training_sup_text_matching_model_mydata.py script with the following command raises an error:

CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --nproc_per_node 4 \
    training_sup_text_matching_model_jsonl_data.py \
    --do_train \
    --do_predict \
    --output_dir /media/ducheng/data_one/nlp_model/text2vec-base-chinese-paraphrase-fintune \
    --batch_size 32 \
    --bf16 \
    --data_parallel \
    --save_model_every_epoch

2.0366:  99%|█████████▌| 649/655 [10:19<00:05,  1.05it/s]
Traceback (most recent call last):
  File "/home/ducheng/PycharmProjects/text2vec/examples/training_sup_text_matching_model_mydata.py", line 131, in <module>
    main()
  File "/home/ducheng/PycharmProjects/text2vec/examples/training_sup_text_matching_model_mydata.py", line 82, in main
    model.train_model(
  File "/home/ducheng/PycharmProjects/text2vec/examples/../text2vec/cosent_model.py", line 111, in train_model
    global_step, training_details = self.train(
  File "/home/ducheng/PycharmProjects/text2vec/examples/../text2vec/cosent_model.py", line 266, in train
    output_embeddings = self.get_sentence_embeddings(input_ids, attention_mask, token_type_ids)
  File "/home/ducheng/PycharmProjects/text2vec/examples/../text2vec/sentence_model.py", line 102, in get_sentence_embeddings
    model_output = self.bert(input_ids, attention_mask, token_type_ids, output_hidden_states=True)
  File "/media/ducheng/data_one/Anaconda3/envs/text2vec/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/media/ducheng/data_one/Anaconda3/envs/text2vec/lib/python3.9/site-packages/torch/nn/parallel/data_parallel.py", line 171, in forward
    outputs = self.parallel_apply(replicas, inputs, kwargs)
  File "/media/ducheng/data_one/Anaconda3/envs/text2vec/lib/python3.9/site-packages/torch/nn/parallel/data_parallel.py", line 181, in parallel_apply
    return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
  File "/media/ducheng/data_one/Anaconda3/envs/text2vec/lib/python3.9/site-packages/torch/nn/parallel/parallel_apply.py", line 89, in parallel_apply
    output.reraise()
  File "/media/ducheng/data_one/Anaconda3/envs/text2vec/lib/python3.9/site-packages/torch/_utils.py", line 543, in reraise
    raise exception
ValueError: Caught ValueError in replica 2 on device 2.
Original Traceback (most recent call last):
  File "/media/ducheng/data_one/Anaconda3/envs/text2vec/lib/python3.9/site-packages/torch/nn/parallel/parallel_apply.py", line 64, in _worker
    output = module(*input, **kwargs)
  File "/media/ducheng/data_one/Anaconda3/envs/text2vec/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/media/ducheng/data_one/Anaconda3/envs/text2vec/lib/python3.9/site-packages/transformers/models/ernie/modeling_ernie.py", line 903, in forward
    raise ValueError("You have to specify either input_ids or inputs_embeds")
ValueError: You have to specify either input_ids or inputs_embeds

1006076811 commented 1 year ago

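Training runs fine on a single GPU; the error only appears with multiple GPUs. I prepared my jsonl dataset following the data format used in the example.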

shibing624 commented 1 year ago

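Multi-GPU training runs fine in my local test.

You can update to the latest code and test again.
Since you have downloaded or cloned the code locally, you do not need the pip-installed text2vec. Run pip uninstall text2vec and try again.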

1006076811 commented 1 year ago

Strange: I reran it today and the error is gone. Multi-GPU training now runs fine :)

1006076811 commented 1 year ago

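Hello, and thank you for answering my earlier question. I have reproduced the problem: with batch_size 32 and a training set of 67 samples, the error occurs. It seems to be related to the number of samples in the last batch, because the error always appears on the last batch (including during eval). I have not found the exact cause. With 68 samples in the training set, everything runs fine.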
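The numbers in this report can be checked with a little arithmetic. The sketch below is not taken from text2vec; it assumes the data loader keeps the incomplete final batch and that the batch is split across GPUs the way `torch.chunk` splits a tensor along its first dimension. The 4-GPU count comes from the command in the first comment, and the helper names `last_batch_size` and `chunk_sizes` are hypothetical.

```python
import math
from typing import List


def last_batch_size(num_samples: int, batch_size: int) -> int:
    """Size of the final batch when incomplete batches are kept."""
    remainder = num_samples % batch_size
    return remainder if remainder else min(batch_size, num_samples)


def chunk_sizes(rows: int, num_gpus: int) -> List[int]:
    """Rows per device under torch.chunk-style splitting (an assumption)."""
    if rows == 0:
        return []
    size = math.ceil(rows / num_gpus)  # rows given to each non-final chunk
    sizes = []
    remaining = rows
    while remaining > 0:
        sizes.append(min(size, remaining))
        remaining -= size
    return sizes


for num_samples in (67, 68):
    rows = last_batch_size(num_samples, batch_size=32)
    sizes = chunk_sizes(rows, num_gpus=4)
    # Fewer chunks than GPUs means at least one device gets no input rows.
    print(num_samples, rows, sizes, len(sizes) == 4)
```

Under these assumptions, 67 samples leave a final batch of 3 rows, which yields only 3 chunks for 4 GPUs, while 68 samples leave 4 rows and every GPU receives one. That matches the reported behaviour, but the thread does not confirm it as the root cause.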

1006076811 commented 1 year ago

This does not seem to happen on a single GPU.

xxyp commented 1 year ago

+1, I hit the same problem with multiple GPUs.

shibing624 commented 1 year ago

What problem are you seeing? Which base model are you using, and what does the error log say?

xxyp commented 1 year ago

The base model is text2vec-base-chinese-sentence. The problem is the same as above: it also reports "You have to specify either input_ids or inputs_embeds".

shibing624 commented 1 year ago

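> The base model is text2vec-base-chinese-sentence. The problem is the same as above: it also reports "You have to specify either input_ids or inputs_embeds".

Please confirm: 1. Is your code up to date? 2. Are your library versions up to date?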

HaoRenkk123 commented 12 months ago

+1, I hit the same problem. The model is text2vec-bge-large-chinese, the data is text2vec-base-multilingual-dataset/all.jsonl, and the code is the latest version.

HaoRenkk123 commented 12 months ago

requirements.txt specifies transformers>=4.6.0; I am using the latest version, 4.33.2.

HaoRenkk123 commented 12 months ago

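> Hello, and thank you for answering my earlier question. I have reproduced the problem: with batch_size 32 and a training set of 67 samples, the error occurs. It seems to be related to the number of samples in the last batch, because the error always appears on the last batch (including during eval). I have not found the exact cause. With 68 samples in the training set, everything runs fine.

Are those figures 67 and 68 samples, or 670,000 and 680,000? I hit this problem too, and I am not sure whether the training set size divided by the number of GPUs and by batch_size has to be an integer. How many GPUs are you using?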

1006076811 commented 12 months ago

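> > Hello, and thank you for answering my earlier question. I have reproduced the problem: with batch_size 32 and a training set of 67 samples, the error occurs. It seems to be related to the number of samples in the last batch, because the error always appears on the last batch (including during eval). I have not found the exact cause. With 68 samples in the training set, everything runs fine.
>
> Are those figures 67 and 68 samples, or 670,000 and 680,000? I hit this problem too, and I am not sure whether the training set size divided by the number of GPUs and by batch_size has to be an integer. How many GPUs are you using?

67 samples and 68 samples, with 4 GPUs.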
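Given the figures confirmed here (67 versus 68 samples, batch size 32, 4 GPUs), one possible workaround is to trim the dataset so that the final batch can still be split across every GPU. Nobody in this thread has confirmed it as a fix. The sketch below assumes the same `torch.chunk`-style splitting as before, and `safe_sample_count` is a hypothetical helper, not part of text2vec.

```python
import math


def num_chunks(rows: int, num_gpus: int) -> int:
    """Number of chunks under torch.chunk-style splitting (an assumption)."""
    if rows == 0:
        return 0
    size = math.ceil(rows / num_gpus)
    return math.ceil(rows / size)


def safe_sample_count(num_samples: int, batch_size: int, num_gpus: int) -> int:
    """How many leading samples to keep so the last batch reaches every GPU."""
    remainder = num_samples % batch_size
    if remainder == 0 or num_chunks(remainder, num_gpus) == num_gpus:
        return num_samples  # the final batch already splits into num_gpus chunks
    return num_samples - remainder  # drop the incomplete final batch


print(safe_sample_count(67, batch_size=32, num_gpus=4))
print(safe_sample_count(68, batch_size=32, num_gpus=4))
```

PyTorch's `DataLoader` also accepts `drop_last=True`, which discards an incomplete final batch and has a similar effect. Whether the text2vec training scripts expose that option is not stated in this thread.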

shibing624 / text2vec

Running training_sup_text_matching_model_mydata.py fails with "You have to specify either input_ids or inputs_embeds" #124