yangjianxin1 / GPT2-chitchat

GPT2 for Chinese chitchat (implements the MMI idea from DialoGPT)

Want to fine-tune GPT2 to generate negative samples #124

Open jazzlee008 opened 1 year ago

jazzlee008 commented 1 year ago

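I have very few negative samples (only about 150; each dialogue is at most 100 characters long and mixes Chinese and English).

After preprocessing, running train.py fails with the error below. Could someone help me find where the problem is?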

Traceback (most recent call last):
  File "train.py", line 427, in <module>
    main()
  File "train.py", line 423, in main
    train(model, logger, train_dataset, validate_dataset, args)
  File "train.py", line 268, in train
    train_dataloader = DataLoader(
  File "/home/lee/.local/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 351, in __init__
    sampler = RandomSampler(dataset, generator=generator)  # type: ignore[arg-type]
  File "/home/lee/.local/lib/python3.8/site-packages/torch/utils/data/sampler.py", line 107, in __init__
    raise ValueError("num_samples should be a positive integer "
ValueError: num_samples should be a positive integer value, but got num_samples=0
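
For anyone hitting the same message: `num_samples=0` means the dataset passed to `DataLoader` was empty, and `RandomSampler` (created when shuffling is enabled) rejects an empty dataset. The traceback does not say why the training set ended up empty. One possible cause, stated here as an assumption and not confirmed by this issue, is a slice-based train/validation split whose validation size is larger than the roughly 150 available samples. The sketch below is a minimal illustration of that scenario; `split_dataset`, `checked_split`, and the value `8000` are hypothetical and are not taken from train.py.

```python
def split_dataset(samples, val_num):
    """Hypothetical slice-based split: first val_num items go to validation."""
    val = samples[:val_num]
    train = samples[val_num:]  # empty whenever val_num >= len(samples)
    return train, val


def checked_split(samples, val_num):
    """Same split, but fail early with a readable message if train is empty."""
    train, val = split_dataset(samples, val_num)
    if len(train) == 0:
        raise ValueError(
            f"training set is empty: {len(samples)} samples, val_num={val_num}"
        )
    return train, val


# About 150 short dialogues, as described in the issue.
samples = [f"dialogue {i}" for i in range(150)]

# Assumed oversized validation size: every sample lands in validation.
train, val = split_dataset(samples, 8000)
print(len(train), len(val))  # 0 150

# A validation size smaller than the dataset leaves training data.
train, val = checked_split(samples, 15)
print(len(train), len(val))  # 135 15
```

If the split in your copy of train.py works this way, checking `len(train_dataset)` right after the dataset is loaded, and lowering the validation size if it is 0, would be a reasonable first step.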