smallporridge / Socialformer

The implementation of Socialformer
MIT License

Question about model training #4

Open tangjiawei777 opened 2 years ago

tangjiawei777 commented 2 years ago

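Hello, I have a question: the paper does not seem to distinguish between training, test, and validation sets. It looks as if all the data is tokenized and fed in together, and the results come from running a single epoch. Is that right?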

smallporridge commented 2 years ago

Please see Section 4.1 of the paper: "It consists of 367 thousand training queries, and 5 thousand development queries for evaluation." For the MS MARCO dataset, we use its train data as the training set and its dev data as the test set.

tangjiawei777 commented 2 years ago

After preparing the dataset, I ran cd dataprocess and then bash ./run.sh, which failed with the following error:

Traceback (most recent call last):
  File "gen_dynamic_centrality_weight.py", line 117, in <module>
    w_attention_list = get_attention_weight()
  File "gen_dynamic_centrality_weight.py", line 38, in get_attention_weight
    for _, dic in enumerate(epoch_iterator):
  File "/root/miniconda3/envs/so/lib/python3.6/site-packages/tqdm/std.py", line 1195, in __iter__
    for obj in iterable:
  File "/root/miniconda3/envs/so/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 521, in __next__
    data = self._next_data()
  File "/root/miniconda3/envs/so/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 1203, in _next_data
    return self._process_data(data)
  File "/root/miniconda3/envs/so/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 1229, in _process_data
    data.reraise()
  File "/root/miniconda3/envs/so/lib/python3.6/site-packages/torch/_utils.py", line 425, in reraise
    raise self.exc_type(msg)
AttributeError: Caught AttributeError in DataLoader worker process 0.

(I searched online a lot about this error. Some people say it is caused by mismatched pytorch and torchvision versions, but I installed pytorch=1.9 as required, so I don't understand why it still fails.)

Original Traceback (most recent call last):
  File "/root/miniconda3/envs/so/lib/python3.6/site-packages/torch/utils/data/_utils/worker.py", line 287, in _worker_loop
    data = fetcher.fetch(index)
  File "/root/miniconda3/envs/so/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/root/miniconda3/envs/so/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/root/alien/Socialformer-main/dataprocess/bert/dataset.py", line 94, in __getitem__
    encoding = self._tokenizer.encoder_plus(
AttributeError: 'BertTokenizer' object has no attribute 'encoder_plus'

(I don't fully understand this error. I have already specified the bert-base-uncased path, so in principle this should not happen.)
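A note on the last frame of the traceback: the attribute it names is `encoder_plus`, whereas the Hugging Face tokenizer method is spelled `encode_plus`. If line 94 of `dataset.py` really contains that spelling, renaming the call would be the likely fix; this is an inference from the traceback, not something confirmed in this thread. The sketch below shows one way to spot such a near-miss name. `FakeTokenizer` and `suggest_attribute` are hypothetical names used only for illustration, so the snippet runs without downloading a model.

```python
import difflib


class FakeTokenizer:
    """Hypothetical stand-in exposing a few method names found on Hugging Face tokenizers."""

    def encode(self, text):
        return []

    def encode_plus(self, text):
        return {}

    def batch_encode_plus(self, texts):
        return {}

    def tokenize(self, text):
        return []


def suggest_attribute(obj, missing_name, n=3):
    """Return public attribute names of obj that closely resemble missing_name."""
    public = [name for name in dir(obj) if not name.startswith("_")]
    return difflib.get_close_matches(missing_name, public, n=n)


tokenizer = FakeTokenizer()
# The misspelled name from the traceback is not an attribute of the object.
print(hasattr(tokenizer, "encoder_plus"))
# The closest existing names are listed best match first.
print(suggest_attribute(tokenizer, "encoder_plus"))
```

With a real tokenizer, the same helper could be pointed at the object built from the bert-base-uncased path.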

yhy-2000 commented 2 years ago

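Both problems look like version issues. The exact Python library versions used for the dataprocess part are listed below; you could update to them and see whether that resolves the problem.

python=3.7
datasets==1.11.0
matplotlib==2.2.3
numpy==1.21.2
pandas==0.24.2
pyarrow==5.0.0
pytrec_eval==0.5
torch==1.9.0
tqdm==4.63.0
transformers==4.8.1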
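A minimal sketch of how the installed packages could be compared against the pins above. `PINNED` and `find_mismatches` are hypothetical names, not part of the repository. The Python version itself is not checked, and a local build suffix such as `+cu102` is ignored on the assumption that it does not matter for this comparison.

```python
try:
    from importlib import metadata  # Python 3.8+
except ImportError:
    import importlib_metadata as metadata  # backport package for Python 3.7

# Pins copied from the list in this thread.
PINNED = {
    "datasets": "1.11.0",
    "matplotlib": "2.2.3",
    "numpy": "1.21.2",
    "pandas": "0.24.2",
    "pyarrow": "5.0.0",
    "pytrec_eval": "0.5",
    "torch": "1.9.0",
    "tqdm": "4.63.0",
    "transformers": "4.8.1",
}


def find_mismatches(pinned, lookup=metadata.version):
    """Return {package: (wanted, installed)} for every package that differs from its pin."""
    mismatches = {}
    for name, wanted in pinned.items():
        try:
            # Drop a local build suffix such as "+cu102" before comparing.
            installed = lookup(name).split("+")[0]
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed in this environment
        if installed != wanted:
            mismatches[name] = (wanted, installed)
    return mismatches


for package, (wanted, installed) in find_mismatches(PINNED).items():
    print(f"{package}: wanted {wanted}, found {installed}")
```

Running it inside the activated conda environment prints one line per package that is missing or differs from its pin, and nothing if everything matches.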

tangjiawei777 commented 2 years ago

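Honestly, the first error is baffling. It is supposedly a mismatch between the pytorch and torchvision versions, but I installed PyTorch 1.9.0 using the command from the official PyTorch site: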

CUDA 10.2

conda install pytorch==1.9.0 torchvision==0.10.0 torchaudio==0.9.0 cudatoolkit=10.2 -c pytorch

How can this still be a version mismatch? (facepalm)

The second error is in dataprocess/dataset.py. I have not modified this file at all, and I have already specified the bert-base-uncased path, so in principle this should not happen.

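I just updated the Python libraries to the versions you listed, but both errors above still occur, so bert.txt cannot be generated when gen_dynamic_distance_weight.py runs. Could you share your email address? I can send you my project and requirements.txt to look at.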

yhy-2000 commented 2 years ago

Please confirm that your dataset format matches demo.json exactly, including single versus double quotes, trailing newlines, and so on.
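A rough sketch of how such a format comparison could be automated. It assumes the data is stored as one JSON object per line, which this thread does not confirm. `check_against_reference` and the field names `query` and `doc` in the sample strings are hypothetical placeholders, not the actual layout of demo.json.

```python
import json


def check_against_reference(text, reference_text):
    """Compare JSON-lines text with a known-good reference; return a list of problems."""
    problems = []
    # Keys of the first reference record define the expected fields.
    reference_keys = set(json.loads(reference_text.splitlines()[0]))
    if text.endswith("\n") != reference_text.endswith("\n"):
        problems.append("trailing newline differs from the reference")
    for number, line in enumerate(text.splitlines(), start=1):
        try:
            record = json.loads(line)  # JSON requires double-quoted strings
        except json.JSONDecodeError as err:
            problems.append(f"line {number}: invalid JSON ({err.msg})")
            continue
        if not isinstance(record, dict) or set(record) != reference_keys:
            problems.append(f"line {number}: keys differ from the reference")
    return problems


# Hypothetical sample data; real files would be read with open(path).read().
reference = '{"query": "q", "doc": "d"}\n'
good = '{"query": "a", "doc": "b"}\n'
bad = "{'query': 'a', 'doc': 'b'}"  # single quotes and no trailing newline

print(check_against_reference(good, reference))
print(check_against_reference(bad, reference))
```

An empty list means the file agrees with the reference on the points checked here; it does not prove the two formats are identical in every respect.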

tangjiawei777 commented 2 years ago

My dataset was prepared following the demo.json format, with no differences. Even when I use demo.json itself, the same error still occurs, and I have not modified dataset.py, so I don't understand why this happens.

Traceback (most recent call last):
  File "gen_dynamic_centrality_weight.py", line 117, in <module>
    w_attention_list = get_attention_weight()
  File "gen_dynamic_centrality_weight.py", line 38, in get_attention_weight
    for _, dic in enumerate(epoch_iterator):
  File "/root/miniconda3/envs/so/lib/python3.7/site-packages/tqdm/std.py", line 1195, in __iter__
    for obj in iterable:
  File "/root/miniconda3/envs/so/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 521, in __next__
    data = self._next_data()
  File "/root/miniconda3/envs/so/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 1203, in _next_data
    return self._process_data(data)
  File "/root/miniconda3/envs/so/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 1229, in _process_data
    data.reraise()
  File "/root/miniconda3/envs/so/lib/python3.7/site-packages/torch/_utils.py", line 425, in reraise
    raise self.exc_type(msg)
AttributeError: Caught AttributeError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/root/miniconda3/envs/so/lib/python3.7/site-packages/torch/utils/data/_utils/worker.py", line 287, in _worker_loop
    data = fetcher.fetch(index)
  File "/root/miniconda3/envs/so/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/root/miniconda3/envs/so/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/root/alien/Socialformer-main/dataprocess/bert/dataset.py", line 94, in __getitem__
    encoding = self._tokenizer.encoder_plus(
AttributeError: 'BertTokenizer' object has no attribute 'encoder_plus'

./run.sh: line 1: 265626 Segmentation fault (core dumped) python gen_dynamic_centrality_weight.py