yzhangcs / parser

:rocket: State-of-the-art parsers for natural language.
https://parser.yzhang.site/
MIT License

RuntimeError: merge_sort: failed to synchronize: an illegal memory access was encountered #10

Closed xiaoxiaoAurora closed 5 years ago

xiaoxiaoAurora commented 5 years ago

Hi, when I run python run.py train --device 1, the following error occurs:

torch.cuda.is_available(): True

Traceback (most recent call last):
  File "run.py", line 41, in <module>
    args.func(args)
  File "/home/workspace/biaffine-parser/parser/cmds/train.py", line 92, in __call__
    file=args.file)
  File "/home/workspace/biaffine-parser/parser/model.py", line 34, in __call__
    self.train(train_loader)
  File "/home/workspace/biaffine-parser/parser/model.py", line 75, in train
    loss.backward()
  File "/home/anaconda3/lib/python3.7/site-packages/torch/tensor.py", line 102, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/home/anaconda3/lib/python3.7/site-packages/torch/autograd/__init__.py", line 90, in backward
    allow_unreachable=True)  # allow_unreachable flag
RuntimeError: merge_sort: failed to synchronize: an illegal memory access was encountered
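CUDA kernels run asynchronously, so an illegal memory access is often reported at a later call (here, loss.backward()) rather than at the operation that caused it. A minimal sketch of one common way to narrow this down, assuming the variable is set before the first CUDA call; the helper name enable_blocking_cuda_launches is hypothetical and not part of this repository:

```python
import os


def enable_blocking_cuda_launches(env=None):
    """Set CUDA_LAUNCH_BLOCKING=1 so CUDA errors surface at the failing call.

    `env` defaults to os.environ; passing a plain dict is only for testing.
    The variable has to be set before CUDA is initialized in the process.
    """
    env = os.environ if env is None else env
    env["CUDA_LAUNCH_BLOCKING"] = "1"
    return env


if __name__ == "__main__":
    enable_blocking_cuda_launches()
    # Import torch and start training only after this point.
```

Running the same command on the CPU is another option, since an out-of-range index or invalid shape usually produces a more specific Python error there.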

yzhangcs commented 5 years ago

Would you mind letting me take a look at the model?

xiaoxiaoAurora commented 5 years ago

Would you mind letting me take a look at the model?

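I found the cause of the error: I had set n_tag_embed to 0, and parser.py does not handle that case. But now I have run into a new problem: after all 1000 epochs completed successfully, an error was raised while loading the model: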

Epoch 1000 / 1000:
train: Loss: 0.0005 UAS: 99.99% LAS: 99.99%
dev:   Loss: 0.9419 UAS: 80.68% LAS: 80.68%
test:  Loss: 0.9398 UAS: 82.08% LAS: 82.08%
0:00:10.018016s elapsed

Traceback (most recent call last):
  File "run.py", line 41, in <module>
    args.func(args)
  File "/home/lxiao/workspace/biaffine-parser/parser/cmds/train.py", line 93, in __call__
    file=args.file)
  File "/home/lxiao/workspace/biaffine-parser/parser/model.py", line 53, in __call__
    self.network = BiaffineParser.load(file)
  File "/home/lxiao/workspace/biaffine-parser/parser/parser.py", line 106, in load
    state = torch.load(fname, map_location=device)
  File "/home/lxiao/anaconda3/lib/python3.7/site-packages/torch/serialization.py", line 367, in load
    return _load(f, map_location, pickle_module)
  File "/home/lxiao/anaconda3/lib/python3.7/site-packages/torch/serialization.py", line 545, in _load
    deserialized_objects[key]._set_from_file(f, offset, f_should_read_directly)
RuntimeError: storage has wrong size: expected 0 got 3
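The n_tag_embed = 0 setting described above could be rejected before the model is built, instead of surfacing later as a CUDA error. A minimal sketch, assuming the configuration is available as a plain dict; the function name check_embedding_sizes and the dict layout are hypothetical, and only the key n_tag_embed comes from this thread:

```python
def check_embedding_sizes(config, keys=("n_tag_embed",)):
    """Raise ValueError if any listed embedding size is not a positive int.

    `config` is assumed to be a dict of hyperparameters; `keys` lists the
    entries that must be positive integers.
    """
    problems = []
    for key in keys:
        value = config.get(key)
        # bool is a subclass of int, so exclude it explicitly.
        is_int = isinstance(value, int) and not isinstance(value, bool)
        if not is_int or value <= 0:
            problems.append(f"{key}={value!r}")
    if problems:
        raise ValueError(
            "embedding sizes must be positive integers: " + ", ".join(problems)
        )
    return config
```

Calling check_embedding_sizes({"n_tag_embed": 0}) raises a ValueError that names the offending key, which is easier to act on than an illegal memory access during backward().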

yzhangcs commented 5 years ago

I'm not sure. Maybe the model was not saved correctly?
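One way to rule out an incompletely written checkpoint is to write to a temporary file, read it back, and only then move it into place. The sketch below uses pickle from the standard library so it runs on its own; the function name save_atomically is hypothetical, and this is not how the repository saves models:

```python
import os
import pickle
import tempfile


def save_atomically(obj, path, dump=pickle.dump, load=pickle.load):
    """Write `obj` to `path` via a temporary file and verify it is readable.

    `dump(obj, file)` and `load(file)` are assumed to take binary file objects.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as f:
            dump(obj, f)
            f.flush()
            os.fsync(f.fileno())  # make sure the bytes reach the disk
        with open(tmp_path, "rb") as f:
            load(f)  # raises if the file cannot be read back
        os.replace(tmp_path, path)  # atomic rename onto the final name
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
    return path
```

Since torch.save and torch.load also accept file objects, they could presumably be passed as dump and load. If the read-back step fails right after saving, the problem lies in what is being saved rather than in a truncated file.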