songyouwei / ABSA-PyTorch

Aspect Based Sentiment Analysis, PyTorch Implementations.
MIT License
1.99k stars 522 forks

Question: error when training a non-English model (ValueError: index can't contain negative values) #180

Closed alian921 closed 3 years ago

alian921 commented 3 years ago

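Hello. Training with your default training sets works without any problems. I am now testing how well the code applies to languages other than English, starting with Japanese.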

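① I prepared a Japanese Twitter training set and changed the spaCy language model in dependency_graph.py so that it parses Japanese dependency trees and produces the correct 0/1 arrays.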

The original line

    nlp = spacy.load('en_core_web_sm')

was replaced with

    nlp = spacy.load('ja_core_news_sm')

The graph files were generated without problems, as shown below.

    $ du -sh datasets/my-twitter/*
    16K   datasets/my-twitter/test.raw
    340K  datasets/my-twitter/test.raw.graph
    32K   datasets/my-twitter/train.raw
    1.1M  datasets/my-twitter/train.raw.graph
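For readers unfamiliar with the 0/1 arrays mentioned in step ①, here is a minimal sketch of how a dependency tree can be turned into an adjacency matrix. The helper `adjacency_from_heads` is hypothetical and is not the code in dependency_graph.py. It assumes the parse is available as a list of head indices (with spaCy this could be obtained as `[t.head.i for t in nlp(text)]`, where the root token is its own head) and that self-loops and undirected edges are wanted.

```python
import numpy as np


def adjacency_from_heads(heads):
    """Build a symmetric 0/1 adjacency matrix from head indices.

    heads[i] is the index of the syntactic head of token i.
    The root token is assumed to point to itself.
    (Hypothetical helper for illustration only.)
    """
    n = len(heads)
    matrix = np.zeros((n, n), dtype=np.float32)
    for child, head in enumerate(heads):
        matrix[child][child] = 1  # self-loop for every token
        matrix[child][head] = 1   # edge child -> head
        matrix[head][child] = 1   # edge head -> child (undirected graph)
    return matrix


# Three tokens, token 1 is the root and the head of tokens 0 and 2.
example = adjacency_from_heads([1, 1, 1])
```

The matrix has one row and one column per token, so its size grows with the length of the sentence. This matters for the padding step that fails in step ③.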

② Next I modified train.py so that the pretrained BERT model is also replaced with a Japanese BERT model, as follows.

    parser.add_argument('--pretrained_bert_name', default='cl-tohoku/bert-base-japanese-whole-word-masking', type=str)

③ Finally, training failed with the error below. Could you point me to the cause?

    $ python train.py --model_name bert_spc --dataset my-twitter
    Traceback (most recent call last):
      File "train.py", line 307, in <module>
        main()
      File "train.py", line 302, in main
        ins = Instructor(opt)
      File "train.py", line 52, in __init__
        self.trainset = ABSADataset(opt.dataset_file['train'], tokenizer)
      File "/usr/local/src/ABSA-PyTorch/data_utils.py", line 161, in __init__
        dependency_graph = np.pad(idx2graph[i], \
      File "<__array_function__ internals>", line 5, in pad
      File "/usr/local/lib/python3.8/dist-packages/numpy/lib/arraypad.py", line 743, in pad
        pad_width = _as_pairs(pad_width, array.ndim, as_index=True)
      File "/usr/local/lib/python3.8/dist-packages/numpy/lib/arraypad.py", line 514, in _as_pairs
        raise ValueError("index can't contain negative values")
    ValueError: index can't contain negative values

alian921 commented 3 years ago

After I increased max_seq_len from the default of 85 to 200, the problem was solved. The cause was probably that the texts I prepared were too long.
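The failure can be reproduced in isolation. The sketch below assumes that data_utils.py pads each graph up to max_seq_len, so the pad width is max_seq_len minus the graph size. That assumption is consistent with the traceback, but the full line is not shown there. When a graph has more rows than max_seq_len, the width becomes negative and np.pad raises the same ValueError. The helper `pad_graph` is hypothetical and shows an alternative to raising max_seq_len: truncating the graph before padding.

```python
import numpy as np


def pad_graph(graph, max_seq_len):
    """Pad a square graph to (max_seq_len, max_seq_len), truncating first if needed.

    Hypothetical helper, not part of ABSA-PyTorch.
    """
    n = graph.shape[0]
    if n > max_seq_len:
        # Drop rows and columns of tokens beyond max_seq_len.
        graph = graph[:max_seq_len, :max_seq_len]
        n = max_seq_len
    width = max_seq_len - n  # now guaranteed to be >= 0
    return np.pad(graph, ((0, width), (0, width)), "constant")


# A sentence of 100 tokens with the default max_seq_len of 85.
graph = np.ones((100, 100), dtype=np.float32)

try:
    # 85 - 100 = -15, so the pad width is negative.
    np.pad(graph, ((0, 85 - 100), (0, 85 - 100)), "constant")
    message = ""
except ValueError as err:
    message = str(err)

truncated = pad_graph(graph, 85)   # truncated to 85 x 85
padded = pad_graph(graph, 200)     # zero-padded to 200 x 200
```

Truncation silently discards the tail of long sentences, and the token sequences would have to be cut to the same length, so raising max_seq_len as described above is the simpler fix when memory allows.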