Closed jkkl closed 2 years ago
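@jkkl OK, do you have an idea for a fix?
The code below has one potential case it does not handle: if two aspects are adjacent and have the same sentiment, it adds only one sample instead of splitting them into two. For example: 【屏幕尺寸摄像头都不错】 ("the screen size and the camera are both good").
Let's leave it like this for now; it looks sufficient for the time being.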

    print_head = 10
    for s, t, p in data:
        if len(s) > 0:
            # prepare the atepc dataset, refer to https://github.com/yangheng95/PyABSA/issues/78
            polarity_padding = [str(SENTIMENT_PADDING)] * len(t)
            start_sentiment_index = -1
            end_sentiment_index = -1
            for p_idx in range(len(p)):
                # a new sentiment span starts where the label is not padding and differs from the previous label
                if p[p_idx] != str(SENTIMENT_PADDING) and (p_idx == 0 or p[p_idx] != p[p_idx - 1]):
                    start_sentiment_index = p_idx
                # a span ends at the first padding label that follows a non-padding label
                elif p[p_idx] == str(SENTIMENT_PADDING) and p_idx != 0 and p[p_idx - 1] != str(SENTIMENT_PADDING):
                    end_sentiment_index = p_idx
                    one_sentiment_labels = polarity_padding[:start_sentiment_index] + p[start_sentiment_index: end_sentiment_index] + polarity_padding[end_sentiment_index:]
                    prepared_data.append((s, t, one_sentiment_labels))
                    if print_head > 0:
                        print('\n'.join([' '.join(s), ' '.join(t), ' '.join(one_sentiment_labels)]))
                        print_head -= 1
            if start_sentiment_index > end_sentiment_index:
                # handle the trailing case: the last aspect runs to the end of the sequence
                one_sentiment_labels = polarity_padding[:start_sentiment_index] + p[start_sentiment_index:]
                prepared_data.append((s, t, one_sentiment_labels))
                if print_head > 0:
                    print('\n'.join([' '.join(s), ' '.join(t), ' '.join(one_sentiment_labels)]))
                    print_head -= 1
    print('Prepared data len from file :{} nums'.format(len(prepared_data)))
    return prepared_data
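
For the adjacent-aspect case mentioned in this thread (two aspects side by side with the same sentiment), one possible direction is to split on the IOB tags instead of on polarity changes. The following is only a minimal sketch, not code from PyABSA: the function name `split_by_iob` and the string padding value `"-999"` are assumptions made for illustration, and the tag names `B-ASP` / `I-ASP` are taken from the sample quoted in this issue.

```python
PADDING = "-999"  # assumed string form of SENTIMENT_PADDING


def split_by_iob(tags, polarities, padding=PADDING):
    """Return one polarity sequence per aspect, using B-ASP tags as span starts.

    Hypothetical helper: every aspect span keeps its own polarity labels and
    all other positions are replaced by the padding value.
    """
    spans = []
    start = None
    for idx, tag in enumerate(tags):
        if tag == "B-ASP":
            # A new aspect begins; close the previous one if it is still open.
            if start is not None:
                spans.append((start, idx))
            start = idx
        elif tag != "I-ASP":
            # Any non-aspect tag closes the currently open aspect.
            if start is not None:
                spans.append((start, idx))
                start = None
    if start is not None:
        # The last aspect runs to the end of the sentence.
        spans.append((start, len(tags)))

    results = []
    for begin, end in spans:
        labels = [padding] * len(tags)
        labels[begin:end] = polarities[begin:end]
        results.append(labels)
    return results


# 屏 幕 尺 寸 | 摄 像 头 | 都 不 错
tags = ["B-ASP", "I-ASP", "I-ASP", "I-ASP", "B-ASP", "I-ASP", "I-ASP", "O", "O", "O"]
polarities = ["Positive"] * 7 + [PADDING] * 3
for labels in split_by_iob(tags, polarities):
    print(" ".join(labels))
```

With this input the sketch yields two label sequences, one per aspect, even though both aspects are `Positive` and directly adjacent.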
@yangheng95 A similar piece of code has the same bug.
A suggested fix, for reference:

        POLARITY_PADDING = [SENTIMENT_PADDING] * len(polarity)
        start_sentiment_index = -1
        end_sentiment_index = -1
        example_id = i_batch * self.opt.infer_batch_size + i
        for idx in range(len(polarity)):
            # a new sentiment span starts where the label is not padding and differs from the previous label
            if polarity[idx] != SENTIMENT_PADDING and (idx == 0 or polarity[idx] != polarity[idx - 1]):
                start_sentiment_index = idx
            # a span ends at the first padding label that follows a non-padding label
            elif polarity[idx] == SENTIMENT_PADDING and idx != 0 and polarity[idx - 1] != SENTIMENT_PADDING:
                end_sentiment_index = idx
                one_sentiment_labels = POLARITY_PADDING[:start_sentiment_index] + polarity[start_sentiment_index: end_sentiment_index] + POLARITY_PADDING[end_sentiment_index:]
                extraction_res.append((all_tokens[i + (self.opt.infer_batch_size * i_batch)], pred_iobs, one_sentiment_labels, example_id))
        if start_sentiment_index > end_sentiment_index:
            # handle the trailing case: the last aspect runs to the end of the sequence
            one_sentiment_labels = POLARITY_PADDING[:start_sentiment_index] + polarity[start_sentiment_index:]
            extraction_res.append((all_tokens[i + (self.opt.infer_batch_size * i_batch)], pred_iobs, one_sentiment_labels, example_id))
    return extraction_res, sentence_res
If possible, please provide one or two bad cases for each of these two bugs so that I can debug them.
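My own case is: 我 喜 欢 小 狗 ("I like puppies").
Just construct a case where the aspect is at the very end of the sentence. For example: 【 秉 O -999 承 O -999 了 O -999 时 O -999 尚 O -999 高 O -999 贵 O -999 的 O -999 外 B-ASP Positive 形 I-ASP Positive 设 I-ASP Positive 计 I-ASP Positive 】. When the training samples are built, this case is filtered out. At prediction time, only 【计】 is passed to the sentiment classification stage, even though the aspect itself is extracted correctly.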
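
To make the bad case above easier to reproduce, here is a small sketch that turns the flattened `token tag polarity` triples quoted in this comment into three parallel lists and checks whether the aspect reaches the end of the sentence. The helper names `parse_triples` and `aspect_reaches_end` are hypothetical, and the whitespace-separated layout is assumed from the quoted sample rather than from the dataset files themselves.

```python
def parse_triples(flat):
    """Split a flat 'token tag polarity ...' string into three parallel lists."""
    fields = flat.split()
    if len(fields) % 3 != 0:
        raise ValueError("expected whitespace-separated (token, tag, polarity) triples")
    tokens = fields[0::3]
    tags = fields[1::3]
    polarities = fields[2::3]
    return tokens, tags, polarities


def aspect_reaches_end(polarities, padding="-999"):
    """True when the last label is not padding, i.e. an aspect ends the sentence."""
    return bool(polarities) and polarities[-1] != padding


sample = (
    "秉 O -999 承 O -999 了 O -999 时 O -999 尚 O -999 高 O -999 贵 O -999 "
    "的 O -999 外 B-ASP Positive 形 I-ASP Positive 设 I-ASP Positive 计 I-ASP Positive"
)
tokens, tags, polarities = parse_triples(sample)
print(len(tokens), tags[-4:], aspect_reaches_end(polarities))
```

The resulting `(tokens, tags, polarities)` tuple has the same shape as the `(s, t, p)` items iterated over in the data preparation code discussed in this thread.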
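This code, which is meant to split a single sample carrying multiple sentiment polarities into multiple samples, has a bug. ![image](https://user-images.githubusercontent.com/3937341/159827972-74441f0f-2827-4730-b461-de2a19688a47.png)
Here is a bad case: a sample like [-999, -999, -999, Positive] gets filtered out.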
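
The bad case above can be checked in isolation with a standalone version of the splitting logic proposed in this thread. This is a sketch for debugging only: the function name `expand_polarities` is made up here, and the labels are assumed to be strings with `"-999"` as the padding value.

```python
def expand_polarities(p, padding="-999"):
    """Split one polarity sequence into one sequence per sentiment span.

    Mirrors the start/end index logic of the fix proposed in this thread,
    including the extra step for a span that runs to the end of the sequence.
    """
    pad = [padding] * len(p)
    start, end = -1, -1
    out = []
    for i in range(len(p)):
        if p[i] != padding and (i == 0 or p[i] != p[i - 1]):
            # A span starts here.
            start = i
        elif p[i] == padding and i != 0 and p[i - 1] != padding:
            # The span that started at `start` ends just before position i.
            end = i
            out.append(pad[:start] + p[start:end] + pad[end:])
    if start > end:
        # Trailing span: no padding follows it, so emit it after the loop.
        out.append(pad[:start] + p[start:])
    return out


print(expand_polarities(["-999", "-999", "-999", "Positive"]))
# [['-999', '-999', '-999', 'Positive']]
print(expand_polarities(["-999", "Positive", "-999", "Negative"]))
# [['-999', 'Positive', '-999', '-999'], ['-999', '-999', '-999', 'Negative']]
```

Without the final `if start > end` step, the first call would return an empty list, which matches the filtering behaviour reported here.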