A question about data processing

zhoucz97 commented 3 years ago

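Hello, I noticed that the build_tokenizer() function in your data_utils.py contains this line: text_left, _, text_right = [s.lower().strip() for s in lines[i].partition("$T$")]

But what if a sentence contains two or more occurrences of '$T$'? For example, the first line of test.raw in the ACL-14 dataset is: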

$T$ to miss 3rd straight playoff game | The ... : $T$ will miss his third straight play ... .

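In that case the result is text_left = '' and text_right = 'to miss 3rd straight playoff game | The ... : $T$ will miss his third straight play ... . '

text_right still contains a $T$ that was not split out. Is this intentional, or is it a compromise made to simplify processing?

I hope the author can clear this up~~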
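A minimal sketch that reproduces the behavior described above. The `line` variable is copied from the example, and the `split`-based variant is only an illustration of how every placeholder could be separated out; it is an assumption, not the repository's fix. Note that because `.lower()` is applied after partitioning, the leftover marker inside `text_right` ends up as `$t$`.

```python
line = "$T$ to miss 3rd straight playoff game | The ... : $T$ will miss his third straight play ... ."

# str.partition splits only at the FIRST occurrence of the separator,
# so any later "$T$" stays inside the right-hand part.
text_left, _, text_right = [s.lower().strip() for s in line.partition("$T$")]
print(repr(text_left))        # empty string: the line starts with the placeholder
print("$t$" in text_right)    # the second placeholder survives, lowercased

# Illustrative alternative (an assumption, not the project's code):
# str.split breaks the line at EVERY occurrence of the separator.
segments = [s.lower().strip() for s in line.split("$T$")]
print(segments)
```

With `split`, the line yields three segments and none of them contains the placeholder, but the caller then has to decide how to map several segments back onto a single left/right context pair.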

songyouwei commented 3 years ago

This is a bug...

zhoucz97 commented 3 years ago

Haha, okay then, thanks for the reply~~

songyouwei / ABSA-PyTorch