liu-nlper opened this issue 6 years ago
I had the same problem.
I am also struggling with this one. Any help?
I heard from one person working with Chinese that adding a space between every pair of characters might do the trick.
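For illustration, here is a minimal preprocessing sketch of that idea, assuming the common CJK Unified Ideographs range is enough for your data; the helper name is made up and this is not part of brat:

    # -*- coding: utf-8 -*-
    import re

    # Hypothetical helper: put a space after every CJK character so that
    # brat's default whitespace tokenizer sees one token per character.
    # The \u4e00-\u9fff range is an assumption; it covers common CJK
    # Unified Ideographs only, not Chinese punctuation or extension blocks.
    CJK = re.compile(u'([\u4e00-\u9fff])')

    def space_cjk(text):
        return CJK.sub(u'\\1 ', text).strip()

    print(space_cjk(u'我也有同样的问题'))  # 我 也 有 同 样 的 问 题

Keep in mind that inserting spaces changes character offsets, so any existing standoff annotations would have to be re-offset against the modified text.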
For Japanese, which is also written without spaces, tokenization by MeCab (a Japanese morphological analyzer) solves the problem. See http://brat.nlplab.org/configuration.html ("Option configuration ([options] section)"); a sketch of the relevant tools.conf lines is included below.
A Chinese dictionary for MeCab (i.e. one that turns MeCab into a Chinese morphological analyzer) is available from Matsumoto Lab, NAIST (a license agreement is necessary), so using that dictionary with MeCab may solve your problem. But, again, I have never tried to use brat for Chinese, so I cannot guarantee that it works.
https://cl.naist.jp/index.php?%B8%F8%B3%AB%A5%EA%A5%BD%A1%BC%A5%B9%2FNCD (in Japanese)
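For reference, the tokenizer choice mentioned above is made in the [options] section of brat's tools.conf; based on the configuration page linked earlier, switching to MeCab should look roughly like the lines below (check that page for the exact option names supported by your brat version):

    [options]
    Tokens          tokenizer:mecab
    Sentences       splitter:regex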
How to solve it?
I have the same problem. Actually, a simple and direct approach would be to let brat wrap text at any character once the line exceeds the visualization width. I don't know whether this is feasible, and if so, how to modify the code to achieve it. Please help. Thanks.
Following your idea, I modified the code at https://github.com/nlplab/brat/blob/master/server/src/tokenise.py#L46, changing

    tokens = text.split()

to

    tokens = list(text)

and the problem is solved.
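To make the effect of that one-line change concrete, here is what the two expressions produce on a short mixed string (plain Python, independent of brat's surrounding offset handling):

    # -*- coding: utf-8 -*-
    text = u'我有同样的问题 same problem'
    print(text.split())  # ['我有同样的问题', 'same', 'problem']
    print(list(text))    # ['我', '有', '同', '样', '的', '问', '题', ' ', 's', 'a', 'm', 'e', ...]

Note that list(text) also turns every space and every English letter into a separate token, which is exactly what the next comment runs into.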
This causes the English parts of mixed Chinese and English text to lose their whitespace segmentation as well: every English letter becomes its own token. How can this be solved?
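One possible direction, sketched under assumptions rather than taken from brat: instead of list(text), use a regular expression that emits each CJK character as its own token but keeps runs of other non-space characters (English words, numbers) together. The CJK range and the findall-based replacement are assumptions, and I have not checked how tokenise.py turns the token list into offsets, so treat this as a starting point only:

    # -*- coding: utf-8 -*-
    import re

    # One CJK character per token, or a run of non-space, non-CJK characters
    # (e.g. an English word or a number) as a single token.
    TOKEN = re.compile(u'[\u4e00-\u9fff]|[^\\s\u4e00-\u9fff]+')

    def mixed_tokens(text):
        # Candidate replacement for the "tokens = list(text)" change above.
        return TOKEN.findall(text)

    print(mixed_tokens(u'用brat标注English text'))
    # ['用', 'brat', '标', '注', 'English', 'text']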