nlplab / brat

brat rapid annotation tool (brat) - for all your textual annotation needs
http://brat.nlplab.org

A very long sentence without spaces (for example, Chinese text) cannot word wrap #1277

Open liu-nlper opened 6 years ago

liu-nlper commented 6 years ago

How can this problem be solved?

yxWisdom commented 5 years ago

I had the same problem

zhouyanggodking commented 5 years ago

I am also struggling on this one. Any help?

reckart commented 5 years ago

I heard from one person working with Chinese that adding a space between every character might do the trick.
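A minimal sketch of that workaround in Python (note that inserting spaces shifts character offsets, so any existing standoff annotations would need re-aligning):

```python
text = '一个很长的没有空格的句子'
spaced = ' '.join(text)
print(spaced)  # 一 个 很 长 的 没 有 空 格 的 句 子
```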

yucca-t-k commented 5 years ago

For Japanese, which is also written without spaces, tokenization by MeCab (a Japanese morphological analyzer) solves the problem. See http://brat.nlplab.org/configuration.html ("Option configuration ([options] section)").
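If I remember the format correctly, the setting goes in the [options] section of tools.conf (fields are tab-separated; please check the configuration page above for the exact syntax):

```
[options]
Tokens	tokenizer:mecab
```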

A Chinese dictionary for MeCab (i.e. one that turns MeCab into a Chinese morphological analyzer) is available (license agreement required) from Matsumoto Lab, NAIST, so using that dictionary with MeCab may solve your problem. But, again, I have never tried to use brat for Chinese, so I cannot guarantee that it works.

https://cl.naist.jp/index.php?%B8%F8%B3%AB%A5%EA%A5%BD%A1%BC%A5%B9%2FNCD (in Japanese)

879733672 commented 4 years ago

How can this be solved?

btyu commented 3 years ago

I have the same problem. A simple and direct fix would be to let brat wrap text at any character once the line exceeds the visualization width. I don't know whether this is feasible, and if so, how to modify the code to implement it. Please help. Thanks.

cheniison commented 3 years ago

> I have the same problem. A simple and direct fix would be to let brat wrap text at any character once the line exceeds the visualization width. I don't know whether this is feasible, and if so, how to modify the code to implement it. Please help. Thanks.

Following your idea, I modified the code at https://github.com/nlplab/brat/blob/master/server/src/tokenise.py#L46: changing tokens = text.split() to tokens = list(text) solves the problem.
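For context, here is a sketch of that change; the function follows the shape of the whitespace tokeniser in tokenise.py, but treat it as illustrative rather than an exact copy of the file:

```python
# server/src/tokenise.py (sketch; the exact surrounding code may differ)

def _token_boundaries_by_alignment(tokens, original_text):
    # Locate each token in the original text and yield its (start, end) span.
    curr_pos = 0
    for tok in tokens:
        start_pos = original_text.index(tok, curr_pos)
        end_pos = start_pos + len(tok)
        yield (start_pos, end_pos)
        curr_pos = end_pos

def whitespace_token_boundary_gen(text):
    # Original line: split on whitespace, so an unbroken run of CJK
    # characters becomes a single token that brat cannot wrap.
    # tokens = text.split()

    # Changed line: every character becomes its own token, so brat
    # can wrap the line at any character.
    tokens = list(text)
    for o in _token_boundaries_by_alignment(tokens, text):
        yield o
```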

404area commented 3 years ago

> I have the same problem. A simple and direct fix would be to let brat wrap text at any character once the line exceeds the visualization width. I don't know whether this is feasible, and if so, how to modify the code to implement it. Please help. Thanks.
>
> Following your idea, I modified the code at https://github.com/nlplab/brat/blob/master/server/src/tokenise.py#L46: changing tokens = text.split() to tokens = list(text) solves the problem.

This change means the English part of mixed Chinese and English text is no longer segmented at spaces. How can that be solved?
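One possible direction, untested with brat (the function name mixed_token_boundary_gen below is hypothetical): keep runs of non-CJK, non-space characters (such as English words) as single tokens, but emit each CJK character as its own token. A sketch that yields the same (start, end) spans as the generators in tokenise.py:

```python
import re

# One CJK ideograph per token, or a run of any other non-space characters
# (e.g. an English word) as a single token. The ranges below cover only the
# basic CJK and Extension A ideograph blocks; extend them as needed.
_MIXED_TOKEN_RE = re.compile(
    r'[\u4e00-\u9fff\u3400-\u4dbf]'
    r'|[^\s\u4e00-\u9fff\u3400-\u4dbf]+'
)

def mixed_token_boundary_gen(text):
    # Hypothetical drop-in alternative to whitespace_token_boundary_gen:
    # yields (start, end) character spans for each token.
    for match in _MIXED_TOKEN_RE.finditer(text):
        yield match.span()

# Example: '这是一个test句子' tokenises as 这 / 是 / 一 / 个 / test / 句 / 子
```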