Open liutianling opened 4 years ago
Yes and no. It needs segmentation, but not in the same way as Chinese; it is slightly different. Here, most words are separated by spaces, but many words are glued together and combined into larger words. So some spaces are present and some are absent.
@zeeshansayyed Thanks! I want to get embeddings for Arabic. Do you have any suggestions on whether the corpus should be split only on spaces, or whether it needs other processing? Thanks.
There's no single answer to this. Most off-the-shelf Arabic embeddings out there simply use the corpus as is, i.e. with the natural spaces that are present in the corpus. People then use a segmenter as part of the NLP pipeline before doing anything else. But it would be interesting to have embeddings of the segmented corpus.
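To make the difference concrete, here is a minimal sketch of the two options: splitting on the natural spaces only, versus also detaching a glued-on conjunction. The `toy_segment` function and its two-prefix list are hypothetical and written purely for illustration; this is not a real Arabic segmenter, and its rule will mis-split ordinary words that happen to start with the same letter, which is why a trained segmenter is normally used for this step.

```python
# Toy illustration only. The prefix list and the min_stem rule are assumptions
# made for this example; real Arabic segmenters are trained models.
PROCLITICS = ("و", "ف")  # the conjunctions "wa" (and) and "fa" (so/then)


def whitespace_tokens(text):
    """Option 1: use the corpus as is, splitting only on existing spaces."""
    return text.split()


def toy_segment(token, min_stem=3):
    """Option 2 (toy): detach one leading conjunction if a stem of at
    least min_stem letters remains; otherwise return the token unchanged."""
    for prefix in PROCLITICS:
        if token.startswith(prefix) and len(token) - len(prefix) >= min_stem:
            return [prefix + "+", token[len(prefix):]]
    return [token]


def segmented_tokens(text):
    """Apply the toy segmenter to every whitespace token."""
    pieces = []
    for token in whitespace_tokens(text):
        pieces.extend(toy_segment(token))
    return pieces


# "The boy went and wrote the lesson": the third token is "wa" + "wrote".
sentence = "ذهب الولد وكتب الدرس"
raw = whitespace_tokens(sentence)
segmented = segmented_tokens(sentence)
print(len(raw), raw)
print(len(segmented), segmented)
```

Either token list can be fed to an embedding trainer. With the raw list, the glued form and the bare verb get separate vocabulary entries; with the segmented list, they share the verb's entry.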
Thank you very much! I will try some methods. Thanks.
Thanks for sharing... I want to know whether Arabic needs segmentation like Chinese. I mean, when doing an NLP task with Arabic, is it necessary to split the text into words? Thanks!