jakartaresearch / maleo

Wrapper library for text cleansing, preprocessing in NLP
https://jakartaresearch.github.io/maleo/
MIT License
17 stars, 0 forks

emoticon to word doesn't add whitespace #28

andreaschandra closed this issue 3 years ago

andreaschandra commented 3 years ago

When using emoji_to_word, the converted emoji names are concatenated with no whitespace separating them.

input

"untuk besok komen disini ya.. 😊😊😊😊🙏🙏🙏 https://t.co/nxeojVug3z "

output

"utk besok komen disini ya.. smiling_face_with_smiling_eyessmiling_face_with_smiling_eyessmiling_face_with_smiling_eyessmiling_face_with_smiling_eyesfolded_handsfolded_handsfolded_hands https://t.co/nxeojVug3z"

expected output

"smiling_face_with_smiling_eyes smiling_face_with_smiling_eyes smiling_face_with_smiling_eyes smiling_face_with_smiling_eyes folded_hands folded_hands folded_hands https://t.co/nxeojVug3z"

That way, each emoji word can be tokenized as a separate token.
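A minimal sketch of the requested behavior, for illustration only. This is not maleo's implementation: the function name `emoji_to_word_spaced` and the two-entry `EMOJI_NAMES` table are hypothetical stand-ins, and a real fix would rely on a complete emoji-name lookup. The idea is to pad every converted name with spaces and then collapse the extra whitespace.

```python
import re

# Hypothetical lookup table covering only the two emoji from this report;
# a real implementation would need a complete emoji-to-name mapping.
EMOJI_NAMES = {
    "\U0001F60A": "smiling_face_with_smiling_eyes",
    "\U0001F64F": "folded_hands",
}

# Matches any single emoji that appears in the table above.
_EMOJI_PATTERN = re.compile("|".join(re.escape(e) for e in EMOJI_NAMES))


def emoji_to_word_spaced(text: str) -> str:
    """Replace each known emoji with its name, separated by whitespace."""
    # Pad each name with spaces so consecutive emoji do not fuse together.
    padded = _EMOJI_PATTERN.sub(lambda m: f" {EMOJI_NAMES[m.group(0)]} ", text)
    # Collapse the whitespace runs created by the padding.
    return re.sub(r"\s+", " ", padded).strip()


sample = "untuk besok komen disini ya.. 😊😊😊😊🙏🙏🙏 https://t.co/nxeojVug3z "
converted = emoji_to_word_spaced(sample)
tokens = converted.split()  # each emoji name is now its own token
```

Note that collapsing whitespace with `\s+` also flattens newlines and tabs elsewhere in the text. If those must be preserved, the padding could instead be added only between adjacent emoji.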

rubentea16 commented 3 years ago

DONE https://github.com/jakartaresearch/maleo/commit/085027a365fd5eabbbe9570fa1cdccde47141231