Open AndreyKondakovGW opened 3 years ago
Exaple of this function: (on their basis you can write test fo this function)
I take it
Also sugest add lemmatization option to lemmatize tokens, as independant function or as part of tokenize function tokenize(text="Классификация настроения текста из базы , to_lower = true, min_token_size=4, lemmatize = True) => ["классификация", "настроение", "текст", "база"]
Looks like there is no ready to use library for lemmatization written in ruby, so we will focus on tokenzation in this issue and extract lemmatization as a separate issue.
Find/Create function for sepatating text string to tokens(words). Function must get text string and return list of string tokens. Function also should not return tokens containing digits and punctuation (you can do this with regular expressions). Function must have optional parameters: