Open vladimir-shmidt opened 10 years ago
I suppose this change in class StopWords(object) will solve the issue: self._cached_stop_words[language] = set(FileHelper.loadResourceFile(path).encode('utf-8').splitlines())
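The point of the change above is that the stop word set and the extracted words must be the same string type. As a rough sketch of the alternative direction (decoding once on load instead of encoding), and not goose's actual code: load_stop_words is a hypothetical helper, and it assumes the stop word file is UTF-8 with one word per line.

```python
import io


def load_stop_words(path, encoding="utf-8"):
    """Hypothetical helper: read one stop word per line as decoded text."""
    # io.open decodes while reading, so every entry is a text string
    # on both Python 2 and Python 3.
    with io.open(path, encoding=encoding) as handle:
        # Strip whitespace and line endings, and skip blank lines.
        return {line.strip() for line in handle if line.strip()}
```

Whichever direction is chosen, the words being looked up have to be converted to the same type before the membership check.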
I suppose your stop word file is not correctly encoded.
I haven't changed anything in it.
I tried to extract a Russian article, but goose produced an empty result. I debugged it and found that the extracted content (text from the p tag) cannot be found in the loaded stop list, even though the word is definitely in the stop list. So I suppose it is a string equality problem in Python or something similar. In the bottom-right corner I've added watch items: the current word, the result of the equality check against the set, and the position of the current word in the stop list.
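A small illustration of how such a mismatch can happen, assuming the cause is a byte string being compared with a text string (this is a guess at the cause, not something confirmed in the report). The names is_stop_word and stop_words are made up for the example.

```python
# -*- coding: utf-8 -*-

word_text = u"и"                        # decoded text
word_bytes = word_text.encode("utf-8")  # the same word as UTF-8 bytes
stop_words = {word_text}                # stop list holding text strings

# The two values print the same but never compare equal, so this
# membership check fails even though the word is in the stop list.
found_as_bytes = word_bytes in stop_words


def is_stop_word(word, stop_words, encoding="utf-8"):
    """Hypothetical helper: normalise to text before the lookup."""
    if isinstance(word, bytes):
        word = word.decode(encoding)
    return word in stop_words
```

With the lookup normalised this way, both word_text and word_bytes are found in the set.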