kanishka-linux / reminiscence

Self-Hosted Bookmark And Archive Manager
GNU Affero General Public License v3.0

Support keywords extraction for other current languages #22

Open · stephane-martin opened this issue 5 years ago

stephane-martin commented 5 years ago

Hello,

Currently, it seems that the keyword extraction process uses a hard-coded list of English stop words. As a result, when archiving content in another language, the selected keywords are very often stop words in that language (I mainly archive content in French).

Maybe the stop-word list could be selected dynamically, based on automatic language detection? (See https://github.com/Mimino666/langdetect, for example.)
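A minimal sketch of the idea, assuming a simple frequency-based keyword extractor. This is not reminiscence's actual code. `STOP_WORDS`, `detect_language` and `extract_keywords` are hypothetical names, and the stop-word lists are deliberately tiny samples. To stay dependency-free, the sketch swaps langdetect for a crude heuristic: it picks whichever language's stop words occur most often in the text. In practice, `langdetect.detect(text)` (which returns codes such as `"fr"`) would be the more robust choice.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word lists keyed by language code.
# A real setup would load full lists for each supported language.
STOP_WORDS = {
    "en": {"the", "a", "an", "and", "of", "to", "in", "is", "it", "that",
           "for", "on", "with", "are"},
    "fr": {"le", "la", "les", "de", "des", "du", "un", "une", "et", "est",
           "en", "que", "pour", "dans", "sur", "avec", "sont"},
}

# Letters only (accented letters included); digits and underscores excluded.
TOKEN_RE = re.compile(r"[^\W\d_]+")


def tokenize(text):
    """Return lower-cased word tokens."""
    return [t.lower() for t in TOKEN_RE.findall(text)]


def detect_language(tokens, default="en"):
    """Guess the language whose stop words appear most often.

    Crude stand-in for a real detector such as langdetect.
    """
    scores = {lang: sum(t in words for t in tokens)
              for lang, words in STOP_WORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else default


def extract_keywords(text, top_n=3):
    """Pick the most frequent non-stop-words, using the detected language's list."""
    tokens = tokenize(text)
    lang = detect_language(tokens)
    stop = STOP_WORDS[lang]
    counts = Counter(t for t in tokens if t not in stop and len(t) > 2)
    return lang, [word for word, _ in counts.most_common(top_n)]


if __name__ == "__main__":
    text = ("Les archives de la bibliothèque sont dans le serveur. "
            "Le serveur garde les archives et les pages web pour la recherche.")
    print(extract_keywords(text))
```

With only the English list, words such as "les" would top the frequency count for the French sample. With the list selected per detected language, they are filtered out, and the sample above yields `('fr', ['archives', 'serveur', 'bibliothèque'])`.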

Thanks for the great product :)

kanishka-linux commented 5 years ago

Yes, currently only English is supported. I'll try to look into supporting other languages as well.