miso-belica / sumy

Module for automatic summarization of text documents and HTML pages.
https://miso-belica.github.io/sumy/
Apache License 2.0
3.46k stars 525 forks

Add support for the Greek Language #169

Closed NC0DER closed 2 years ago

NC0DER commented 2 years ago

Greetings,

As mentioned in issue #167, which I opened earlier, I would like to add support for the Greek language. Regarding the license issue, I included an LGPL-v3 license, which covers the usage of the Greek Stemmer module in the stem_word() function inside sumy/nlp/stemmers/greek.py. This covers the case where that function is considered a derivative work, since it is a wrapper built around the Greek Stemmer module. The module was also added to the extras_require section in setup.py. I wrote the code for the Greek sentence and word tokenizer and tested it locally using Greek text. I also added two simple test cases in sumy/nlp/stemmers/greek.py. Regarding the extra language-specific abbreviations, I included the most common ones I found and removed the trailing period (.) from each one, as mentioned in your code comment. Finally, I added a list of Greek stopwords.
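To illustrate why the abbreviations are stored without their trailing period, here is a minimal, self-contained sketch of an abbreviation-aware sentence splitter. It is not the tokenizer code from this pull request: the function name `split_sentences`, the constant `GREEK_ABBREVIATIONS`, and the small sample of abbreviations are all assumptions made for the example.

```python
import re

# Hypothetical sample of Greek abbreviations, stored WITHOUT the trailing
# period (the real list in the pull request is longer and may differ).
GREEK_ABBREVIATIONS = frozenset({"κ", "κα", "π.χ", "δηλ", "κτλ"})


def split_sentences(text, abbreviations=GREEK_ABBREVIATIONS):
    """Split text after '.', '!' or ';' when followed by whitespace or the
    end of the text, unless the token before a period is a known abbreviation."""
    sentences = []
    start = 0
    for match in re.finditer(r"[.!;](?=\s+|$)", text):
        if match.group() == ".":
            # The token right before the period, e.g. "κ" in "Ο κ. ...".
            tokens = text[start:match.start()].split()
            last_token = tokens[-1] if tokens else ""
            if last_token.lower() in abbreviations:
                continue  # the period belongs to an abbreviation, keep going
        sentence = text[start:match.end()].strip()
        if sentence:
            sentences.append(sentence)
        start = match.end()
    # Keep any trailing text that has no closing punctuation.
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences


if __name__ == "__main__":
    for sentence in split_sentences("Ο κ. Παπαδόπουλος ήρθε. Έφερε φρούτα, π.χ. μήλα."):
        print(sentence)
```

Because the lookup compares the token that precedes the period, the stored abbreviation has to omit that period ("π.χ" rather than "π.χ."). One limitation of this simple approach: an abbreviation that genuinely ends a sentence will not trigger a split.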

If you have any feedback or would like any changes, please let me know.

Thank you for your time!

NC0DER commented 2 years ago

Greetings,

Could we do something about this failing test so that the pull request can move forward? Do you require any further assistance?

Thank you for your time!

miso-belica commented 2 years ago

Hi, sorry, I don't think the test is a problem. It is irrelevant. I am just quite busy with life currently. I hope I'll find time for this in the coming weeks.

NC0DER commented 2 years ago

Good evening,

I applied most of the changes suggested in your review. If you have any more changes, please let me know.

Kind regards

NC0DER commented 2 years ago

Thank you so much for your valuable feedback and the integration of this code. 👍