oduwsdl / sumgram

sumgram is a tool that summarizes a collection of text documents by generating the most frequent sumgrams (conjoined ngrams)
MIT License
55 stars 14 forks

Load stopwords from a file #29

Open ibnesayeed opened 2 years ago

ibnesayeed commented 2 years ago

Supplying a long list of stopwords as an inline CLI argument can be tricky or even impossible. Having an option to load the list from a file would be great.
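A rough sketch of what such an option could look like, to make the request concrete. The `--stopwords-file` flag and the `read_stopwords` helper are hypothetical names, not part of sumgram today. The file format is also an assumption: one stopword per line, blank lines and `#` comment lines ignored, entries lowercased.

```python
import argparse
from pathlib import Path


def read_stopwords(path):
    """Return the set of stopwords in a file with one entry per line."""
    words = set()
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            word = line.strip().lower()
            # Assumed format: skip blank lines and '#' comment lines.
            if word and not word.startswith("#"):
                words.add(word)
    return words


def build_parser():
    """Build a toy parser; the flag name is hypothetical, not a sumgram option."""
    parser = argparse.ArgumentParser(prog="sumgram-sketch")
    parser.add_argument("--stopwords-file", type=Path, default=None,
                        help="text file with one stopword per line")
    return parser
```

With something like this, the caller would pass a path (for example `--stopwords-file my_stopwords.txt`) instead of a long inline list, which sidesteps shell quoting and argument-length limits.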

ibnesayeed commented 2 years ago

It looks like the default list of stopwords is hard-coded in a Python file. It might be more flexible to move it into a text file and read it line by line at load time. That way, a custom list of stopwords could be supplied the same way as the default one.

https://github.com/oduwsdl/sumgram/blob/31434eaf21dfa3928f1b9e9b90c2f63e72416ed5/sumgram/util.py#L35-L326
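To illustrate the idea, here is a minimal sketch in which the default and a custom list go through the same code path. The function names and the `default_path` argument are hypothetical; sumgram does not currently ship a stopwords text file. The sketch also assumes a custom file replaces the default list rather than extending it, which is a design choice that would need to be decided.

```python
def iter_stopwords(lines):
    """Yield non-empty, stripped entries from an iterable of lines."""
    for line in lines:
        word = line.strip()
        if word:
            yield word


def get_stopwords(default_path, custom_path=None):
    """Load stopwords from custom_path when given, otherwise from default_path."""
    # Assumption: a custom file replaces the default list instead of extending it.
    path = custom_path if custom_path is not None else default_path
    with open(path, encoding="utf-8") as handle:
        return set(iter_stopwords(handle))
```

If extending the default list were preferred instead, both files could be read with `iter_stopwords` and the two sets combined.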