jinhangjiang / morethansentiments

Python package to calculate Boilerplate and many other text quantified features
BSD 3-Clause "New" or "Revised" License

The ngram size parameter (n) in redundancy calculation seems not used #2

Open rordi opened 11 months ago

rordi commented 11 months ago

Thank you for the wonderful work provided in the MoreThanSentiments package.

The parameter n of the Redundancy method does not appear to control the ngram size, because the ngram size is hardcoded to 10 in the following line:

https://github.com/jinhangjiang/morethansentiments/blob/ebb2837538c1e4947624b5aa1f3bc2f2f6dccb38/src/MoreThanSentiments.py#L203

As the original publication (Cazier and Pfeiffer, 2015) was based on rather long documents, the 10-gram was probably fine in that context. For shorter documents, it would be useful for users to be able to choose a smaller ngram size.
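
To illustrate the requested behavior, here is a minimal sketch of a redundancy measure in which n actually sets the ngram size. It is not the package's implementation: the names `ngrams` and `redundancy_sketch` are hypothetical, the input is assumed to be an already tokenized document, and redundancy is assumed to mean the share of ngram occurrences whose ngram appears more than once in the document.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-token windows of `tokens` as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def redundancy_sketch(tokens, n=10):
    """Share of ngram occurrences whose ngram appears more than once.

    Hypothetical helper for illustration only. `n` is passed through
    to the ngram construction instead of being fixed at 10.
    """
    if n < 1:
        raise ValueError("n must be a positive integer")

    grams = ngrams(tokens, n)
    if not grams:
        # Document is shorter than n tokens, so nothing can repeat.
        return 0.0

    counts = Counter(grams)
    # Count every occurrence of an ngram that is seen at least twice.
    repeated = sum(c for c in counts.values() if c > 1)
    return repeated / len(grams)


tokens = "a b c a b c".split()
print(redundancy_sketch(tokens, n=2))   # smaller ngram size for a short text
print(redundancy_sketch(tokens, n=10))  # default size: text is too short
```

With a six-token document the default of 10 yields no ngrams at all, which is why a configurable n matters for short texts.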

jinhangjiang commented 11 months ago

@rordi you are right. We will fix it soon. Thank you for pointing that out! Will get back to you on this thread once the bug is fixed.