superwise-ai / elemeta

Metafeature Extraction for Unstructured Data
https://docs.elemeta.ai/
MIT License
100 stars, 14 forks

TextComplexity value should not exceed 100 #37

Closed: gatha-censius closed this issue 1 year ago

gatha-censius commented 1 year ago

TextComplexity is the Flesch Reading Ease score of the text. The Flesch Reading Ease score ranges from 0 to 100 only. For a dataset such as vargha/liquidmarket_chatbot on Hugging Face, TextComplexity for 'completion' was reported as 121.22.

BigicecreamTaken commented 1 year ago

Hi, thank you for opening the issue. The value we return is the Flesch reading-ease score.

Most tables on the internet show values in the range 0 to 100, but the score is not actually limited to that range, as explained here: https://en.wikipedia.org/wiki/Flesch%E2%80%93Kincaid_readability_tests

The highest (easiest) readability score possible is 121.22, but only if every sentence consists of only one one-syllable word. "The cat sat on the mat." scores 116. The score does not have a theoretical lower bound; therefore, it is possible to make the score as low as wanted by arbitrarily including words with many syllables.
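To make the bounds concrete, here is a minimal sketch of the standard Flesch reading-ease formula, 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words). It is not Elemeta's code, which this thread does not show; the function name `flesch_reading_ease` is hypothetical, and the word, sentence, and syllable counts are passed in by hand rather than extracted from text.

```python
def flesch_reading_ease(total_words: int, total_sentences: int, total_syllables: int) -> float:
    """Standard Flesch reading-ease formula applied to pre-computed counts.

    Hypothetical helper for illustration only; counting words, sentences,
    and syllables in real text is left out on purpose.
    """
    if total_words <= 0 or total_sentences <= 0:
        raise ValueError("total_words and total_sentences must be positive")
    words_per_sentence = total_words / total_sentences
    syllables_per_word = total_syllables / total_words
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word


# Every sentence is a single one-syllable word: the maximum possible score.
upper_bound = flesch_reading_ease(total_words=1, total_sentences=1, total_syllables=1)

# "The cat sat on the mat.": 6 words, 1 sentence, 6 syllables.
cat_sentence = flesch_reading_ease(total_words=6, total_sentences=1, total_syllables=6)

# An assumed long sentence of 20 three-syllable words: the score goes negative.
long_words = flesch_reading_ease(total_words=20, total_sentences=1, total_syllables=60)

print(f"upper bound:  {upper_bound:.2f}")   # about 121.22
print(f"cat sentence: {cat_sentence:.1f}")  # about 116.1
print(f"long words:   {long_words:.1f}")    # below zero
```

Both terms subtracted from 206.835 are at least 1.015 and 84.6 respectively, which is where the 121.22 ceiling comes from; neither term has an upper limit, so the score has no floor.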

gatha-censius commented 1 year ago

Thanks for the clarification. Closing this issue.