Occasionally Flesch (FRE) readability scores are negative

russelljjarvis commented 3 years ago

Occasionally, Flesch readability scores are negative. I see this in two places:

In the CSV, loaded as a dataframe: https://github.com/wiheto/readabilityinscience/blob/master/data/fulltexts/stats/Figure4_SourceData1.csv

specifically in the dataframe column "flesch_fulltexts",

and when I apply https://github.com/wiheto/readabilityinscience/blob/master/functions/readabilityFunctions.py#L206 to some web documents and PDFs.

Any insight into what causes negative or unusually high scores (FRE>60)? I am sure these are caused by digital artifacts and not valid readability scores.

wiheto commented 3 years ago

Negative scores are absolutely possible given the equation. Quoting Wikipedia (https://en.wikipedia.org/wiki/Flesch%E2%80%93Kincaid_readability_tests):

Reader's Digest magazine has a readability index of about 65, Time magazine scores about 52, an average grade six student's written assignment (age of 12) has a readability index of 60–70 (and a reading grade level of six to seven), and the Harvard Law Review has a general readability score in the low 30s. The highest (easiest) readability score possible is 121.22, but only if every sentence consists of only one one-syllable word. "The cat sat on the mat." scores 116. The score does not have a theoretical lower bound; therefore, it is possible to make the score as low as wanted by arbitrarily including words with many syllables. The sentence "This sentence, taken as a reading passage unto itself, is being used to prove a point." has a readability of 69. The sentence "The Australian platypus is seemingly a hybrid of a mammal and reptilian creature." scores 37.5 as it has 24 syllables and 13 words. While Amazon calculates the text of Moby Dick as 57.9,[9] one particularly long sentence about sharks in chapter 64 has a readability score of −146.77.[10] One sentence in the beginning of Swann's Way, by Marcel Proust, has a score of −515.1.[11]
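To make the arithmetic in that passage concrete, here is a minimal sketch of the standard Flesch Reading Ease formula computed from raw counts. It uses the published coefficients (206.835, 1.015, 84.6) and is not necessarily how readabilityFunctions.py tokenizes text or counts syllables; the function name and the 100-word example are assumptions for illustration.

```python
def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """Flesch Reading Ease from raw counts, using the standard coefficients."""
    words_per_sentence = words / sentences
    syllables_per_word = syllables / words
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word


# Upper bound: every sentence is a single one-syllable word.
print(round(flesch_reading_ease(1, 1, 1), 2))     # 121.22

# "The cat sat on the mat.": 6 words, 6 syllables, 1 sentence.
print(round(flesch_reading_ease(6, 1, 6), 1))     # 116.1

# The platypus sentence: 13 words, 24 syllables, 1 sentence.
print(round(flesch_reading_ease(13, 1, 24), 1))   # 37.5

# Hypothetical 100-word sentence averaging 2 syllables per word:
# both penalty terms grow without limit, so the score goes negative.
print(round(flesch_reading_ease(100, 1, 200), 1))  # -63.9
```

Both subtracted terms are unbounded, which is why one very long sentence (or a parsing artifact that merges several sentences into one) is enough to push the score below zero.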

From my memory, the higher scores in our data would usually be for very short abstracts in our analyses when the abstract was very simple.

If you want to investigate any specific instance in SourceData1, you can also search the pmid in pubmed to get the article.

russelljjarvis commented 3 years ago

"From my memory, the higher scores in our data would usually be for very short abstracts in our analyses when the abstract was very simple."

Yes, I am seeing this too. Doesn't this invalidate the FRE score then? Shouldn't FRE be low for short, simple writing? Or is this the use case for NDC (because FRE is known to be invalid for very short passages, so you need NDC instead)?

wiheto commented 3 years ago

I do not think it invalidates the FRE score. Those abstracts will also have a low NDC score.

That abstracts themselves vary is just an additional source of variance. There are multiple types of abstracts out there: structured abstracts, longer free-text abstracts, and shorter 1-2 sentence abstracts. A short abstract does not necessarily mean a high FRE, but if the abstract just says "In this review, we discuss addiction.", then it will get a high FRE score (and also a low NDC score). You could also write a really complex single sentence; such sentences are just less likely in scientific abstracts.
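As a rough check of this point, the sketch below applies the standard Flesch Reading Ease formula to two one-sentence texts. The syllable counts are hand-counted assumptions (10 syllables for the addiction sentence; 24 for the platypus sentence, as stated in the Wikipedia passage quoted earlier), and the helper name is hypothetical, so the repository's own syllable counter may give slightly different values.

```python
def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """Standard Flesch Reading Ease formula, computed from raw counts."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


# "In this review, we discuss addiction."
# 6 words; hand-counted syllables: in(1) this(1) re-view(2) we(1) dis-cuss(2) ad-dic-tion(3) = 10
simple = flesch_reading_ease(words=6, sentences=1, syllables=10)

# "The Australian platypus is seemingly a hybrid of a mammal and reptilian creature."
# 13 words and 24 syllables, per the quoted Wikipedia passage.
dense = flesch_reading_ease(words=13, sentences=1, syllables=24)

print(round(simple, 1))  # 59.7
print(round(dense, 1))   # 37.5
```

Both texts are a single sentence, yet the scores differ by more than 20 points, so it is the syllables per word and words per sentence that drive FRE, not the shortness of the abstract as such.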

But perhaps this reveals an important property of these readability measures (which we mention in the article): they are just indicators of readability, i.e. estimates. They do not determine readability with 100% certainty.

If you want to make your data less heterogeneous, you can add inclusion criteria for abstracts beyond the ones we used.
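One possible shape for such a criterion is a minimum length filter, sketched below. The function name, the thresholds (50 words, 3 sentences), and the naive punctuation-based sentence splitting are all assumptions for illustration; they are not the criteria used in the article.

```python
import re


def passes_inclusion(abstract: str, min_words: int = 50, min_sentences: int = 3) -> bool:
    """Return True if the abstract meets hypothetical minimum-length criteria."""
    # Naive sentence split on ., ! or ?; abbreviations and decimals will be over-split.
    sentences = [s for s in re.split(r"[.!?]+", abstract) if s.strip()]
    words = abstract.split()
    return len(words) >= min_words and len(sentences) >= min_sentences


short_abstract = "In this review, we discuss addiction."
longer_abstract = " ".join(["This is a plain sentence with several words."] * 10)

print(passes_inclusion(short_abstract))   # False
print(passes_inclusion(longer_abstract))  # True
```

Filtering like this removes the one-sentence abstracts that produce the highest FRE values, at the cost of dropping some valid records.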

russelljjarvis commented 3 years ago

@mcgurrgurr

Occasionally Flesch (FRE) readability scores are negative #3