kupolak / textstat

Ruby gem to calculate statistics from text to determine readability, complexity and grade level of a particular corpus.
MIT License

Arithmetic bug in Dale Chall readability score #31

Closed by abrom 4 years ago

abrom commented 4 years ago

It looks like there is an arithmetic bug in the Dale-Chall calculation, caused by some interesting code styling combined with what I believe is a bug in the Ruby parser.

Specifically with: https://github.com/kupolak/textstat/blob/master/lib/textstat.rb#L182-184

The way the Ruby parser works, this actually assigns only the last line, i.e. (0.0496 * avg_sentence_length(text)), to the score variable, ignoring the (0.1579 * difficult_words) term. You can test this by putting those three lines onto a single line and watching the spec tests fail:

  1) TextStat When testing the TextStat class returns the correct Dale–Chall readability score
     Failure/Error: expect(score).to be 4.79

       expected #<Float:43144484430209354> => 4.79
            got #<Float:65302194596872194> => 7.25
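
For anyone wanting to reproduce the behaviour outside the gem, here is a minimal sketch. The numeric values and the local variables `difficult_words` and `avg_sentence_length` are hypothetical stand-ins, not the gem's actual methods or data; only the layout of the expression mirrors the reported code.

```ruby
# Hypothetical stand-in values; the real gem derives these from the text.
difficult_words = 10.0
avg_sentence_length = 20.0

# Same layout as the reported code. Inside the parentheses, each line is
# parsed as a separate statement, and the leading `+` on the second line
# is a unary plus, not a binary addition.
score = (
  (0.1579 * difficult_words)
  + (0.0496 * avg_sentence_length)
)

# Only the last statement's value is assigned.
puts score == 0.0496 * avg_sentence_length
```

Running this with warnings enabled (`ruby -w`) should also flag the first line as being used in void context, which is a useful hint that its value is being thrown away.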

As a fix, I suggest either moving the expression onto one line or moving the + operator to the end of the previous line, so that the parser reads it as intended.
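
A rough sketch of both options, using hypothetical stand-in values and local variables rather than the gem's real methods:

```ruby
# Hypothetical stand-in values; the real gem derives these from the text.
difficult_words = 10.0
avg_sentence_length = 20.0

# Option 1: keep the whole expression on one line.
one_line = (0.1579 * difficult_words) + (0.0496 * avg_sentence_length)

# Option 2: end the first line with the operator, so the parser knows
# the expression continues on the next line.
trailing_operator = (
  (0.1579 * difficult_words) +
  (0.0496 * avg_sentence_length)
)

# Both forms now include both terms.
puts one_line == trailing_operator
```

Either way, the existing spec expectation of 4.79 would presumably need updating, since it was written against the buggy result.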

FYI, I have raised an issue with the Ruby maintainers to find out what is going on with the parser. See https://bugs.ruby-lang.org/issues/16520

abrom commented 4 years ago

Update: the Ruby maintainers say it isn't a bug. With parentheses wrapped around the code like that, Ruby parses it as:

score = begin
    (0.1579 * difficult_words)
    (0.0496 * avg_sentence_length(text))
end

Thus the value of the first line is discarded.

abrom commented 4 years ago

I'm also happy to put together a PR to address this (and a few other minor rounding-related bits and pieces I've noticed).

kupolak commented 4 years ago

Pull request welcome.