plasma-umass / DataDebug

Excel 2010/2013 add-in that automatically finds errors in spreadsheets
http://checkcell.org
GNU General Public License v2.0

Verify that the generated errors match our classification #40

Closed: dgochev closed this issue 10 years ago

dgochev commented 10 years ago

Make sure that the errors we get with the modified error generator still match the frequencies in the classification.

dbarowy commented 10 years ago

Fixes:

There was an off-by-one error, so the likelihood for each location was always being used to place an error one position to the right of the correct location.
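As an illustration of where this kind of off-by-one tends to hide, here is a minimal sketch of picking an error position from per-character likelihoods via a cumulative distribution. The function name `choose_error_position` and the `weights` list are hypothetical and not taken from the DataDebug code.

```python
import bisect
import itertools
import random


def choose_error_position(weights, rng):
    """Pick index i with probability weights[i] / sum(weights).

    weights[i] is a hypothetical likelihood that the error lands on
    character i of the string.
    """
    cumulative = list(itertools.accumulate(weights))
    r = rng.random() * cumulative[-1]
    # bisect_right returns the first index whose cumulative weight exceeds r,
    # which is exactly position i. Adding 1 here would shift every error
    # one character to the right of where its likelihood says it belongs.
    i = bisect.bisect_right(cumulative, r)
    # Guard against floating-point rounding pushing r up to the total.
    return min(i, len(weights) - 1)
```

A quick way to catch a shifted index is to give all the weight to one position and check that position is always chosen.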

Also, when errors were supposed to be guaranteed to happen, I was conditioning the distribution incorrectly. The compiler did not catch the mistake because the conditioning used string comparison as its filtering criterion, and I was comparing against the wrong type (an OptChar); the compiler silently converted the OptChar to a string using ToString(). This meant that the original, high-probability option was always still available.
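A minimal Python sketch of conditioning on "an error must occur": drop the unchanged outcome and renormalize what remains. The `OptChar` class below is a hypothetical stand-in (a character or None), and `condition_on_error` is an illustrative name, not the project's API. The key point is that the filter compares values of the same type.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class OptChar:
    """Hypothetical stand-in for OptChar: a single character, or None."""
    ch: Optional[str]


def condition_on_error(dist: Dict[OptChar, float], original: OptChar) -> Dict[OptChar, float]:
    """Condition `dist` on the outcome differing from `original`.

    The filter compares OptChar against OptChar. Comparing an OptChar key
    against a plain string would always be unequal, so the original outcome
    would silently survive the filter.
    """
    remaining = {k: p for k, p in dist.items() if k != original}
    total = sum(remaining.values())
    if total == 0:
        raise ValueError("no alternative outcome has nonzero probability")
    return {k: p / total for k, p in remaining.items()}
```

For example, with `{OptChar("a"): 0.9, OptChar("b"): 0.06, OptChar(None): 0.04}` and original `OptChar("a")`, the result assigns 0.6 to `"b"` and 0.4 to `None`. Filtering with `k != "a"` instead would keep all three outcomes, which mirrors the bug described above.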

I also changed one of the probability calculations so that transpositions are impossible (Pr[no transposition] = 1.0) when a string has length 1.
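One way to express this guard is sketched below. It assumes a hypothetical overall transposition rate `p_transpose` spread uniformly over the adjacent character pairs; the uniform split is an assumption for illustration, not necessarily how the classification assigns transposition probabilities.

```python
from typing import Dict, Optional, Tuple

Outcome = Optional[Tuple[int, int]]


def transposition_distribution(s: str, p_transpose: float) -> Dict[Outcome, float]:
    """Map each outcome to its probability.

    None means "no transposition"; (i, i + 1) means swap characters i and i + 1.
    """
    pairs = [(i, i + 1) for i in range(len(s) - 1)]
    if not pairs:
        # A string of length 0 or 1 has no adjacent pair to swap,
        # so Pr[no transposition] = 1.0.
        return {None: 1.0}
    dist: Dict[Outcome, float] = {None: 1.0 - p_transpose}
    for pair in pairs:
        # Assumption: the transposition mass is split evenly across pairs.
        dist[pair] = p_transpose / len(pairs)
    return dist
```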

Lastly, the conversion of a string to an OptChar was incorrect when the string was null (Excel returns null strings for empty cells).
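A minimal sketch of a null-safe conversion, assuming (as a hypothetical convention) that an empty cell should behave exactly like the empty string. `OptChar` and `to_optchars` are illustrative stand-ins, not the project's types.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class OptChar:
    """Hypothetical stand-in: a single character, or None for "no character"."""
    ch: Optional[str]


def to_optchars(s: Optional[str]) -> List[OptChar]:
    """Convert a cell's string value to a list of OptChars.

    Excel returns null (None here) for an empty cell; this sketch treats
    that case as the empty string instead of failing.
    """
    if s is None:
        s = ""
    return [OptChar(c) for c in s]
```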

We should still do a sanity check by reclassifying generated strings and ensuring that their distribution matches the typo distribution, but the obvious errors are now fixed.

dgochev commented 10 years ago

I added some code for verifying that the distribution of generated errors aligns with the distribution in our classification file. I'll test it today.
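The thread does not show the verification code, so the following is only a sketch of one common way to run such a check: a Pearson chi-square goodness-of-fit test comparing the error classes assigned to reclassified generated strings against the relative frequencies from the classification. The function name and label strings are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def chi_square_gof(observed_labels: Iterable[str],
                   expected_freqs: Dict[str, float]) -> Tuple[float, int]:
    """Pearson chi-square goodness-of-fit statistic.

    observed_labels: error class assigned to each generated string after reclassification.
    expected_freqs: class -> relative frequency from the classification (sums to 1, all > 0).
    Returns (statistic, degrees_of_freedom).
    """
    counts = Counter(observed_labels)
    n = sum(counts.values())
    unexpected = set(counts) - set(expected_freqs)
    if unexpected:
        raise ValueError(f"labels absent from classification: {sorted(unexpected)}")
    stat = 0.0
    for label, freq in expected_freqs.items():
        if freq <= 0:
            raise ValueError(f"expected frequency for {label!r} must be positive")
        expected = n * freq
        stat += (counts[label] - expected) ** 2 / expected
    return stat, len(expected_freqs) - 1
```

The statistic is then compared against a chi-square table value for the returned degrees of freedom; for example, with four error classes (3 degrees of freedom) the critical value at the 0.05 level is about 7.815. The approximation is usually considered reliable only when every expected count is at least around 5, so rare error classes may need to be merged or sampled more heavily.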

dgochev commented 10 years ago

Errors appear to follow the distributions in the classification.