openedx-unsupported / ease

EASE (Enhanced AI Scoring Engine) is a library that allows for machine learning based classification of textual content. This is useful for tasks such as scoring student essays.
GNU Affero General Public License v3.0

Use a set instead of a list for good ngram lookup #59

Closed by wedaly 10 years ago

wedaly commented 10 years ago

When finding grammar errors, EASE checks whether each ngram in the submission is in a list called good_pos_ngrams. Membership testing on a list is O(n) in the length of the list, and this check runs once for each ngram in the submission.

By changing the list to a set, the in operation is O(1) in the average case, which results in a pretty dramatic speedup. When I profiled the algorithm on my laptop, I saw an improvement of about 71%.
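
To illustrate the change, here is a minimal sketch of the lookup with both container types. Only the name good_pos_ngrams comes from EASE. The sample data, the count_bad_ngrams helper, and the synthetic timing setup are hypothetical stand-ins, not the actual EASE code.

```python
import timeit

# Hypothetical stand-in data; in EASE the good ngrams are loaded elsewhere.
good_pos_ngrams_list = ["NN VB", "DT NN", "JJ NN", "NN IN DT"]
good_pos_ngrams_set = set(good_pos_ngrams_list)


def count_bad_ngrams(submission_ngrams, good_ngrams):
    """Count submission ngrams that are absent from the known-good collection."""
    return sum(1 for ngram in submission_ngrams if ngram not in good_ngrams)


# Both containers give the same answer; only the lookup cost differs.
submission = ["DT NN", "VB VB", "JJ NN", "IN IN"]
list_result = count_bad_ngrams(submission, good_pos_ngrams_list)
set_result = count_bad_ngrams(submission, good_pos_ngrams_set)


def compare_timings(n_good=2000, n_submission=500, repeats=3):
    """Time the same check against a list and a set built from synthetic ngrams."""
    good_list = ["tag%d" % i for i in range(n_good)]
    good_set = set(good_list)
    # Every other submission ngram is missing, forcing a full scan of the list.
    subm = ["tag%d" % (i * 2 * n_good // n_submission) for i in range(n_submission)]
    t_list = timeit.timeit(lambda: count_bad_ngrams(subm, good_list), number=repeats)
    t_set = timeit.timeit(lambda: count_bad_ngrams(subm, good_set), number=repeats)
    return t_list, t_set


if __name__ == "__main__":
    t_list, t_set = compare_timings()
    print("list: %.4fs, set: %.4fs" % (t_list, t_set))
```

The set is built once, so the conversion cost is paid up front rather than on every lookup. Actual timings depend on the size of the good ngram collection and on the machine.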

@stephensanchez Please review. I'd like to get this in and re-run the perf test on dev.

stephensanchez commented 10 years ago

:+1: Definitely.