jpeddicord / askalono

A tool & library to detect open source licenses from texts
Apache License 2.0

Preproc improvements #48

AnthonyMikh closed this 4 years ago

AnthonyMikh commented 4 years ago

Description of changes: This series of commits simplifies the implementations of lcs_substr and remove_common_tokens and greatly reduces the number of memory allocations. Functionality is preserved and all tests pass.

Note: in lcs_substr the common prefix is .trim()ed. However, if this common prefix happens to start with whitespace characters, this will give wrong results later in remove_common_tokens. It seems like it actually should be .trim_end().
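
To illustrate the concern, here is a minimal sketch. It is not the actual askalono code: `common_prefix` and `check_trim_variants` are hypothetical helpers, and the sketch assumes that the later step matches the prefix against the start of the original text. Under that assumption, a prefix cleaned with `.trim()` no longer matches when the texts begin with whitespace, while one cleaned with `.trim_end()` still does.

```rust
/// Hypothetical helper (not from askalono): longest common prefix of
/// two strings, compared char by char and returned as a slice of `a`.
fn common_prefix<'a>(a: &'a str, b: &str) -> &'a str {
    let len = a
        .char_indices()
        .zip(b.chars())
        .take_while(|((_, ca), cb)| ca == cb)
        // Byte offset just past the last matching char.
        .map(|((i, ca), _)| i + ca.len_utf8())
        .last()
        .unwrap_or(0);
    &a[..len]
}

/// Shows how `.trim()` and `.trim_end()` differ for a prefix that
/// starts with whitespace. The sample texts are made up.
fn check_trim_variants() {
    let a = "  Copyright 2019 Alice";
    let b = "  Copyright 2019 Bob";

    let raw = common_prefix(a, b);
    let trimmed = raw.trim(); // strips both ends
    let trimmed_end = raw.trim_end(); // strips trailing whitespace only

    // After `.trim()`, the result is no longer a prefix of either text.
    assert!(a.strip_prefix(trimmed).is_none());
    assert!(b.strip_prefix(trimmed).is_none());

    // After `.trim_end()`, it still is.
    assert_eq!(a.strip_prefix(trimmed_end), Some(" Alice"));
    assert_eq!(b.strip_prefix(trimmed_end), Some(" Bob"));
}
```

Calling `check_trim_variants()` runs the assertions. Whether this matters in practice depends on how remove_common_tokens actually consumes the prefix, which this sketch does not reproduce.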

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

AnthonyMikh commented 4 years ago

The test failure on Travis seems to be unrelated to these changes.

EDIT: This turned out not to be true.

jpeddicord commented 4 years ago

Merged! Thanks again. I'll get things ready for a new release soon, likely right after the holidays.