jpeddicord / askalono

A tool & library to detect open source licenses from texts
Apache License 2.0

"Normalize" preprocessor shouldn't mess with lines #14

Closed: jpeddicord closed this 6 years ago

jpeddicord commented 6 years ago

preproc.rs has PREPROC_NORMALIZE, a list of functions run (in order) to, well, normalize a string. normalize_vertical_whitespace is in that list, and it gets rid of newlines in some places. It should be moved to PREPROC_AGGRESSIVE instead. This will help pave the way for some new features that depend on consistent line numbering: they need the lines of the 'normalized' text to look roughly like the lines of the original input.
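To make the proposal concrete, here is a rough sketch of what the split could look like. It is not the actual contents of preproc.rs. The PreprocFn alias, the trim_line_ends pass, the apply helper, and the body of normalize_vertical_whitespace are all assumptions made up for illustration; only the names PREPROC_NORMALIZE, PREPROC_AGGRESSIVE, and normalize_vertical_whitespace come from the issue.

```rust
// Hypothetical sketch only; the real preproc.rs may look quite different.

/// A preprocessing pass: takes text, returns transformed text (assumed signature).
type PreprocFn = fn(&str) -> String;

/// Illustrative line-preserving pass: trims trailing whitespace on each line.
/// The number of lines in the output matches the input.
fn trim_line_ends(input: &str) -> String {
    input
        .lines()
        .map(|line| line.trim_end())
        .collect::<Vec<_>>()
        .join("\n")
}

/// Stand-in for the pass named in the issue: collapses runs of blank lines
/// into a single blank line, which changes the line count.
fn normalize_vertical_whitespace(input: &str) -> String {
    let mut out: Vec<&str> = Vec::new();
    let mut prev_blank = false;
    for line in input.lines() {
        let blank = line.trim().is_empty();
        if blank && prev_blank {
            continue; // drop the extra blank line
        }
        out.push(line);
        prev_blank = blank;
    }
    out.join("\n")
}

/// Passes that keep the output line-for-line comparable with the input.
const PREPROC_NORMALIZE: [PreprocFn; 1] = [trim_line_ends];

/// Passes that are allowed to add or remove lines.
const PREPROC_AGGRESSIVE: [PreprocFn; 1] = [normalize_vertical_whitespace];

/// Runs each pass in order, feeding the output of one into the next.
fn apply(passes: &[PreprocFn], input: &str) -> String {
    passes
        .iter()
        .fold(input.to_string(), |text, pass| pass(&text))
}
```

With this layout, apply(&PREPROC_NORMALIZE, text) leaves line N of the output corresponding to line N of the input, while anything that shifts line numbers only happens when PREPROC_AGGRESSIVE is applied as well.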