timbray / topfew

Finds the field values (or combinations of values) which appear most often in a stream of records.
GNU General Public License v3.0

Use byte versions of regex calls #2

Closed by superfell 3 years ago

superfell commented 3 years ago

This changes the regex code to call the `regexp` methods that take `[]byte` instead of the ones that take strings, which avoids converting each input to a string and so saves that allocation.

I don't have a test file as large as yours, but for mine this reliably dropped runtime from ~1.5 seconds to ~1.1 seconds when using regexes.