simsong / bulk_extractor

This is the development tree. Production downloads are at:
https://github.com/simsong/bulk_extractor/releases

Remove scan_base16_lg #246

Open: jonstewart opened this issue 3 years ago

jonstewart commented 3 years ago

There's no compelling reason to bring scan_base16_lg forward to the 2.0 API:

  1. Hex scanning is disabled by default.
  2. scan_base16.flex exists as a fallback. It is a relatively simple flex-based scanner with a single pattern, so there is no confounding behavior from other patterns.
  3. The base16 regexp in scan_base16_lg would slow down other scanners and increase NFA size during determinization. [0-9a-fA-F]{6,} is likely to cause state splits with other patterns, and it makes the two-byte n-gram filter less likely to rule out impossible prefixes.
  4. There are few encoding concerns to handle (hex digits are plain ASCII).
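To make item 3 more concrete, here is a toy sketch. It assumes a simplified prefix filter that checks only whether the two bytes at each offset could begin a match. This is not lightgrep's implementation, and `hex_start_pairs` and `filter_pass_rate` are hypothetical helpers written for illustration. The only element taken from the issue is the pattern `[0-9a-fA-F]{6,}`.

```python
import re

# Toy illustration only: a two-byte prefix filter, NOT lightgrep's actual code.
HEX_BYTES = frozenset(b"0123456789abcdefABCDEF")  # 22 distinct byte values
HEX_RUN = re.compile(rb"[0-9a-fA-F]{6,}")         # pattern quoted in the issue


def hex_start_pairs() -> set[tuple[int, int]]:
    """Every two-byte pair that could begin a match of [0-9a-fA-F]{6,}."""
    return {(a, b) for a in HEX_BYTES for b in HEX_BYTES}


def filter_pass_rate(data: bytes, pairs: set[tuple[int, int]]) -> float:
    """Fraction of offsets whose next two bytes survive the prefix filter."""
    offsets = len(data) - 1
    if offsets <= 0:
        return 0.0
    hits = sum(1 for i in range(offsets) if (data[i], data[i + 1]) in pairs)
    return hits / offsets


if __name__ == "__main__":
    text = b"a decade-long facade"
    pairs = hex_start_pairs()
    print(len(pairs))                     # 484 of the 65536 possible byte pairs
    print(filter_pass_rate(text, pairs))  # more than half of the offsets pass
    print(HEX_RUN.findall(text))          # [b'decade', b'facade']
```

The pattern allows only 484 of the 65,536 possible starting pairs. Hex digits, however, are common letters and digits, so in ordinary text many offsets still pass the filter, and plain English words such as "decade" and "facade" match outright.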

scan_base16_lg might outperform scan_base16.flex, but I doubt it would be a big win. It could even turn out to be slower.

With your approval, @simsong, I will delete it.

simsong commented 3 years ago

I concur.