timbray / topfew

Finds the field values (or combinations of values) which appear most often in a stream of records.
GNU General Public License v3.0
186 stars 6 forks

Request for a field-separator argument #20

Closed: peterjanes closed this issue 4 months ago

peterjanes commented 5 months ago

Per #14, "if someone later wants a field-separator argument, make an issue for that."

My particular use case is dealing with large NDJSON streams, where it would be useful to be able to set a field separator of ",". (There might be cases where a "," is found within a key or value, but I don't know whether multi-character delimiters are a legit request.)

timbray commented 5 months ago

I think it'd be a dead easy PR: use Regexp.Split, plus some changes to config.go and runner.go for the new argument. But performance is going to take a pretty severe hit compared to the brute-force state machine in keyfinder.go, so that should be measured and documented.

timbray commented 4 months ago

Addressed in #23