Closed rjw1 closed 3 years ago
Having thought about this, I lean toward declaring this a non-issue and adjusting the documentation to say the field list has to be in increasing order. Two reasons. First, for any permutation of the same field numbers, tf should generate the same occurrence counts and result list (if it doesn't, that'd be a bug for sure), so it's not obvious what the benefit of doing this is. Second, the most important feature of tf is that it's fast, and since field extraction has to be done on every record, it's on the performance-critical path. At the moment, field extraction is highly optimized: it works through the record once, picking up the fields listed in -f as it reaches them, and stops as soon as it has reached the highest-numbered one. Adding a step to shuffle the field strings around might be cheap, but it would add up since you'd have to do it for every line.
On the other hand, if there's an interesting use case that would be enabled by permuting the field list, I'd be happy to hear about it.
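To make the performance argument concrete, here is a minimal Python sketch of a single-pass extraction that depends on the field list being in increasing order. It is not tf's actual code; the function name `extract_fields` and the whitespace splitting are assumptions made for illustration.

```python
import re


def extract_fields(record: str, wanted: list[int]) -> list[str]:
    """Collect 1-based fields from `record` in one left-to-right pass.

    Hypothetical helper, not tf's implementation. `wanted` is assumed to be
    in strictly increasing order; the scan stops after the last wanted field.
    """
    if not wanted:
        return []
    out = []
    next_wanted = 0  # index of the next field number we are waiting for
    # re.finditer is lazy, so tokens past the last wanted field are never scanned
    for field_no, match in enumerate(re.finditer(r"\S+", record), start=1):
        if field_no == wanted[next_wanted]:
            out.append(match.group())
            next_wanted += 1
            if next_wanted == len(wanted):
                break  # nothing more to come, stop early
    return out
```

With an increasing list such as `[2, 4]` this returns the second and fourth fields. With `[4, 2]` the sketch picks up field 4 and then waits for a field 2 that has already gone by, so it silently returns less than was asked for. tf's real failure mode with an unordered list may differ, but the dependence on ordering is the same in kind.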
If the ordering of the field list matters then maybe topfew could sort that itself before executing. This is seemingly what topfew-rs is doing. (This is like when ecommerce sites get upset if you add spaces to a credit card number. The site should just strip the spaces out and carry on with taking payment.)
Once the extraction of the fields and any computation is done, could topfew then display the fields in the order the user asked for, or is that also tied into the optimized extraction?
I could of course just pipe the results into awk to get them displayed in the order that I want.
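The suggestion above (sort the list once, extract in ascending order, then print in the order requested) could look roughly like this. It is a sketch only; `plan_fields` and `reorder` are hypothetical names, not part of topfew or topfew-rs.

```python
def plan_fields(requested: list[int]) -> tuple[list[int], list[int]]:
    """Return (ascending field numbers for extraction, display positions).

    Hypothetical helper. `display[i]` is the index into the extracted fields
    that should be printed in position i to match the user's requested order.
    """
    ascending = sorted(set(requested))
    position = {field: i for i, field in enumerate(ascending)}
    display = [position[field] for field in requested]
    return ascending, display


def reorder(extracted: list[str], display: list[int]) -> list[str]:
    """Rearrange fields extracted in ascending order into the requested order."""
    return [extracted[i] for i in display]
```

For `-f 9,7`, `plan_fields([9, 7])` gives `[7, 9]` for extraction and `[1, 0]` for display. The sort happens once at startup and `reorder` only needs to run on the few result rows being printed, so in this sketch the per-record path is unchanged.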
Ah, OK, so you could say -f 5,3 but you'd still get the third then fifth fields in the output. You're right, that wouldn't hurt performance, but it feels like surprising/counterintuitive behavior.
Yeah, it would certainly be surprising, but at least it returns the data I asked for. Erroring and saying that the fields should be in ascending order would be okay.
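For the "just error out" option, a check at argument-parsing time might look something like the following. This is an illustrative sketch; `parse_field_list` and the error wording are assumptions, not tf's actual behavior or messages.

```python
def parse_field_list(spec: str) -> list[int]:
    """Parse a comma-separated field list such as "7,9".

    Hypothetical validator: rejects lists that are not strictly increasing
    so the user gets an error instead of unexpected output.
    """
    fields = [int(part) for part in spec.split(",")]
    if any(field < 1 for field in fields):
        raise ValueError("field numbers start at 1")
    for prev, cur in zip(fields, fields[1:]):
        if cur <= prev:
            raise ValueError(
                f"fields must be in ascending order: {cur} follows {prev}"
            )
    return fields
```

Since this runs once per invocation, it costs nothing on the per-record path.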
OK, will do that. BTW, what's your use case?
I normally want to see the consequences of an incident, so the data I want to see first is the HTTP response code, with the other info afterwards. Most log files don't put the response code early in the line.
If you don't list your fields in ascending numerical order, it displays the next field instead of the one you asked for.
When in fact I would expect
./bin/tf -f 9,7 test/data/small
to behave like
awk '{print $9 " " $7}' test/data/small | sort | uniq -c | sort -rn | head
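For anyone who finds the pipeline hard to read, the expected behavior can be spelled out in Python as below. This is only a sketch of what the awk pipeline computes, not of tf itself; `top_pairs` is a made-up name, and the order of rows with equal counts may differ from what `sort -rn` produces.

```python
from collections import Counter


def top_pairs(lines, first=9, second=7, limit=10):
    """Count "<field first> <field second>" pairs and return the most common.

    Hypothetical illustration of the awk | sort | uniq -c | sort -rn | head
    pipeline. Missing fields become empty strings, as in awk.
    """
    counts = Counter()
    for line in lines:
        fields = line.split()
        a = fields[first - 1] if len(fields) >= first else ""
        b = fields[second - 1] if len(fields) >= second else ""
        counts[f"{a} {b}"] += 1
    return counts.most_common(limit)
```

Called on the lines of an access log, this yields the response code followed by the path, most frequent first, which is the output order being asked for with `-f 9,7`.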