Closed pfmoore closed 3 years ago
Hi!
Sorry, haven't had time to look at this in a while. In this case though, there's really no need to add this. You can do the following:
cat file.txt | runiq | wc -l
Sorry for the delay, but your suggestion displays the number of unique values, whereas my proposal displays how many times each unique value occurs.
>cat file.txt
a
b
c
a
c
c
>cat file.txt | runiq | wc -l
3
>cat file.txt | runiq -c
2 a
1 b
3 c
So there really isn't a way to get the -c functionality my PR provides with existing commands 🙁
@whitfin Given that your suggested approach doesn't do what I want, could you comment again on the request?
@pfmoore ah, misread.
What you want isn't really viable, because it requires that all values are stored against counts in memory - while this might be nice for very small inputs, it will explode for large inputs (which defeats the point of why runiq exists).
I can have a think about whether there's another way, but on the face of it, it's not going to be possible.
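To illustrate the memory concern, here is a minimal Python sketch. It is not runiq's actual code: the name `stream_unique` and the 8-byte BLAKE2 digest are assumptions made purely for the example. A streaming filter only needs a small fingerprint per distinct line and can print each line the moment it is first seen:

```python
import hashlib
from typing import Iterable, Iterator, Set


def stream_unique(lines: Iterable[str]) -> Iterator[str]:
    """Yield each distinct line the first time it appears (hypothetical helper)."""
    seen: Set[bytes] = set()  # fixed-size digests only, never the line text
    for line in lines:
        # Assumption: an 8-byte digest is "good enough" for this illustration.
        digest = hashlib.blake2b(line.encode("utf-8"), digest_size=8).digest()
        if digest not in seen:
            seen.add(digest)
            yield line  # safe to emit now; nothing about this line changes later


if __name__ == "__main__":
    for line in stream_unique(["a", "b", "c", "a", "c", "c"]):
        print(line)
```

Counting breaks this property: a line's final count is unknown until the input ends, so the line text itself has to be retained rather than just a digest. Note also that with a short digest a hash collision is possible in principle, in which case a distinct line would be silently dropped.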
That's a fair point. If we want to write counts, we do have to keep every line we'll be writing out until the end: until we've read all of the input, we can't know that we won't see another copy of a line we have stored, so we can't start writing anything before then.
My PR keeps that list of output in a separate data structure, mainly because I didn't know enough about the data structures you were using in the filter module to try including the count information there. It's enough for the size of data I typically use uniq -c on, and I personally feel that it would be enough to note that the -c option needs enough memory to hold all of the output lines, and let the program fail if the user doesn't heed that warning. But I'm OK if you prefer to take the view that keeping memory usage bounded is more important.
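As a rough sketch of the idea (in Python rather than the Rust of the actual PR; `count_unique` is a hypothetical name, and the dict merely stands in for whatever separate structure the PR uses), counting while preserving first-seen order could look like this:

```python
from typing import Dict, Iterable, List, Tuple


def count_unique(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (count, line) pairs in order of first appearance."""
    counts: Dict[str, int] = {}  # dicts keep insertion order (Python 3.7+)
    for line in lines:
        counts[line] = counts.get(line, 0) + 1
    # Nothing can be emitted before this point: any count may still change.
    return [(n, line) for line, n in counts.items()]


if __name__ == "__main__":
    for n, line in count_unique(["a", "b", "c", "a", "c", "c"]):
        print(n, line)
```

For the file.txt example earlier in the thread this prints `2 a`, `1 b`, `3 c`. Memory use grows with the number and length of distinct lines, not with the total input size, which is the trade-off under discussion.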
My main use case is as a replacement for sort FILE | uniq -c, using a native Windows build, rather than ports of Unix utilities (which typically don't handle Unicode properly on Windows), so I suspect I'm not really in the main target audience for this program. So if my use case doesn't fit the main focus of the code, that's fine.
Thanks for reconsidering my request anyway.
Would it be possible to add a -c flag to output a count of each unique line, like uniq has? A significant proportion of my usage of uniq is in the form of sort | uniq -c | sort -n, and being able to use runiq to replace that initial pair of commands would be really nice.