swcarpentry / shell-novice

The Unix Shell
http://swcarpentry.github.io/shell-novice/

A detail in Unix Shell Pipes and Filters Lesson #1439

Closed: trevdoesdev closed this issue 8 months ago

trevdoesdev commented 8 months ago

How could the content be improved?

In the pipes and filters lesson, I found it very confusing that "uniq" didn't condense the duplicates in a file unless it was used with "sort". It would help new learners if the lesson added a note like this:

"The uniq command is designed to work with sorted input data. It removes consecutive duplicate lines, meaning that it treats lines as duplicates only if they are adjacent to each other in the input. This design is based on the assumption that uniq is often used in conjunction with sort to process data."
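
To make the behaviour concrete, here is a minimal sketch of the difference. The sample lines ("salmon", "trout") and the variable names are made up for illustration and are not taken from the lesson's data files.

```shell
# Hypothetical sample input: "salmon" appears twice, but never on adjacent lines.
# uniq alone only collapses the adjacent pair of "trout" lines.
uniq_only=$(printf 'salmon\ntrout\nsalmon\ntrout\ntrout\n' | uniq)
echo "$uniq_only"

# Sorting first puts identical lines next to each other,
# so uniq can then collapse every duplicate.
sorted_uniq=$(printf 'salmon\ntrout\nsalmon\ntrout\ntrout\n' | sort | uniq)
echo "$sorted_uniq"
```

With this input, the first pipeline still prints "salmon" twice, while the second prints each name once.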

Which part of the content does your suggestion apply to?

https://swcarpentry.github.io/shell-novice/04-pipefilter.html

bkmgit commented 8 months ago

Thanks for raising this issue. Might you be able to make a pull request to improve the explanation?

gcapes commented 8 months ago

I think the uniq command is first introduced in the pipe construction exercise like this:

The uniq command filters out adjacent matching lines in a file.

Perhaps changing 'adjacent' to 'consecutive' would suffice? I'm concerned that the suggestion above adds an unnecessary length of explanation, as well as giving the answer to the exercise (use sort). As always, the lesson is rather full of material already and I think we need to keep that in mind.

trevdoesdev commented 8 months ago

> Perhaps changing 'adjacent' to 'consecutive' would suffice? I'm concerned that the suggestion above adds an unnecessary length of explanation, as well as giving the answer to the exercise (use sort). As always, the lesson is rather full of material already and I think we need to keep that in mind.

My apologies. I guess that first statement was clear enough; I just missed it on my first read. I think it would have been clearer if the following had been appended:

uniq identifies lines as duplicates only if they are next to each other in the file.
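
One way to see the "next to each other" rule at work is the `-c` option, which prefixes each output line with the length of the run it replaced. This is only a sketch with made-up input ("a", "b") and a made-up variable name; the exact width of the count column differs between uniq implementations.

```shell
# Hypothetical input: "a" occurs three times in total,
# but only the first two occurrences are adjacent.
runs=$(printf 'a\na\nb\na\n' | uniq -c)

# Each output line is "<run length> <line>"; the trailing "a" is reported
# as a separate run of 1 because "b" sits between it and the earlier pair.
echo "$runs"
```

The counts come out as 2, 1 and 1 rather than 3 and 1, which shows that uniq compares each line only with the one directly before it.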