thp / urlwatch

Watch (parts of) webpages and get notified when something changes via e-mail, on your phone or via other means. Highly configurable.
https://thp.io/2008/urlwatch/

Add a uniq filter. #653

Closed. Sveder closed this 2 years ago

Sveder commented 3 years ago

@thp wdyt? I tried using shellpipe: uniq (it didn't work), but it makes sense to me to have this as a first-class filter, since sort already exists as one.

(Draft as I didn't add docs, if you'll ok this I'll add them and un-draft)

thp commented 3 years ago

The reason why sort exists is historical: there was no shellpipe filter when sort was introduced.

The reason why your uniq with shellpipe didn't work is most likely that you didn't sort the input first (uniq in Unix only removes adjacent duplicate lines, so the lines need to be sorted for it to catch every duplicate).

So for line-based uniqueness, I think it's fine to just sort it first (try shellpipe: 'sort -u' or shellpipe: 'sort | uniq').

What might be interesting is a "unique / sort" filter that works on e.g. CSS or XPath selectors, as this is hard to do with built-in Unix commands.

Feel free to turn this PR into a documentation change that shows how to use shellpipe + sort -u or sort | uniq to filter duplicate lines.
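
To illustrate the difference between uniq and sort -u described above, here is a minimal Python sketch that mimics both on a small sample. The function names uniq_adjacent and sort_unique are hypothetical and not part of urlwatch, and sort_unique orders lines by code point, which may differ from the locale-aware ordering of the real sort command.

```python
from itertools import groupby


def uniq_adjacent(text: str) -> str:
    """Collapse runs of identical adjacent lines, roughly like Unix `uniq`."""
    # groupby yields one key per run of equal consecutive lines.
    return "\n".join(key for key, _ in groupby(text.splitlines()))


def sort_unique(text: str) -> str:
    """Sort the lines and drop duplicates, roughly like `sort -u`."""
    return "\n".join(sorted(set(text.splitlines())))


sample = "b\na\nb\nb\na"

# Only the adjacent "b", "b" pair is collapsed; later repeats survive.
print(uniq_adjacent(sample))

# Every duplicate is removed, but the original order is lost.
print(sort_unique(sample))
```

On unsorted input, uniq_adjacent leaves non-adjacent repeats in place, which would explain why shellpipe: uniq on its own appeared not to work.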

Sveder commented 3 years ago

@thp thanks for answering. You might be right that I didn't play with shellpipe: uniq enough, and indeed, now that I've read the uniq docs, it is not what I wanted, as I don't want to sort the data.

In this case, I think my implementation of uniq is definitely simpler and more intuitive than some of the awk one-liners that Stack Overflow suggests for removing duplicates without sorting.

Is this use case still not interesting enough to be merged?
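
For reference, order-preserving deduplication of lines can be sketched in a few lines of Python. This is an illustrative stand-alone function under the hypothetical name remove_duplicate_lines, not the code from this PR, and it assumes the filter receives the whole page text as one string.

```python
def remove_duplicate_lines(text: str) -> str:
    """Keep the first occurrence of each line and preserve the original order."""
    seen = set()
    kept = []
    for line in text.splitlines():
        # Skip any line that has already appeared earlier in the text.
        if line not in seen:
            seen.add(line)
            kept.append(line)
    return "\n".join(kept)


sample = "b\na\nb\nb\na"

# Later repeats of "b" and "a" are dropped; no sorting takes place.
print(remove_duplicate_lines(sample))
```

Unlike sorting first, this keeps the lines in the order they appear on the page, at the cost of holding the set of seen lines in memory.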

thp commented 3 years ago

Yeah, I think it's valid, but rename it to remove-duplicate-lines or something similar, so that it's easier for non-Unix people and so that there's no confusion with the Unix uniq tool, which works slightly differently.

Sveder commented 3 years ago

@thp updated the name as per your suggestion.

Sveder commented 3 years ago

@thp made the changes :)

Sveder commented 3 years ago

Changes made.

thp commented 3 years ago

Please mark as ready for review + update changelog + squash to a single commit and then we can merge this.

Sveder commented 2 years ago

@thp done.