Open sorawee opened 4 years ago
I agree about "filter" being confusing terminology. I've always liked the name "choose" for this. It reflects the fact that you're interested in the result, not what is filtered out. https://opendylan.org/books/drm/Collection_Operations#choose
"select" has a different connotation already in many languages, so personally I would avoid that.
I don't see a need for both "keep" and "drop" variants. "keep" with a negated predicate works well and one is generally interested in the result (the kept elements) rather than the droppings, so to speak.
[edit] I am an "old programmer" and I've always found "filter" to be ambiguous.
No.
I don't mean to be rude, but I cannot abide the dismissal of decades of culture as the arbitrary preferences of "old programmers."
A filter has two actions: it eliminates things while letting others pass through. Everybody knows what a filter does. Debating which action "makes more sense" is a waste of time.
The present connotation for `filter` already exists in many languages. There's a pragmatic reason for it, and it's probably far less technical than you think.
On Mon, Dec 9, 2019 at 07:53, sorawee notifications@github.com wrote:
This has been bugging me since I was introduced to higher-order functions. To filter, as I understand, is to remove things. Therefore, the provided lambda should be used to find elements to remove, not ones to keep. I remember this confused me a lot when I started using Racket.
I agree with Eric, that "filter" is standard name used everywhere.
But maybe it helps that there exist two types of filters in the real world. Some filters are used to filter out unwanted "stuff" (the grounds are collected in a coffee filter). Other filters are used to keep the good stuff (a low pass filter in audio keeps the low frequencies).
So ... switch your real world association of "filter" to an example of the other type of filter.
/Jens Axel
I agree with Eric, that "filter" is standard name used everywhere.
As someone who thumbed up the original post, I was going to point out that the tidyverse package in R uses `filter` to keep rows in data frames that match - and that the language used around the Pandas library in Python uses filter similarly, although the command is `query`, not `filter`. See here.
Then I realized that this is exactly your and Eric's point: `filter` is used to keep, not remove, the things that match in those languages. So yeah, it seems that there are quite a few languages that use filter that way, and they are rather more common than any Lisp. I nonetheless agree that filter is more ambiguous than other terms - even though in Lisps it seems the common thing. So I definitely wouldn't switch its meaning to remove things, but it might be good to have a more unambiguous description, especially if there is a good precedent elsewhere. Otherwise I'd stick with it, since it would probably lead to even more confusion in the other direction.
Out of curiosity, I made a table of what names different languages use for filtering functionality. I'm quite surprised that Racket is the only language I could find that actually has two separate filter functions in the standard library (`filter` and `filter-not`) for the two different ways a predicate could be handled. Every other language just keeps elements when the predicate returns true and relies on some sort of predicate negation feature for the other case.
Language | Keep when predicate returns true | Keep when predicate returns false |
---|---|---|
Java | `filter` | (none) |
C# | `where` | (none) |
Haskell | `filter` | (none) |
JavaScript | `filter` | (none) |
Python | `filter` | (none) |
Ruby | `select` | (none) |
Racket | `filter` | `filter-not` |
C++ | `std::copy_if` (wtf?) | (none) |
Rust | `filter` | (none) |
Suggested additions to this table are welcome.
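To make the "keep when true, negate for the other case" pattern from the table concrete, here is a minimal Python sketch. The predicate `is_even` and the sample list are made up for illustration; only the built-in `filter` is taken from the table.

```python
def is_even(n: int) -> bool:
    """Hypothetical predicate used only for this illustration."""
    return n % 2 == 0


numbers = [1, 2, 3, 4, 5, 6]

# The built-in filter keeps the elements for which the predicate is true.
kept = list(filter(is_even, numbers))

# For the opposite case, negate the predicate and reuse the same function.
kept_when_false = list(filter(lambda n: not is_even(n), numbers))

print(kept)             # [2, 4, 6]
print(kept_when_false)  # [1, 3, 5]
```

Either way, the predicate names what survives, which is the convention most rows of the table share.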
I like the list. I looked around for relevant things to add and managed to at least triple the size, so I think it's easiest to present it as a new table.
On top of that, I've included the names of mutating operations. This fills some gaps. For instance, Java and C# have (pure) "find all true" and (mutating) "remove! if true" operations, but no "remove! if false" or "find all false" operations that I could find.
Language | Find all true / Remove! if false | Find all false / Remove! if true |
---|---|---|
Java | `Stream#filter` / ? | ? / `Collection#removeIf` |
C#, Visual Basic .NET | `List#FindAll`, `Queryable#Where`, `Enumerable#Where` / ? | ? / `List#RemoveAll` |
Haskell, Elm | `filter` / ? | ? / ? |
JavaScript | `Array#filter` / ? | ? / ? |
Python | `filter`, list comprehension `if` / ? | `filterfalse` / ? |
Ruby `Enumerable#...` methods | `#filter`, `#find_all`, `#select` / ? | `#reject` / ? |
Ruby `Array#...` methods [note 1] | `#select` aka `#filter` / `#select!` aka `#filter!`, `#keep_if` | ? / `#reject!`, `#delete_if` |
Scheme SRFI-1 [note 2], Clojure | `filter` / ? | `remove` / ? |
R6RS Scheme | `filter` / ? | `remp` / ? |
`#lang racket` | `filter` / ? | `filter-not` / ? |
C++ [note 3] | `std::copy_if` / ? | `std::remove_copy_if` / `std::remove_if` |
Rust | `Iterator#filter` / ? | ? / ? |
Groovy [note 4] | `List#findAll`, `Collection#grep` / `Collection#retainAll` | ? / `Collection#removeAll` |
PHP | `array_filter` / ? | ? / ? |
Swift | `Sequence#filter`, `NSArray#filtered` / `NSArray#filter` | ? / `RangeReplaceableCollection#removeAll` |
Objective-C Cocoa | `NSArray#(filteredArrayUsingPredicate:)` / `NSArray#(filterUsingPredicate:)` | ? / ? |
[note 1] Ruby `Array`s can also access the `Enumerable` methods. Due to the large number of relevant Ruby methods, they're sorted into two table rows and abbreviated. Some `Array` methods are documented as being mere aliases of the others, so they're listed with "[normalized name] aka [alias]."
[note 2] Scheme SRFI-1 also has `filter!` and `remove!`. They aren't typical mutable collection "remove! if false/true" operations, but they can provide a similar experience if the list is stored in a mutable variable.
[note 3] I'm not familiar enough with C++ iterators to say to what extent those operations may be said to mutate the given collections. I've made a guess that the "`copy`" ones are in the spirit of creating a new collection and the other one is in the spirit of mutating an existing one, but maybe this isn't the whole story.
[note 4] Groovy can easily access most Java methods. The Groovy row lists methods that are unique to Groovy.
Markdown code for the table:
Language | Find all true / Remove! if false | Find all false / Remove! if true
-------- | -------------------------------- | --------------------------------
Java | `Stream#filter` / ? | ? / `Collection#removeIf`
C#, Visual Basic .NET | `List#FindAll`, `Queryable#Where`, `Enumerable#Where` / ? | ? / `List#RemoveAll`
Haskell, Elm | `filter` / ? | ? / ?
JavaScript | `Array#filter` / ? | ? / ?
Python | `filter`, list comprehension `if` / ? | `filterfalse` / ?
Ruby `Enumerable#...` methods | `#filter`, `#find_all`, `#select` / ? | `#reject` / ?
Ruby `Array#...` methods [note 1] | `#select` aka `#filter` / `#select!` aka `#filter!`, `#keep_if` | ? / `#reject!`, `#delete_if`
Scheme SRFI-1 [note 2], Clojure | `filter` / ? | `remove` / ?
R6RS Scheme | `filter` / ? | `remp` / ?
`#lang racket` | `filter` / ? | `filter-not` / ?
C++ [note 3] | `std::copy_if` / ? | `std::remove_copy_if` / `std::remove_if`
Rust | `Iterator#filter` / ? | ? / ?
Groovy [note 4] | `List#findAll`, `Collection#grep` / `Collection#retainAll` | ? / `Collection#removeAll`
PHP | `array_filter` / ? | ? / ?
Swift | `Sequence#filter`, `NSArray#filtered` / `NSArray#filter` | ? / `RangeReplaceableCollection#removeAll`
Objective-C Cocoa | `NSArray#(filteredArrayUsingPredicate:)` / `NSArray#(filterUsingPredicate:)` | ? / ?
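As a rough illustration of the pure versus mutating split the table draws, here is a small Python sketch. The predicate `is_negative` and the sample data are hypothetical. As far as I know Python lists have no built-in "remove if" method, so the in-place variant below uses slice assignment as a stand-in for a "remove! if true" operation.

```python
from itertools import filterfalse


def is_negative(n: int) -> bool:
    """Hypothetical predicate used only for this illustration."""
    return n < 0


readings = [3, -1, 4, -1, 5, -9]

# Pure "find all true": builds a new list, leaves readings untouched.
negatives = [n for n in readings if is_negative(n)]

# Pure "find all false": itertools.filterfalse keeps elements
# for which the predicate returns false.
non_negatives = list(filterfalse(is_negative, readings))

# Mutating "remove! if true": slice assignment rewrites the same list
# object, so every other reference to it sees the change.
alias = readings
readings[:] = [n for n in readings if not is_negative(n)]

print(negatives)      # [-1, -1, -9]
print(non_negatives)  # [3, 4, 5]
print(alias)          # [3, 4, 5]
```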
I don't expect to convince anyone, but here're more anecdotes re: the confusion:
when will I learn that List.filter keeps everything for which the predicate returns true?
To be honest I get confused by filter. I, for some stupid reason assume it will remove the elements it matches.
I share your opinion. After I started using an R library called `purrr`, I found alternatives like `keep` and `remove` intuitive in comparison to `filter`.
Ruby only added `Enumerable#filter` recently (2 years ago). Here's a discussion thread, and there are people raising points on the "positivity" and "negativity" of the word "filter", too. Ruby eventually added `Enumerable#filter` as an alias of `Enumerable#select`.
[September 7, 2021] added https://twitter.com/GabrielG439/status/1435385232643944454
I don't mean to be rude, but I cannot abide the dismissal of decades of culture as the arbitrary preferences of "old programmers."
I'm not sure if my use of "old programmers" has offended anyone. If that's the case, I really apologize. When I said "old programmers", I meant people who have a lot of prior programming experience, not programmers who are old. (English is not my native language; I'm not saying this to avoid responsibility for my words. That was entirely my fault. But I really want you to know that I had no bad intention when I used that phrase.)
That being said, I disagree with the framing of this proposal as "the dismissal of decades of culture". Was Racket's early change to make `cons` immutable a "dismissal of decades of culture"? Is the whole idea of rhombus-brainstorming a "dismissal of decades of culture"? I didn't propose the change for the sake of changing it. I was personally confused by it, and other people (even people who speak English as a native language) are confused by it.
I now understand that "filter" doesn't mean remove unwanted things (thanks to both @dedbox and @soegaard for the clarification), but as @cgay said, it would be even better to have a word that can only be interpreted in one way, and has a precise meaning that matches what it's supposed to do.
The default directionality of words is likely to always be problematic: `filter` could be `filter-in` or `filter-out`. `#t` translating to inclusion by default seems to match masking operations that happen at varying levels. The fact that there is variance in how people perceive the default case for filter is something that can't be fixed by a language. What a language can do is clearly specify what the default behavior is and then never change it. This is fundamentally different than `cons` because of the binary nature of the change.
Also, common uses of the word filter in English default to `filter-in`:
- `air filter` lets the air through while removing dust and other particles
- `coffee filter` lets the coffee through while blocking the grounds
- `water filter` allows water through, removing particulate matter, etc.
When filter is used alone, as in "please filter the X" the meaning is for X to be retained, e.g. the specification of what passes through the filter is what the filter is named, this is a `filter-in` default. This is in all likelihood because there are countless things that could be filtered out -- the only thing we know for sure is what passes through. Furthermore, in regular usage, `out` is appended explicitly whenever the `filter-out` meaning is needed, for example "we filtered out the poor candidates by making them run laps until they were too tired to continue" or "please filter out the garbage from this list." Note that "please filter the garbage" is ambiguous only if one ignores the default meaning of filter in English, in this case the sentence only tells us to keep the garbage, which as a side effect may have filtered the water that it was in, but the context is never specified, and most importantly it does not have to be. It is true that filters themselves remove things, but the specification of the processes of filtering is nearly always in terms of what is to be included in the output.
All this is a long way of saying that the meaning of the word filter is not ambiguous in English, and it matches the default behaviour that Racket already has.
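The remark above that `#t` translating to inclusion matches masking operations can be illustrated with a short Python sketch. This is only one reading of "masking", using `itertools.compress` from the standard library; the data and threshold are made up.

```python
from itertools import compress

values = [5, 12, 7, 20]

# A boolean mask: True marks the positions to include in the output.
mask = [v > 10 for v in values]

# compress keeps the elements whose mask entry is true ...
via_mask = list(compress(values, mask))

# ... which agrees with filter's keep-when-true convention.
via_filter = list(filter(lambda v: v > 10, values))

print(mask)        # [False, True, False, True]
print(via_mask)    # [12, 20]
print(via_filter)  # [12, 20]
```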
We usually call `filter`. So we need to look at how it's usually used as a verb, and how its connotations will combine with the connotations of the arguments.
Let's first observe what we do naturally in a clearer situation. What's an explicit reading of `(if (fluffy? bunny) (pet bunny) (wash bunny))`, if you wrote it in a comment or read it to a non-coder? Rules of English force some reorderings, and there are two obvious least-surprise words we expect. Together that's almost enough to mechanically produce “if the bunny is fluffy then pet the bunny, otherwise wash the bunny”.
Reading code is very much like reading a text message, or a telegraph (if you're old), or listening to noisy audio.
Now let's try `filter`. What's an explicit reading of `(filter fluffy? bunnies)`?
Is it as close syntactically in the space of sentences as “filter out (the) fluffy bunnies”?
Like the expected “then” and “otherwise”, if you read “filter” as a verb you often expect the word “out” or words “out of” somewhere.
My explanation is, I think, based on some standard cognitive science and linguistics. But I don't know those areas formally, so would love it if a cognitive scientist or linguist weighed in here.
Out of curiosity, I made a table of what names different languages use for filtering functionality. I'm quite surprised that Racket is the only language I could find that actually has two separate filter functions in the standard library (`filter` and `filter-not`) for the two different ways a predicate could be handled. Suggested additions to this table are welcome.
Elixir has both `Enum.filter/2` and `Enum.reject/2`, the latter being certainly more clear than `filter-not`.
Regarding the general topic here, given that the ambiguity of filter comes from the fact that the common use of filter means to keep some things and reject others, I actually think that replacing `filter` with both `keep` and `reject` makes things very clear. The idea that `filter` is purportedly embedded in culture enough to not rethink its name is shortsighted, in my opinion. The first thing someone is going to need to do in a new language is look up the list or enum module documentation to understand the function signature of `filter` anyway, or they may be greeted with a CLI help tool or IDE intellisense. If it isn't there and is instead replaced with `keep` and `reject`, for example, it is a one-time event that will come and go rather unceremoniously. The user may even be pleasantly surprised that the language favors removing ambiguity.
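For readers who want to see what a `keep` / `reject` pair might look like at a call site, here is a minimal Python sketch. Both functions are hypothetical and are not an existing Racket, Rhombus, or Elixir API; the argument order (predicate first) is an assumption borrowed from Racket's `filter`.

```python
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")


def keep(pred: Callable[[T], object], items: Iterable[T]) -> List[T]:
    """Hypothetical: return the elements for which pred is truthy."""
    return [x for x in items if pred(x)]


def reject(pred: Callable[[T], object], items: Iterable[T]) -> List[T]:
    """Hypothetical: return the elements for which pred is falsy."""
    return [x for x in items if not pred(x)]


words = ["apple", "", "kiwi", ""]

non_empty = keep(bool, words)   # the name says what survives
empties = reject(bool, words)   # the name says what is thrown away

print(non_empty)  # ['apple', 'kiwi']
print(empties)    # ['', '']
```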
From the Racket Discord:
SamPh — Today at 8:01 AM Yeah I always think of filter as removing
badkins — Today at 8:01 AM Funny you should mention that. Using the word "filter" in a conversation with my client was a source of confusion! As in, are we "filtering out" or "filtering in" ? :)
SamPh — Today at 8:01 AM select / reject
badkins — Today at 8:03 AM I probably agree, if starting from scratch, but the meaning of "filter" in functional programming is now pretty ingrained in me. I feel more strongly about argument ordering. For example, I prefer (string-split sep str) because you're more likely to want to (curry string-split "\n") That's the type of change I would love to be able to make in "Racket 2" :) Maybe we can think of "filter" more as a fishing net than a coffee filter? :)
SamPh — Today at 8:12 AM The problem with "coffee filter" is that both what it prevents from passing through and what it passes through are both called coffee.
badkins — Today at 8:14 AM No doubt there is some ambiguity, but I suppose most people use a filter primarily for what makes it through i.e. air, water, coffee, oil, etc. :) Hard to argue with the clarity of the pair select / reject though. Anyway, let's focus first on proper argument ordering :)
Rhombus has evolved to use "keep" and "skip" as iteration words, as in `keep_when` and `skip_when` clauses for `for`, as well as `~keep` and `~skip` arguments for `filesystem.files`. So, I think we're going to try
(lst :: List).filter(~keep: keep :: (Any -> Any.to_boolean) = fun (x): #true,
~skip: skip :: (Any -> Any.to_boolean) = fun (x): #false)
This way, each use of `filter` will specify explicitly which mode (or both) it means, as in the suggestion to name separate functions `keep` and `reject`, but we still get to keep the common name `filter`. If someone forgets to use (or has not yet learned to use) `~keep`, the error message will make it clear.
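A rough Python analogue of the keyword-based design might look like the sketch below. The name `filter_list` is hypothetical, and combining the two predicates as "kept and not skipped" is my assumption about how the two modes interact; the Rhombus definition above is the authoritative one. Keyword-only parameters stand in for the `~keep` and `~skip` arguments.

```python
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")


def filter_list(
    items: Iterable[T],
    *,  # everything after this must be passed by keyword
    keep: Callable[[T], object] = lambda x: True,
    skip: Callable[[T], object] = lambda x: False,
) -> List[T]:
    """Hypothetical sketch: retain x when keep(x) is truthy and skip(x) is falsy."""
    return [x for x in items if keep(x) and not skip(x)]


digits = list(range(10))

evens = filter_list(digits, keep=lambda n: n % 2 == 0)
large = filter_list(digits, skip=lambda n: n < 5)
large_evens = filter_list(digits, keep=lambda n: n % 2 == 0, skip=lambda n: n < 5)

print(evens)        # [0, 2, 4, 6, 8]
print(large)        # [5, 6, 7, 8, 9]
print(large_evens)  # [6, 8]

# Passing the predicate positionally is rejected, so the caller
# has to say which mode is meant.
try:
    filter_list(digits, lambda n: n % 2 == 0)
    positional_rejected = False
except TypeError:
    positional_rejected = True
```

In this sketch a forgotten keyword surfaces as a `TypeError` at the call site, which is the Python counterpart of the error message mentioned above.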
This has been bugging me since I was introduced to higher-order functions. To filter, as I understand, is to remove things. Therefore, the provided lambda should be used to find elements to remove, not ones to keep. I remember this confused me a lot when I started using Racket.
There are several names that could be used for the "keep" variant:
And ones for "filter" variant:
Note that `drop` unfortunately is already occupied, and assigning a different behavior to `filter` is probably going to confuse old programmers.