racket / rhombus

Rhombus programming language

rename filter to something that makes more sense #131

Open sorawee opened 4 years ago

sorawee commented 4 years ago

This has been bugging me since I was introduced to higher-order functions. To filter, as I understand, is to remove things. Therefore, the provided lambda should be used to find elements to remove, not ones to keep. I remember this confused me a lot when I started using Racket.

There are several names that could be used for the "keep" variant:

And ones for the "filter" variant:

Note that drop is unfortunately already taken, and assigning a different behavior to filter is probably going to confuse old programmers.

cgay commented 4 years ago

I agree about "filter" being confusing terminology. I've always liked the name "choose" for this. It reflects the fact that you're interested in the result, not what is filtered out. https://opendylan.org/books/drm/Collection_Operations#choose

"select" has a different connotation already in many languages, so personally I would avoid that.

I don't see a need for both "keep" and "drop" variants. "keep" with a negated predicate works well and one is generally interested in the result (the kept elements) rather than the droppings, so to speak.

[edit] I am an "old programmer" and I've always found "filter" to be ambiguous.
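For concreteness, here is how the two directions look in `#lang racket`. This is a minimal sketch with an illustrative list. `filter` keeps the elements for which the predicate returns a true value. Wrapping the predicate with `negate` (from `racket/function`, which `#lang racket` includes) gives the "drop" direction, and that result matches Racket's dedicated `filter-not`.

```racket
#lang racket
;; An illustrative list of numbers.
(define nums '(1 2 3 4 5 6))

;; filter keeps the elements for which the predicate returns true.
(define kept (filter even? nums))

;; "keep" with a negated predicate keeps the elements where even? is false.
(define dropped/negate (filter (negate even?) nums))

;; Racket's built-in function for the opposite direction.
(define dropped/not (filter-not even? nums))

(printf "kept: ~a\n" kept)                 ; prints "kept: (2 4 6)"
(printf "dropped: ~a\n" dropped/negate)    ; prints "dropped: (1 3 5)"
(printf "same result? ~a\n" (equal? dropped/negate dropped/not)) ; prints "same result? #t"
```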

dedbox commented 4 years ago

No.

I don't mean to be rude, but I cannot abide the dismissal of decades of culture as the arbitrary preferences of "old programmers."

A filter has two actions: it eliminates things while letting others pass through. Everybody knows what a filter does. Debating which action "makes more sense" is a waste of time.

The present connotation for filter already exists in many languages. There's a pragmatic reason for it, and it's probably far less technical than you think.

soegaard commented 4 years ago

On Mon, 9 Dec 2019 at 07:53, sorawee <notifications@github.com> wrote:

This has been bugging me since I was introduced to higher-order functions. To filter, as I understand, is to remove things. Therefore, the provided lambda should be used to find elements to remove, not ones to keep. I remember this confused me a lot when I started using Racket.

I agree with Eric that "filter" is the standard name used everywhere.

But maybe it helps that there exist two types of filters in the real world. Some filters are used to filter out unwanted "stuff" (the grounds are collected in a coffee filter). Other filters are used to keep the good stuff (a low-pass filter in audio keeps the low frequencies).

So ... switch your real-world association of "filter" to an example of the other type of filter.

/Jens Axel

MarcKaufmann commented 4 years ago

I agree with Eric that "filter" is the standard name used everywhere.

As someone who thumbed up the original post, I was going to point out that the tidyverse package in R uses filter to keep the rows of a data frame that match, and that the language used around the Pandas library in Python uses filter similarly, although the method is called query rather than filter. See here.

Then I realized that this is exactly your and Eric's point: in those languages, filter keeps the things that match rather than removing them. So yes, quite a few languages use filter that way, and they are considerably more common than any Lisp. I nonetheless agree that filter is more ambiguous than other terms, even though it seems to be the usual name in Lisps. So I definitely wouldn't switch its meaning to removing things, but it might be good to have a less ambiguous description, especially if there is good precedent elsewhere. Otherwise I'd stick with it, since switching would probably lead to even more confusion in the other direction.

jackfirth commented 4 years ago

Out of curiosity, I made a table of what names different languages use for filtering functionality. I'm quite surprised that Racket is the only language I could find that actually has two separate filter functions in the standard library (filter and filter-not) for the two different ways a predicate could be handled. Every other language just keeps elements when the predicate returns true and relies on some sort of predicate negation feature for the other case.

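Language | Keep when predicate returns true | Keep when predicate returns false
-------- | -------------------------------- | ---------------------------------
Java | `filter` | (none)
C# | `where` | (none)
Haskell | `filter` | (none)
JavaScript | `filter` | (none)
Python | `filter` | (none)
Ruby | `select` | (none)
Racket | `filter` | `filter-not`
C++ | `std::copy_if` (wtf?) | (none)
Rust | `filter` | (none)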

Suggested additions to this table are welcome.

rocketnia commented 4 years ago

I like the list. I looked around for relevant things to add and managed to at least triple the size, so I think it's easiest to present it as a new table.

On top of that, I've included the names of mutating operations. This fills some gaps. For instance, Java and C# have (pure) "find all true" and (mutating) "remove! if true" operations, but no "remove! if false" or "find all false" operations that I could find.

Language | Find all true / Remove! if false | Find all false / Remove! if true
-------- | -------------------------------- | --------------------------------
Java | `Stream#filter` / ? | ? / `Collection#removeIf`
C#, Visual Basic .NET | `List#FindAll`, `Queryable#Where`, `Enumerable#Where` / ? | ? / `List#RemoveAll`
Haskell, Elm | `filter` / ? | ? / ?
JavaScript | `Array#filter` / ? | ? / ?
Python | `filter`, list comprehension `if` / ? | `filterfalse` / ?
Ruby `Enumerable#...` methods | `#filter`, `#find_all`, `#select` / ? | `#reject` / ?
Ruby `Array#...` methods [note 1] | `#select` aka `#filter` / `#select!` aka `#filter!`, `#keep_if` | ? / `#reject!`, `#delete_if`
Scheme SRFI-1 [note 2], Clojure | `filter` / ? | `remove` / ?
R6RS Scheme | `filter` / ? | `remp` / ?
`#lang racket` | `filter` / ? | `filter-not` / ?
C++ [note 3] | `std::copy_if` / ? | `std::remove_copy_if` / `std::remove_if`
Rust | `Iterator#filter` / ? | ? / ?
Groovy [note 4] | `List#findAll`, `Collection#grep` / `Collection#retainAll` | ? / `Collection#removeAll`
PHP | `array_filter` / ? | ? / ?
Swift | `Sequence#filter`, `NSArray#filtered` / `NSArray#filter` | ? / `RangeReplaceableCollection#removeAll`
Objective-C Cocoa | `NSArray#(filteredArrayUsingPredicate:)` / `NSArray#(filterUsingPredicate:)` | ? / ?

[note 1] Ruby Arrays can also access the Enumerable methods. Due to the large number of relevant Ruby methods, they're sorted into two table rows and abbreviated. Some Array methods are documented as being mere aliases of the others, so they're listed with "[normalized name] aka [alias]."

[note 2] Scheme SRFI-1 also has filter! and remove!. They aren't typical mutable collection "remove! if false/true" operations, but they can provide a similar experience if the list is stored in a mutable variable.

[note 3] I'm not familiar enough with C++ iterators to say to what extent those operations may be said to mutate the given collections. I've made a guess that the "copy" ones are in the spirit of creating a new collection and the other one is in the spirit of mutating an existing one, but maybe this isn't the whole story.

[note 4] Groovy can easily access most Java methods. The Groovy row lists methods that are unique to Groovy.

sorawee commented 4 years ago

I don't expect to convince anyone, but here are more anecdotes about the confusion:

Ruby only added Enumerable#filter recently (2 years ago), as an alias of Enumerable#select. Here's a discussion thread, and people there raise points about the "positivity" and "negativity" of the word "filter", too.

[September 7, 2021] added https://twitter.com/GabrielG439/status/1435385232643944454

sorawee commented 4 years ago

I don't mean to be rude, but I cannot abide the dismissal of decades of culture as the arbitrary preferences of "old programmers."

I'm not sure whether my use of "old programmers" offended anyone. If so, I sincerely apologize. By "old programmers", I meant people who already have a lot of programming experience, not programmers who are old. (English is not my native language. I'm not saying this to avoid responsibility for my words; that was entirely my fault. But I really want you to know that I had no bad intentions when I used that phrase.)

That being said, I disagree with the framing of this proposal as "the dismissal of decades of culture". Was Racket's early change to make cons immutable a "dismissal of decades of culture"? Is the whole idea of rhombus-brainstorming a "dismissal of decades of culture"? I didn't propose the change for the sake of changing it. I was personally confused by it, and other people (even people who speak English as a native language) are confused by it.

I now understand that "filter" doesn't mean remove unwanted things (thanks to both @dedbox and @soegaard for the clarification), but as @cgay said, it would be even better to have a word that can only be interpreted in one way, and has a precise meaning that matches what it's supposed to do.

tgbugs commented 4 years ago

The default directionality of words is likely to always be problematic: filter could mean filter-in or filter-out. Having #t translate to inclusion by default seems to match masking operations that happen at various levels. The fact that there is variance in how people perceive the default case for filter is something that can't be fixed by a language. What a language can do is clearly specify what the default behavior is and then never change it. This is fundamentally different from cons because of the binary nature of the change.

Also, common uses of the word filter in English default to filter-in:

It is true that filters themselves remove things, but the specification of the processes of filtering is nearly always in terms of what is to be included in the output.

All this is a long way of saying that the meaning of the word filter is not ambiguous in English, and it matches the default behaviour that Racket already has.

gfbee commented 4 years ago

We usually call filter. So we need to look at how it's usually used as a verb, and how its connotations will combine with the connotations of the arguments.

Let's first observe what we do naturally in a clearer situation. What's an explicit reading of (if (fluffy? bunny) (pet bunny) (wash bunny)) if you wrote it in a comment or read it to a non-coder? The rules of English force some reorderings, and there are two obvious least-surprise words we expect. Together, that's almost enough to mechanically produce “if the bunny is fluffy then pet the bunny, otherwise wash the bunny”.

Reading code is very much like reading a text message, or a telegraph (if you're old), or listening to noisy audio.

Now let's try filter. What's an explicit reading of (filter fluffy? bunnies)? Isn't the syntactically closest sentence “filter out (the) fluffy bunnies”?

Like the expected “then” and “otherwise”, if you read “filter” as a verb you often expect the word “out” or the words “out of” somewhere.

My explanation is, I think, based on some standard cognitive science and linguistics. But I don't know those areas formally, so would love it if a cognitive scientist or linguist weighed in here.

bmitc commented 2 years ago

Out of curiosity, I made a table of what names different languages use for filtering functionality. I'm quite surprised that Racket is the only language I could find that actually has two separate filter functions in the standard library (filter and filter-not) for the two different ways a predicate could be handled. Suggested additions to this table are welcome.

Elixir has both Enum.filter/2 and Enum.reject/2, the latter being certainly more clear than filter-not.

Regarding the general topic here, given that the ambiguity of filter comes from the fact that the common use of filter means to keep some things and reject others, I actually think that replacing filter with both keep and reject makes things very clear. The idea that filter is so embedded in the culture that its name shouldn't be rethought is shortsighted, in my opinion. The first thing someone will need to do in a new language is look up the list or enum module documentation to understand the signature of filter anyway, or they may be greeted by a CLI help tool or IDE IntelliSense. If filter isn't there and is instead replaced with keep and reject, for example, that is a one-time event that will come and go rather unceremoniously. The user may even be pleasantly surprised that the language favors removing ambiguity.

sorawee commented 1 year ago

From the Racket Discord:

SamPh — Today at 8:01 AM Yeah I always think of filter as removing

badkins — Today at 8:01 AM Funny you should mention that. Using the word "filter" in a conversation with my client was a source of confusion! As in, are we "filtering out" or "filtering in" ? :)

SamPh — Today at 8:01 AM select / reject

badkins — Today at 8:03 AM I probably agree, if starting from scratch, but the meaning of "filter" in functional programming is now pretty ingrained in me. I feel more strongly about argument ordering. For example, I prefer (string-split sep str) because you're more likely to want to (curry string-split "\n") That's the type of change I would love to be able to make in "Racket 2" :) Maybe we can think of "filter" more as a fishing net than a coffee filter? :)

SamPh — Today at 8:12 AM The problem with "coffee filter" is that both what it prevents from passing through and what it passes through are both called coffee.

badkins — Today at 8:14 AM No doubt there is some ambiguity, but I suppose most people use a filter primarily for what makes it through i.e. air, water, coffee, oil, etc. :) Hard to argue with the clarity of the pair select / reject though. Anyway, let's focus first on proper argument ordering :)
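badkins's point about argument order can be made concrete in `#lang racket`. This is a minimal sketch; `keep-evens` and `split-lines` are hypothetical names made up for the example. Because `filter` takes its predicate first, partially applying it with `curry` works directly. `string-split`, by contrast, takes the string first and the separator second, so `(curry string-split "\n")` would fix `"\n"` as the string to split; with the existing order, an explicit `lambda` is the straightforward workaround.

```racket
#lang racket
;; filter takes the predicate first, so currying fixes the predicate.
;; keep-evens is a hypothetical helper name for this example.
(define keep-evens (curry filter even?))

;; string-split is (string-split str [sep]): the string comes first,
;; so the separator is fixed with an explicit lambda instead of curry.
;; split-lines is likewise a hypothetical helper name.
(define split-lines (lambda (s) (string-split s "\n")))

(keep-evens '(1 2 3 4 5 6))        ; => '(2 4 6)
(split-lines "one\ntwo\nthree")    ; => '("one" "two" "three")
```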

mflatt commented 1 week ago

Rhombus has evolved to use "keep" and "skip" as iteration words, as in the keep_when and skip_when clauses of for, as well as the ~keep and ~skip arguments of filesystem.files. So, I think we're going to try:

```
(lst :: List).filter(~keep: keep :: (Any -> Any.to_boolean) = fun (x): #true,
                     ~skip: skip :: (Any -> Any.to_boolean) = fun (x): #false)
```

This way, each use of filter will specify explicitly which mode (or both) it means, as in the suggestion to name separate functions keep and reject — but we still get to keep the common name filter. If someone forgets to use (or has not yet learned to use) ~keep, the error message will make it clear.
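To make the proposed semantics concrete, here is a rough `#lang racket` analogue of the Rhombus signature above. It is a sketch under stated assumptions, not Rhombus's implementation. The name `filter/keep-skip` is hypothetical, and Racket keyword arguments `#:keep` and `#:skip` stand in for Rhombus's `~keep` and `~skip`. It assumes that when both are given, an element survives only if `keep` returns a true value and `skip` returns `#f`. Because neither predicate can be passed by position, forgetting the keyword raises an error instead of silently picking a direction, which loosely mirrors the error-message point in the comment.

```racket
#lang racket
;; Hypothetical analogue of the proposed Rhombus filter signature.
;; Both predicates are keyword-only, with the same defaults as the proposal:
;; keep everything, skip nothing.
(define (filter/keep-skip lst
                          #:keep [keep (lambda (x) #t)]
                          #:skip [skip (lambda (x) #f)])
  (for/list ([x (in-list lst)]
             #:when (keep x)      ; retain only elements that keep accepts...
             #:unless (skip x))   ; ...and drop any that skip accepts
    x))

(define nums '(1 2 3 4 5 6))

(filter/keep-skip nums #:keep even?)   ; => '(2 4 6)
(filter/keep-skip nums #:skip even?)   ; => '(1 3 5)
(filter/keep-skip nums #:keep even? #:skip (lambda (x) (> x 4)))   ; => '(2 4)

;; Passing the predicate positionally is rejected with an arity error.
(with-handlers ([exn:fail:contract? (lambda (e) 'rejected)])
  (filter/keep-skip nums even?))       ; => 'rejected
```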