
The Python developer's guide
https://devguide.python.org/
Creative Commons Zero v1.0 Universal

List terms which should be avoided #605

Open vstinner opened 4 years ago

vstinner commented 4 years ago

I replaced master/slave with parent/child or server/client in Python. I'm now replacing whitelist/blacklist with allowlist/denylist in Python:

It may be nice to have a list of terms which should be avoided in the Python projects and propose better terms:

I don't think that we need to argue in detail why specific terms should be avoided; we can just give a general guideline like:

The intent is to make the Python community more welcoming and more diverse by avoiding making some people uncomfortable.

cc @brettcannon @warsaw @willingc @Mariatta @ejodlowska
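
A list like this could also be checked mechanically. The following is a minimal sketch, not existing devguide tooling: the AVOIDED_TERMS mapping and the suggest_replacements() helper are hypothetical, and the alternatives are only the ones suggested in this thread.

```python
import re

# Hypothetical glossary built from the replacements suggested in this thread;
# it is not an official devguide list.
AVOIDED_TERMS: dict[str, list[str]] = {
    "master": ["parent", "server", "main", "primary"],
    "slave": ["child", "client", "secondary"],
    "whitelist": ["allowlist"],
    "blacklist": ["denylist", "blocklist"],
}

# Whole-word, case-insensitive match on any avoided term.
_PATTERN = re.compile(
    r"\b(" + "|".join(map(re.escape, AVOIDED_TERMS)) + r")\b",
    re.IGNORECASE,
)


def suggest_replacements(text: str) -> list[tuple[str, list[str]]]:
    """Return (term, alternatives) for each avoided term found in text, in order."""
    return [
        (match.group(0), AVOIDED_TERMS[match.group(0).lower()])
        for match in _PATTERN.finditer(text)
    ]


if __name__ == "__main__":
    for term, options in suggest_replacements("Add the host to the Blacklist."):
        print(f"{term!r}: consider {', '.join(options)}")
        # prints: 'Blacklist': consider denylist, blocklist
```

Because matching is whole-word, identifiers such as master_fd are not flagged; whether identifiers should be covered is a policy choice this sketch leaves open.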

vstinner commented 4 years ago

Some articles about terminology:

I will not attempt to give an exhaustive list of articles; it's a hot topic and there are more and more articles about it!

hugovk commented 4 years ago

Newest draft is v3 (July 08, 2020):

Edit: now at v4 (August 24, 2020):

vstinner commented 4 years ago

"Inclusive Chromium code": https://chromium.googlesource.com/chromium/src/+/master/styleguide/inclusive_code.md

brettcannon commented 4 years ago

Are the guides moving to "allowlist"/"denylist"? I've seen "allow list"/"deny list" (notice the spaces). I've also seen "block list".

vstinner commented 4 years ago

I don't think that we have to require one replacement. We can suggest multiple alternatives. For example, at first I replaced blacklist with denylist in the http.cookiejar documentation, but when I reread the whole documentation, blocklist seemed more appropriate: https://github.com/python/cpython/pull/21826

PaulMcMillan commented 4 years ago

I agree that it is reasonable to enumerate terms that should be avoided. As always, the peanut gallery (itself an economically loaded term) is likely to chime in and tell us we're all wrong for trying to do the right thing, but it is important that we persist.

I've been seeing allowlist/blocklist as an accepted replacement terminology pretty broadly across my professional security-oriented network. I like it because I find it unambiguous, and "block" has slightly stronger connotations than "deny" in parallel usage. For better or worse, it's also verbally similar enough to allow for seamless substitution and general comprehension among folks who haven't thought carefully about the issue.

For "master/slave" replacement I have a personal preference for "leader/follower" in cases where it semantically fits. It's clear, makes fewer assumptions about western nuclear family structure than "parent/child", and avoids the ambiguity associated with "server/client" where "client" is a potential leader but just happens to be following (in contrast to "client" in the "queries the data store" sense).

[and not to go down a rabbit hole, but "client" can also be problematic given the feudal connotations]

More generally, I think it is helpful to enumerate specific turns of phrase to be avoided, since we otherwise risk falling to an onslaught of malintentioned arguers with a goal of derailing improvements via reductio ad absurdum. [yes, etymology is problematic with this phrasing too]

I suggest this wording:

The Python community welcomes all. We avoid language associated with opposition to this goal.

aeros commented 4 years ago

@vstinner

master/slave => parent/child, server/client, main, primary/seconday, etc.
whitelist/blacklist => allowlist/denylist, ignore, etc.

+1 for the proposed replacements. It might be a decent idea to put together a table in the devguide (perhaps in the "Documenting Python" section). My only feedback would be to use some caution with regard to banning terms that are not widely considered to be controversial, so that we don't end up spending an infinite amount of time on this.

I'm personally of the mindset that there are definitely outdated terms that we should move away from ("master/slave" and "blacklist/whitelist" being some of the more egregious offenders), but that it's impossible to avoid making someone out there uncomfortable; e.g. I'd have to disagree with @PaulMcMillan about avoiding usage of "parent/child" and "server/client". If you dig deep enough, there are going to be some potential connotations with any terminology. To me at least, those connotations have to be adequately clear and severe enough to justify the time spent replacing them (and of course, the replacement should be just as clear if not more clear than the original). Otherwise, the maintenance cost starts to outweigh the benefits, especially if it is determined that the replacements are offensive to someone a few years down the road.

brettcannon commented 4 years ago

At work we are avoiding "master", "slave", "blacklist", and "whitelist" and I have heard the same from other companies. I think that's good for now, and if other terms come to be seen as generally unacceptable we can handle them then.

vstinner commented 3 years ago

FYI there was a discussion about the usage of gendered language in PEP 3136: https://bugs.python.org/issue41743

Mariatta commented 3 years ago

I agree overall with having a list of inclusive language, but I'm not sure if the devguide is the right place for it. I think perhaps this could fall within the scope of the upcoming Docs WG, which could provide a guideline of acceptable terms. I would hope we use that terminology not just when writing documentation, comments, and code, but also in normal communication such as mailing lists and discussion forums.

vstinner commented 3 years ago

I agree overall with having a list of inclusive language, but I'm not sure if the devguide is the right place for it.

Do you have a better place for such a list?

vstinner commented 3 years ago

The Docs WG doesn't exist yet, so the devguide can be a temporary home until the list is moved somewhere else.

brettcannon commented 3 years ago

Are we in such a rush to have an allowlist of terms that we can't wait for the Docs WG?

JulienPalard commented 3 years ago

I don't think that we have to require one replacement. We can suggest multiple alternatives.

Agreed. As a French speaker, I'm glad when alternatives are proposed (coming up with them myself takes effort, I miss some, and it takes time). Multiple alternatives are good too, so we can pick the most relevant one for the context (which is often more semantically accurate than the avoided term).

The Docs WG doesn't exist yet, so the devguide can be a temporary home until the list is moved somewhere else.

Agreed, https://devguide.python.org/documenting/ is a good place for those recommendations (nothing prevents the Docs WG from editing this page); it's not like we're creating a whole new thing.

8vasu commented 7 months ago

Dear maintainers, please consider this comment that I made seriously before making any final naming decisions.


Edit:

Some data about naming pty pairs: apart from the file-, project-, module-, and documentation-local confusion that will inevitably arise from using the process-specific terms parent/child, the networking-specific terms server/client, and the generic terms main and primary/seconday, here is some global data generated directly by the script at the bottom of this post:

Current timezone and date: Sun Jan 21 06:17:13 AM CET 2024
Current directory: /tmp/tmp.21-Jan-2024+05:14:25.JZlA6tqeCz
Temporarily moving into directory: /tmp/tmp.uZECXBwUzE

Cloning CPython repository...

Cloning into 'cpython'...
remote: Enumerating objects: 1004157, done.
remote: Counting objects: 100% (690/690), done.
remote: Compressing objects: 100% (425/425), done.
remote: Total 1004157 (delta 428), reused 448 (delta 265), pack-reused 1003467
Receiving objects: 100% (1004157/1004157), 549.31 MiB | 5.16 MiB/s, done.
Resolving deltas: 100% (804152/804152), done.

Moving into CPython repository... /tmp/tmp.uZECXBwUzE/cpython

Running grep to count number of occurrences of concerned terms:

master: 757 occurrences ignoring case and 727 occurrences not ignoring case
master_fd: 80 occurrences ignoring case and 80 occurrences not ignoring case
slave: 155 occurrences ignoring case and 154 occurrences not ignoring case
slave_fd: 43 occurrences ignoring case and 43 occurrences not ignoring case
parent: 3760 occurrences ignoring case and 3255 occurrences not ignoring case
parent_fd: 0 occurrences ignoring case and 0 occurrences not ignoring case
child: 5703 occurrences ignoring case and 5067 occurrences not ignoring case
child_fd: 3 occurrences ignoring case and 3 occurrences not ignoring case
server: 6237 occurrences ignoring case and 5045 occurrences not ignoring case
server_fd: 0 occurrences ignoring case and 0 occurrences not ignoring case
client: 3036 occurrences ignoring case and 2682 occurrences not ignoring case
client_fd: 0 occurrences ignoring case and 0 occurrences not ignoring case
primary: 378 occurrences ignoring case and 346 occurrences not ignoring case
primary_fd: 0 occurrences ignoring case and 0 occurrences not ignoring case
secondary: 38 occurrences ignoring case and 35 occurrences not ignoring case
secondary_fd: 0 occurrences ignoring case and 0 occurrences not ignoring case
main: 25612 occurrences ignoring case and 24847 occurrences not ignoring case
main_fd: 0 occurrences ignoring case and 0 occurrences not ignoring case
second: 4361 occurrences ignoring case and 3987 occurrences not ignoring case
second_fd: 0 occurrences ignoring case and 0 occurrences not ignoring case
mother: 1 occurrences ignoring case and 1 occurrences not ignoring case
mother_fd: 0 occurrences ignoring case and 0 occurrences not ignoring case
(^|[^j])son: 0 occurrences ignoring case and 0 occurrences not ignoring case
(^|[^j])son_fd: 0 occurrences ignoring case and 0 occurrences not ignoring case

Removing CPython repository to free up space in /tmp...


#!/bin/sh -e

l="master master_fd slave slave_fd parent parent_fd \
child child_fd server server_fd client client_fd primary \
primary_fd secondary secondary_fd main main_fd second \
second_fd mother mother_fd (^|[^j])son (^|[^j])son_fd"
TMP_DIR_PARENT="${1:-/tmp}"

echo "*Current timezone and date:* $(date)"
echo "*Current directory:* $(pwd)"

# mktemp(1) is not POSIX
TMP_DIR="$(mktemp -d --tmpdir="$TMP_DIR_PARENT")"
echo "*Temporarily moving into directory:* $TMP_DIR"
cd "$TMP_DIR"
echo

echo "*Cloning CPython repository...*"
echo
git clone https://github.com/python/cpython.git
echo
echo "*Moving into CPython repository...*"
cd cpython
echo

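echo "*Running grep to count number of occurrences of concerned terms:*"
echo
# Disable pathname expansion so the unquoted $l below (which contains
# the bracket expression [^j]) is never matched against file names.
set -f
for i in $l
do
    # "grep -r" is not POSIX.
    # grep counts matching lines (not individual matches) and, without -E,
    # uses basic regular expressions: apart from [^j], the "(^|[^j])son"
    # patterns are matched as literal text.
    n="$(grep -r -e "$i" 2>/dev/null | wc -l)"
    m="$(grep -ri -e "$i" 2>/dev/null | wc -l)"

    echo "\`$i\`: \`$m\` occurrences ignoring case and \`$n\` occurrences not ignoring case"
done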

echo
echo "*Removing CPython repository to free up space in ${TMP_DIR_PARENT}...*"
cd ..
rm -rf ./cpython
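
The script above pipes grep into wc -l, so it counts matching lines: a line that mentions a term twice is counted once. If individual matches are wanted instead, a minimal Python sketch could look like the following. count_matches() is a hypothetical helper; it uses Python's re syntax (where "(^|[^j])son" is a real alternation rather than literal text), and, unlike grep -r, it skips the .git directory.

```python
import os
import re


def count_matches(root: str, pattern: str, ignore_case: bool = False) -> int:
    """Count every regex match (not matching lines) in files under root."""
    regex = re.compile(pattern, re.IGNORECASE if ignore_case else 0)
    total = 0
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune Git metadata in place; grep -r would search it too.
        dirnames[:] = [d for d in dirnames if d != ".git"]
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # Decode leniently so binary files do not abort the scan.
                with open(path, encoding="utf-8", errors="replace") as f:
                    # Line by line, so "^" anchors at each line start as in grep.
                    total += sum(len(regex.findall(line)) for line in f)
            except OSError:
                continue  # unreadable file, similar to grep's 2>/dev/null
    return total


if __name__ == "__main__":
    for term in ("master", "slave", r"(^|[^j])son"):
        print(term, count_matches(".", term, ignore_case=True))
```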
vstinner commented 7 months ago

The "master" and "slave" PTY/FD names are not part of the Python pty module API, so you're free to change these names in the documentation without introducing any backward-incompatible change, no?

8vasu commented 7 months ago

I am not talking about backward incompatibility. I am just saying: rename them to something unique, something that is neither too common nor used to refer to any other aspect of programming like processes, networking, etc.

For example, the forkpty() function involves both a pty pair and a pair of parent/child processes. If, for example, you rename master/slave to parent/child, not only will you introduce confusion while reading the source of the pty module and the pty-related functions in posixmodule.c, it will no longer be possible to quickly find all instances of pty-related code by performing a simple grep for master/slave.

vstinner commented 7 months ago

you introduce confusion while reading the source of the pty module and the pty-related functions in posixmodule.c

Why not change the terms in both modules?

8vasu commented 7 months ago

Of course you would change them exhaustively throughout the whole repository, but my request to you is to just be creative and come up with new, unique terms that do not collide/coincide/clash with existing terminology (with the only constraint being that they must start with 'm' and 's').

I suggested mother and son, but they do not have to be exactly that; you can do whatever you wish (you're the boss); just please invent NEW, unused terms for the pty pair nomenclature.

vstinner commented 7 months ago

Apparently, the file descriptors returned by openpty() are not equal in permissions/behavior. For example, according to test_pty, closing the first file descriptor can raise a SIGHUP signal if the process is the session leader. If the first one is more important and/or "controls" the second one, we can use the terms that I proposed in my first message:
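
A minimal Python sketch of that asymmetry, assuming a Unix system where os.openpty() is available. The names control_fd and terminal_fd and the helper pty_roundtrip() are hypothetical, chosen only for this illustration; the os module documentation has described the pair as (master, slave). The sketch only shows that bytes written to the first descriptor arrive as terminal input on the second; it does not reproduce the SIGHUP behavior from test_pty.

```python
import os


def pty_roundtrip(line: bytes) -> tuple[str, bytes]:
    """Write one line into the first end of a new pty pair and read it from the second."""
    # Hypothetical names; in os.openpty() terms the pair is (master, slave).
    control_fd, terminal_fd = os.openpty()
    try:
        # The second end behaves like a terminal device with a path under /dev.
        device = os.ttyname(terminal_fd)
        # Bytes written to the first end arrive as terminal input on the second;
        # in the default canonical mode, the read returns one complete line.
        os.write(control_fd, line)
        received = os.read(terminal_fd, 1024)
    finally:
        os.close(terminal_fd)
        os.close(control_fd)
    return device, received


if __name__ == "__main__":
    device, received = pty_roundtrip(b"hello\n")
    print(device)    # a path such as /dev/pts/N on Linux
    print(received)  # b'hello\n'
```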

master/slave => parent/child, server/client, main, primary/seconday, etc.

Server/client terms are common in networking. Primary/secondary is more common in databases, where a secondary replicates the primary.

Here, parent/child sounds like good terms: parent_fd and child_fd.

I suggested mother and son

I didn't see usage of these terms previously. Parent sounds more (gender) neutral than mother.

8vasu commented 7 months ago

Here, parent/child sounds like good terms: parent_fd and child_fd.

While they do sound more on-topic compared to server/client or primary/secondary, the inter-process communication terms master and slave do not imply a parent-child relationship between the processes using them.

  1. Examples where the terms will mislead a reader to assume a false hierarchical relationship between processes: screen(1) and tmux(1) can attach or detach independently started processes to the slave end of a pty.
  2. Examples where the parent-child terminology for the pty ends might be a good fit: forkpty(), script(1); see image below.

Also, the terms parent and child do not start with m, s. It will be difficult to detect existing code that simply uses these initials as variable names.

I didn't see usage of these terms previously. Parent sounds more (gender) neutral than mother.

You do not have to use my terms; will it be possible to get opinions of well-known system programmers? Or maybe start a poll if a democratic process is preferred?

If we use unique/new/non-conflicting terms in the source, it will be easier to change them in the future if projects other than Python reach a consensus on what terms to use and such terms become part of some standard.

[Image: script(1)]

vstinner commented 7 months ago

I'm not sure of what you are suggesting. Which terms do you prefer?

8vasu commented 7 months ago

I asked the GPT-4 API the following:

We are looking for terms that can replace the terms 'master' and 'slave' used to refer to pseudoterminal ends. These are our constraints:

  1. The terms must be inclusive.
  2. The terms must retain the initials 'm' and 's'.
  3. The terms must still convey the message that the master end is the "control center" of the slave end.
  4. The terms must not coincide with existing technology terms or other common terms such as parent/child, server/client, first/second, main/subordinate, primary/secondary, etc.

Its outputs are at the bottom. I like modulator and satellite, which grep(1) finds only 3 and 0 times in the Python repository respectively.

@vstinner Note: you probably know this already, but just in case you do not: in your root comment (and in subsequent comments that copied it), it says "seconday" instead of "secondaRy".


vstinner commented 7 months ago

@8vasu: I don't think that the devguide issue tracker is the right place to discuss very specific names in the Python pty module. Sorry, I shouldn't have replied here. I suggest continuing the discussion in the Python bug tracker.

8vasu commented 7 months ago

I understand.