Closed d7f1141d-245b-4ed8-89b1-56b207b1d4f2 closed 2 years ago
It is pretty trivial to concatenate a sequence of strings:
''.join([str1, str2, ...])
Concatenating a sequence of lists is for some reason significantly more convoluted. Some current options include:
sum([lst1, lst2, ...], [])
[x for y in [lst1, lst2, ...] for x in y]
list(itertools.chain(lst1, lst2, ...))
The first is the least advisable but the most intuitive, and the third is the fastest but the most cumbersome (see https://stackoverflow.com/questions/49631326/why-is-itertools-chain-faster-than-a-flattening-list-comprehension). None of these looks like "the one obvious way to do it" to me. Furthermore, I feel a dedicated concatenation method could be more efficient than any of these approaches.
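As a rough sketch, the three options can be run side by side to confirm they produce the same result. The sample data and variable names are made up for illustration, and the third option uses chain.from_iterable so that the lists arrive as a single iterable instead of separate arguments.

```python
import itertools

# Hypothetical sample data, only for illustration.
lists = [[1, 2], [3], [], [4, 5, 6]]

# Option 1: sum() with an empty list as the start value. Every addition
# builds a new list, so the accumulated result is copied again and again.
via_sum = sum(lists, [])

# Option 2: nested list comprehension.
via_comprehension = [x for sub in lists for x in sub]

# Option 3: itertools.chain, here through from_iterable so the lists
# do not need to be unpacked into separate arguments.
via_chain = list(itertools.chain.from_iterable(lists))

print(via_sum)
print(via_sum == via_comprehension == via_chain)
```

All three build the same list; they differ only in readability and in how much intermediate copying they do.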
If we accept that ''.join(...) is an intuitive idiom, why not provide the syntax:
[].join([lst1, lst2, ...])
And while we are at it:
().join([tpl1, tpl2, ...])
Like with str, these methods should only accept sequences of objects of their own class (e.g. we could do [].join(list(s) for s in seqs) if seqs contains lists, tuples and generators). The use case for non-empty joiners would probably be less frequent than for strings, but it also solves a problem that has no clean solution with the current tools. Here is what I would probably do to join a sequence of lists with [None, 'STOP', None]:
lsts = [lst1, lst2, ...]
joiner = [None, 'STOP', None]
lsts_joined = list(itertools.chain.from_iterable(lst + joiner for lst in lsts))[:-len(joiner)]
This is awful and inefficient (I am not saying it is the best or only possible way to solve it; it is just what I, a self-considered experienced Python developer, might write).
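For comparison, here is a minimal sketch of what a joiner-aware concatenation could look like as a plain function. The name join_lists and its signature are hypothetical; nothing like it exists in the standard library.

```python
def join_lists(joiner, lists):
    """Hypothetical helper: concatenate lists, inserting joiner between them."""
    result = []
    for index, lst in enumerate(lists):
        if index:
            # Insert the joiner before every list except the first.
            result.extend(joiner)
        result.extend(lst)
    return result


lsts = [[1, 2], [3], [4, 5]]
joiner = [None, 'STOP', None]
print(join_lists(joiner, lsts))
# [1, 2, None, 'STOP', None, 3, None, 'STOP', None, 4, 5]
```

This version never appends a trailing joiner, so there is nothing to slice off afterwards. It also copes with an empty joiner, whereas slicing with [:-len(joiner)] would become [:-0] and return an empty list.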
join() is a bad choice, because new developers will confuse list.join with str.join.
We could turn list.extend(iterable) into list.extend(*iterable). Or you could just use extend with a chain iterator:
>>> l = []
>>> l.extend(itertools.chain([1], [2], [3]))
>>> l
[1, 2, 3]
Thanks Christian. I thought of join precisely because it performs conceptually the same function as with str, so the parallel between ''.join(), [].join() and ().join() looked more obvious. Also there is os.path.join and PurePath.joinpath, so the verb seemed well-established. As for shared method names, index and count are present both in sequences and str, although it is true that these return the same kind of object in all cases.
I'm not saying your points aren't valid, though. Your proposed way with extend is, I guess, about the same as list(itertools.chain(...)), which could be considered enough. I just feel that it is not particularly convenient, especially for newer developers, who will probably gravitate towards sum(...) more than itertools or a nested generator expression, but I may be wrong.
String concatenation: f'{a}{b}{c}'
List concatenation: [*a, *b, *c]
Tuple concatenation: (*a, *b, *c)
Set union: {*a, *b, *c}
Dict merging: {**a, **b, **c}
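A small runnable sketch of these unpacking forms, assuming Python 3.6 or later (for f-strings). All variable names and values are illustrative only.

```python
# Illustrative inputs; any iterables work for the star-unpacking forms.
a, b, c = [1, 2], [3], [4, 5]

concatenated_list = [*a, *b, *c]
concatenated_tuple = (*a, *b, *c)
union = {*a, *b, *c}

# Dict merging: later mappings override earlier ones on duplicate keys.
d1, d2, d3 = {'x': 1}, {'y': 2}, {'x': 3}
merged = {**d1, **d2, **d3}

s1, s2, s3 = 'ab', 'cd', 'ef'
concatenated_str = f'{s1}{s2}{s3}'

print(concatenated_list)   # [1, 2, 3, 4, 5]
print(concatenated_tuple)  # (1, 2, 3, 4, 5)
print(merged)              # {'x': 3, 'y': 2}
print(concatenated_str)    # abcdef
```

Each form needs one starred item per operand written out in the source, so it only fits a number of operands that is known in advance.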
Note that all of Serhiy's examples are for a known, fixed number of things to concatenate/union/merge. str.join's API can be used for that by wrapping the arguments in an anonymous tuple/list, but it is more naturally suited to a variable number of things, and the unpacking generalizations haven't reached the point where:
[*seq for seq in allsequences]
is allowed.
list(itertools.chain.from_iterable(allsequences))
handles that just fine, but I could definitely see it being convenient to be able to do:
[].join(allsequences)
That said, a big reason str provides .join is because it's not uncommon to want to join strings with a repeated separator, e.g.:
# For not-really-csv-but-people-do-it-anyway
','.join(row_strings)
# Separate words with spaces
' '.join(words)
# Separate lines with newlines
'\n'.join(lines)
I'm not seeing even one motivating use case for list.join/tuple.join that would actually join on a non-empty list or tuple ([None, 'STOP', None] being rather contrived). If that's not needed, it might make more sense to do this with an alternate constructor (a classmethod), e.g.:
list.concat(allsequences)
which would avoid the cost of creating an otherwise unused empty list (the empty tuple is a singleton, so no cost is avoided there). It would also work equally well with both tuple and list (where making list.extend take varargs wouldn't help tuple, though it's a perfectly worthy idea on its own).
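To make the idea concrete, here is a rough sketch of such an alternate constructor, written as subclasses because the built-in list and tuple cannot be extended with new methods. The class names ConcatList and ConcatTuple are hypothetical, and accepting any iterable of iterables (not just lists or tuples) is an assumption of this sketch, not part of the proposal.

```python
from itertools import chain


class ConcatList(list):
    """Hypothetical list subclass; the built-in list has no concat method."""

    @classmethod
    def concat(cls, iterables):
        # Build the result directly from a flattened iterator,
        # without starting from a throwaway empty instance.
        return cls(chain.from_iterable(iterables))


class ConcatTuple(tuple):
    """Hypothetical tuple subclass with the same alternate constructor."""

    @classmethod
    def concat(cls, iterables):
        return cls(chain.from_iterable(iterables))


allsequences = [[1, 2], (3,), range(4, 6)]
print(ConcatList.concat(allsequences))   # [1, 2, 3, 4, 5]
print(ConcatTuple.concat(allsequences))  # (1, 2, 3, 4, 5)
```

Because the constructor is a classmethod, the same code shape serves both the mutable and the immutable type, which a varargs list.extend could not do.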
Personally, I don't find using itertools.chain (or its from_iterable alternate constructor) all that problematic (though I almost always import it with from itertools import chain to reduce the verbosity, especially when using chain.from_iterable). I think promoting itertools more is a good idea; right now, the notes on concatenation for sequence types mention str.join, bytes.join, and replacing tuple concatenation with a list that you call extend on, but don't mention itertools.chain at all, which seems like a failure to make the best solution the discoverable/obvious solution.
In JavaScript, join() works the other way around: ['1','2','3'].join(', '), so [].join() may confuse some people.
It would be too confusing to have two different approaches to joining strings in Python. Besides, ECMAScript 1 came out in 1997, 5 years after Python was first released. By that argument, it is JavaScript that should change.
How common is the case of variable number of things to concatenate/union/merge?
From my experience, in most cases this looks like:
result = []
for ...:
    # many complex statements
    # may include continue and break
    result.extend(items)  # may be intermixed with result.append(item)
So concatenating only lists taken from some sequence is a very special case. And there are several ways to perform it.
result = []
for items in seq:
    result.extend(items)
# nothing wrong with this simple code, really
result = [x for items in seq for x in items]
# may be less efficient for really long sublists,
# but looks simple
result = list(itertools.chain.from_iterable(seq))
# if you are an itertools addict ;-)
It is history, but in 1997 Python had the same order of arguments as ECMAScript: string.join(words [, sep]). str.join() was added only in 1999 (226ae6ca122f814dabdc40178c7b9656caf729c2).
I think this idea has no future.
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
GitHub fields:
```python
assignee = None
closed_at = None
created_at =
labels = ['3.8', 'type-feature', 'library']
title = 'join method for list and tuple'
updated_at =
user = 'https://bugs.python.org/JavierDehesa'
```
bugs.python.org fields:
```python
activity =
actor = 'serhiy.storchaka'
assignee = 'none'
closed = False
closed_date = None
closer = None
components = ['Library (Lib)']
creation =
creator = 'Javier Dehesa'
dependencies = []
files = []
hgrepos = []
issue_num = 33214
keywords = []
message_count = 9.0
messages = ['314881', '314882', '314883', '314885', '352387', '352530', '352531', '352532', '352534']
nosy_count = 6.0
nosy_names = ['christian.heimes', 'eric.araujo', 'serhiy.storchaka', 'josh.r', 'Javier Dehesa', 'iamsav']
pr_nums = []
priority = 'normal'
resolution = None
stage = None
status = 'open'
superseder = None
type = 'enhancement'
url = 'https://bugs.python.org/issue33214'
versions = ['Python 3.8']
```