python / cpython

The Python programming language
https://www.python.org

join method for list and tuple #77395

Closed d7f1141d-245b-4ed8-89b1-56b207b1d4f2 closed 2 years ago

d7f1141d-245b-4ed8-89b1-56b207b1d4f2 commented 6 years ago
BPO 33214
Nosy @tiran, @merwok, @serhiy-storchaka, @MojoVampire, @Savier

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

GitHub fields:

```python
assignee = None
closed_at = None
created_at =
labels = ['3.8', 'type-feature', 'library']
title = 'join method for list and tuple'
updated_at =
user = 'https://bugs.python.org/JavierDehesa'
```

bugs.python.org fields:

```python
activity =
actor = 'serhiy.storchaka'
assignee = 'none'
closed = False
closed_date = None
closer = None
components = ['Library (Lib)']
creation =
creator = 'Javier Dehesa'
dependencies = []
files = []
hgrepos = []
issue_num = 33214
keywords = []
message_count = 9.0
messages = ['314881', '314882', '314883', '314885', '352387', '352530', '352531', '352532', '352534']
nosy_count = 6.0
nosy_names = ['christian.heimes', 'eric.araujo', 'serhiy.storchaka', 'josh.r', 'Javier Dehesa', 'iamsav']
pr_nums = []
priority = 'normal'
resolution = None
stage = None
status = 'open'
superseder = None
type = 'enhancement'
url = 'https://bugs.python.org/issue33214'
versions = ['Python 3.8']
```

d7f1141d-245b-4ed8-89b1-56b207b1d4f2 commented 6 years ago

It is pretty trivial to concatenate a sequence of strings:

''.join([str1, str2, ...])

Concatenating a sequence of lists is for some reason significantly more convoluted. Some current options include:

    sum([lst1, lst2, ...], [])
    [x for y in [lst1, lst2, ...] for x in y]
    list(itertools.chain(lst1, lst2, ...))

The first is the least recommendable but the most intuitive, and the third is the fastest but the most cumbersome (see https://stackoverflow.com/questions/49631326/why-is-itertools-chain-faster-than-a-flattening-list-comprehension ). None of these looks like "the one obvious way to do it" to me. Furthermore, I feel a dedicated concatenation method could be more efficient than any of these approaches.
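As a quick sanity check, the three options above can be run side by side. This is only a minimal sketch: the sample values bound to `lst1`, `lst2` and `lst3` are made up for illustration. Note that `sum` builds a fresh list at every addition, which is the usual reason it is discouraged for long inputs.

```python
import itertools

# Illustrative sample data (assumed for this sketch, not from the report).
lst1, lst2, lst3 = [1, 2], [3], [4, 5, 6]
lsts = [lst1, lst2, lst3]

via_sum = sum(lsts, [])                           # repeated list + list
via_comprehension = [x for y in lsts for x in y]  # nested comprehension
via_chain = list(itertools.chain(*lsts))          # lazy chaining, then list()

# All three approaches should agree on the result.
print(via_sum == via_comprehension == via_chain)
```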

If we accept that ''.join(...) is an intuitive idiom, why not provide the syntax:

[].join([lst1, lst2, ...])

And while we are at it:

().join([tpl1, tpl2, ...])

Like with str, these methods should only accept sequences of objects of their own class (e.g. we could do [].join(list(s) for s in seqs) if seqs contains lists, tuples and generators). The use case for non-empty joiners would probably be less frequent than for strings, but it also solves a problem that has no clean solution with the current tools. Here is what I would probably do to join a sequence of lists with [None, 'STOP', None]:

lsts = [lst1, lst2, ...]
joiner = [None, 'STOP', None]
lsts_joined = list(itertools.chain.from_iterable(lst + joiner for lst in lsts))[:-len(joiner)]

Which is awful and inefficient (I am not saying this is the best or only possible way to solve it; it is just what I, a self-considered experienced Python developer, might write).
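For comparison, the behaviour the proposed method would have can be sketched as a plain function. The name `join_lists` and its argument order are hypothetical and chosen only for this illustration; nothing like it exists on the built-in `list`.

```python
def join_lists(joiner, lsts):
    """Hypothetical stand-in for the proposed joiner.join(lsts)."""
    result = []
    for i, lst in enumerate(lsts):
        if i:  # put the joiner between items, never before the first one
            result.extend(joiner)
        result.extend(lst)
    return result


joined = join_lists([None, 'STOP', None], [[1, 2], [3], [4, 5]])
print(joined)

# With an empty joiner this reduces to plain concatenation.
flat = join_lists([], [[1, 2], [3], [4, 5]])
print(flat)
```

One detail this version sidesteps: in the slicing approach above, an empty joiner makes `[:-len(joiner)]` evaluate to `[:0]`, which discards the whole result.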

tiran commented 6 years ago

join() is a bad choice, because new developers will confuse list.join with str.join.

We could turn list.extend(iterable) into list.extend(*iterable). Or you could just use extend with a chain iterator:

>>> l = []
>>> l.extend(itertools.chain([1], [2], [3]))
>>> l
[1, 2, 3]

d7f1141d-245b-4ed8-89b1-56b207b1d4f2 commented 6 years ago

Thanks Christian. I thought of join precisely because it performs conceptually the same function as with str, so the parallel between ''.join(), [].join() and ().join() looked more obvious. Also there is os.path.join and PurePath.joinpath, so the verb seemed well-established. As for shared method names, index and count are present in both sequences and str, although it is true that these return the same kind of object in all cases.

I'm not saying your points aren't valid, though. Your proposed way with extend is, I guess, about the same as list(itertools.chain(...)), which could be considered enough. I just feel that it is not particularly convenient, especially for newer developers, who will probably gravitate towards sum(...) more than itertools or a nested generator expression, but I may be wrong.

serhiy-storchaka commented 6 years ago

String concatenation: f'{a}{b}{c}'
List concatenation: [*a, *b, *c]
Tuple concatenation: (*a, *b, *c)
Set union: {*a, *b, *c}
Dict merging: {**a, **b, **c}
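A short runnable illustration of these unpacking forms, with made-up sample values (the variable names below are assumptions for this sketch only):

```python
# Illustrative inputs (assumed for this sketch).
a, b, c = [1, 2], [3], [4, 5]

as_list = [*a, *b, *c]    # list concatenation
as_tuple = (*a, *b, *c)   # tuple concatenation
as_set = {*a, *b, *c}     # set union

d1, d2, d3 = {'x': 1}, {'y': 2}, {'x': 3}
merged = {**d1, **d2, **d3}  # on duplicate keys, the rightmost value wins

print(as_list, as_tuple, as_set, merged)
```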

99ffcaa5-b43b-4e8e-a35e-9c890007b9cd commented 5 years ago

Note that all of Serhiy's examples are for a known, fixed number of things to concatenate/union/merge. str.join's API can be used for that by wrapping the arguments in an anonymous tuple/list, but it's more natural for a variable number of things, and the unpacking generalizations haven't reached the point where:

[*seq for seq in allsequences]

is allowed.

    list(itertools.chain.from_iterable(allsequences))

handles that just fine, but I could definitely see it being convenient to be able to do:

[].join(allsequences)
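To make the variable-number case concrete, here is a minimal sketch using `chain.from_iterable`. The generator `all_sequences` is a hypothetical stand-in for any source that yields an unknown number of sequences of mixed types.

```python
from itertools import chain


def all_sequences():
    # Hypothetical source: the number of sequences is not known up front.
    yield [1, 2]
    yield (3,)
    yield range(4, 6)


# from_iterable consumes the outer iterable lazily, so no unpacking is needed.
flat = list(chain.from_iterable(all_sequences()))
print(flat)
```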

That said, a big reason str provides .join is because it's not uncommon to want to join strings with a repeated separator, e.g.:

# For not-really-csv-but-people-do-it-anyway
','.join(row_strings)

# Separate words with spaces
' '.join(words)

# Separate lines with newlines
'\n'.join(lines)

I'm not seeing even one motivating use case for list.join/tuple.join that would actually join on a non-empty list or tuple ([None, 'STOP', None] being rather contrived). If that's not needed, it might make more sense to do this with an alternate constructor (a classmethod), e.g.:

    list.concat(allsequences)

which would avoid the cost of creating an otherwise unused empty list (the empty tuple is a singleton, so no cost is avoided there). It would also work equally well with both tuple and list (where making list.extend take varargs wouldn't help tuple, though it's a perfectly worthy idea on its own).
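What such an alternate constructor might look like can be sketched with a subclass. This is only an illustration under stated assumptions: `ConcatList` is a hypothetical class, and the built-in `list` has no `concat` method.

```python
from itertools import chain


class ConcatList(list):
    """Hypothetical list subclass used only to illustrate the idea."""

    @classmethod
    def concat(cls, iterables):
        # Build the result directly from a chained iterator; no throwaway
        # empty instance is needed just to call the method on.
        return cls(chain.from_iterable(iterables))


result = ConcatList.concat([[1, 2], (3, 4), [5]])
print(result)
```

Because it is a classmethod, the same shape would carry over to a tuple subclass without change.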

Personally, I don't find using itertools.chain (or its from_iterable alternate constructor) all that problematic (though I almost always import it with from itertools import chain to reduce the verbosity, especially when using chain.from_iterable). I think promoting itertools more is a good idea; right now, the notes on concatenation for sequence types mention str.join, bytes.join, and replacing tuple concatenation with a list that you call extend on, but they don't mention itertools.chain at all, which seems like a failure to make the best solution the discoverable/obvious solution.

45386f33-fadd-4f9f-b70b-e36d26921b08 commented 5 years ago

In JavaScript, join() works the other way around: ['1','2','3'].join(', '). So [].join() may confuse some people.

tiran commented 5 years ago

> In JavaScript, join() works the other way around: ['1','2','3'].join(', '). So [].join() may confuse some people.

It would be too confusing to have two different approaches to join strings in Python. Besides, ECMAScript 1 came out in 1997, 5 years after Python was first released. By that argument, JavaScript is the one that should change.

serhiy-storchaka commented 5 years ago

How common is the case of variable number of things to concatenate/union/merge?

From my experience, in most cases this looks like:

    result = []
    for ...:
        # many complex statements
        # may include continue and break
        result.extend(items) # may be intermixed with result.append(item)

So concatenating purely lists from some sequence is a very special case. And there are several ways to perform it.

    result = []
    for items in seq:
        result.extend(items)
        # nothing wrong with this simple code, really

    result = [x for items in seq for x in items]
    # may be less efficient for really long sublists,
    # but looks simple

    result = list(itertools.chain.from_iterable(seq))
    # if you are an itertools addict ;-)

serhiy-storchaka commented 5 years ago

It is history, but in 1997 Python had the same order of arguments as ECMAScript: string.join(words [, sep]). str.join() was added only in 1999 (226ae6ca122f814dabdc40178c7b9656caf729c2).

serhiy-storchaka commented 2 years ago

I think this idea has no future.