pytoolz / toolz

A functional standard library for Python.
http://toolz.readthedocs.org/

Extract value from iterable containing exactly one value #444

Open jtrakk opened 5 years ago

jtrakk commented 5 years ago

Sometimes I expect an iterable to have exactly one value, and I want to pull it out, failing if there are zero values or more than one.

One way to do that is

lst = [99]
[val] = lst
assert val == 99

but this is rather implicit and hard to read if you've never seen it before.

Better would be

val = toolz.only(lst)

or

val = toolz.single(lst)
groutr commented 5 years ago

Would val = first(lst) work? This would raise an exception if there are no elements. In your application, you can assert len(lst) == 1

jtrakk commented 5 years ago

The implementation I'm thinking of is

from typing import Iterable, TypeVar

T = TypeVar("T")

def only(iterable: Iterable[T]) -> T:
    """Extract the only value in an iterable containing a single item."""
    [value] = iterable
    return value
groutr commented 5 years ago

Looking at the docs for itertoolz, it looks like there is a very careful effort to make all the functions work on iterables of any length. This seems to be a very specialized function and would seem to me inconsistent with the rest of the API. What would be the use case?

jtrakk commented 5 years ago

all the functions work on iterables of any length

That's a good point.

I find the [value] = iterable trick comes in handy very often though. For example, I expect there is only one item with id of 99.

Usually people handle this by

value = next(x for x in items if x.id == 99)

but this of course masks a bug if there are accidentally two or more matching items.

So the solution is

[value] = (x for x in items if x.id == 99)

but if you've never seen that before, it's not immediately obvious what's going on. Giving it a name and docstring will help:

value = toolz.only(x for x in items if x.id == 99)
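To make the difference concrete, here is a small sketch comparing the two idioms. The `Item` class and the sample data are made up for illustration, and `only` is the helper proposed earlier in this thread, not an existing toolz function.

```python
from dataclasses import dataclass
from typing import Iterable, TypeVar

T = TypeVar("T")


@dataclass
class Item:  # hypothetical record type, for illustration only
    id: int
    name: str


def only(iterable: Iterable[T]) -> T:
    """Return the single value in ``iterable``; raise ValueError otherwise."""
    [value] = iterable
    return value


items = [Item(99, "a"), Item(7, "b"), Item(99, "c")]  # two items share id 99

# next() silently returns the first match and hides the duplicate.
first_match = next(x for x in items if x.id == 99)

# only() surfaces the duplicate as a ValueError instead.
try:
    only(x for x in items if x.id == 99)
    duplicate_detected = False
except ValueError:
    duplicate_detected = True
```

With `next`, the second item with id 99 goes unnoticed; with `only`, the same query fails loudly.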
groutr commented 5 years ago

I guess I've never had a use case like this come up. Perhaps @eriknw could weigh in?

eriknw commented 5 years ago

Thanks for the thoughtful suggestion and discussion. I've been giving this some thought. I've used this pattern a couple of times over the last couple of years. I usually do the following:

value, = items  # unpack single item

But it's probably clearer to use @jtrakk's suggestion:

[value] = items  # unpack single item

Note that I always include a comment with this operation for clarity. To compare,

value = only(items)

isn't that bad at all. I'm warming up to this. I prefer the name only.

eriknw commented 5 years ago

I've had this come up a couple of times again, and during a code review somebody said that [value] = items is weird and needs a comment.

I like the proposed functionality, and I like both suggested names only and single. Does anybody have objections to adding this or a preference for the name? I may slightly prefer only simply because it's shorter.

groutr commented 5 years ago

So, I've actually come across a use case for this: np.ndenumerate on a 1-D array. It could be argued that I could use enumerate; however, I like to stay in numpy land when dealing with numpy arrays. The indexes are returned as single-element tuples, which I had to unpack before I could use them. Something like this would have let me write:

idx = ((only(ix), val) for ix, val in np.ndenumerate(arr))

I think we need to clarify proper behavior for failure cases. What should happen in failure cases? If I pass a sequence of more than one item, should an error be thrown? Passing an iterable of three elements, the suggested implementation will eat two values before throwing an exception. Do we want to avoid that?

jtrakk commented 5 years ago

If I pass a sequence of more than one item, should an error be thrown?

Yes. Otherwise you can just use next(iter(lst)).

Passing an iterable of three elements, the suggested implementation will eat two values before throwing an exception. Do we want to avoid that?

I don't think it's avoidable. You can't know that there are more items in an iterator until you try to pull the next one out.

In [1]: lst = [1,2,3]

In [2]: it = iter(lst)

In [3]: [x] = it
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-3-ba741f2ffd3e> in <module>
----> 1 [x] = it

ValueError: too many values to unpack (expected 1)

In [4]: list(it)
Out[4]: [3]
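The same behavior shows up if the check is written out by hand. Below is a sketch, assuming an explicit-`next` variant of the proposed `only`. The `_MISSING` sentinel and the error messages are illustrative choices, not part of the proposal. It pulls at most two items, which is the minimum needed to detect an extra one.

```python
from typing import Iterable, TypeVar

T = TypeVar("T")
_MISSING = object()  # sentinel, so that None can still be a legitimate value


def only(iterable: Iterable[T]) -> T:
    """Return the single item of ``iterable``; raise ValueError otherwise."""
    it = iter(iterable)
    first = next(it, _MISSING)
    if first is _MISSING:
        raise ValueError("expected exactly one item, got none")
    # Detecting a second item necessarily consumes it.
    if next(it, _MISSING) is not _MISSING:
        raise ValueError("expected exactly one item, got more than one")
    return first


it = iter([1, 2, 3])
try:
    only(it)
except ValueError:
    pass
remaining = list(it)  # whatever the failed check left behind in the iterator
```

Either way, an iterator that fails the check cannot be handed back untouched; callers who need the original values should pass a list or another re-iterable container.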
groutr commented 5 years ago

What would a more general unpacking function look like? Essentially, a special case of take that verifies length and returns values suitable for python's unpacking mechanisms.

# Unpack a 1 element sequence
# value == 1 (this is a special case)
value = unpack(1, [1])

# Unpack 2 elements from sequence.
# a == 1, b == 2
a, b = unpack(2, [1, 2])

# Throw ValueError if there aren't enough values
a, b = unpack(2, [1])

# Also throws ValueError. Too many values.  Maybe just drop the extra values?
a, b = unpack(2, [1, 2, 3, 4])

Thoughts?

jtrakk commented 5 years ago

That's very similar to itertools.islice().

groutr commented 5 years ago

@jtrakk you're correct. take is basically an alias for itertools.islice in pytoolz. The main addition in unpack would be the length checking. A possible implementation could be:

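import itertools

def unpack(n, seq):
    seq = iter(seq)
    rv = tuple(itertools.islice(seq, n))

    # check that we have enough elements
    if len(rv) != n:
        raise ValueError(f"not enough values to unpack (expected {n}, got {len(rv)})")
    # check that no values remain in seq (this consumes one extra item)
    for el in seq:
        raise ValueError(f"too many values to unpack (expected {n})")

    # special case: return the bare value rather than a 1-tuple
    if n == 1:
        return rv[0]
    return rv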
eriknw commented 4 years ago

I'm not convinced a generalized version, unpack, is necessary, because

breakfast, lunch = i_haz_two_cheezburgers

is already concise and readable (and using (x, y) = ... or [x, y] = ... is also fine).
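As a quick illustration of that point, plain assignment already enforces the length in both directions. The `unpack_two` wrapper below is a hypothetical helper that exists only so the three cases can be run in a loop.

```python
def unpack_two(seq):
    """Unpack exactly two values using ordinary tuple assignment."""
    first, second = seq  # raises ValueError on too few or too many values
    return first, second


results = {}
for name, seq in [("exact", [1, 2]), ("short", [1]), ("long", [1, 2, 3, 4])]:
    try:
        results[name] = unpack_two(seq)
    except ValueError as exc:
        # Record the interpreter's own message; its wording varies by version.
        results[name] = str(exc)
```

Only the `"exact"` case yields a tuple; the other two record a `ValueError` message, which covers the failure cases listed for `unpack` above.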

I'm +1 for only, and am open to other names.