mahmoud / boltons

🔩 Like builtins, but boltons. 250+ constructs, recipes, and snippets which extend (and rely on nothing but) the Python standard library. Nothing like Michael Bolton.
https://boltons.readthedocs.org

Add lookahead to iterutils #95

Open tuukkamustonen opened 8 years ago

tuukkamustonen commented 8 years ago

Would it be possible to add a lookahead generator to boltons.iterutils?

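import six

def lookahead(iterable):
    # Yields (item, is_last) pairs; is_last is True only for the final item.
    it = iter(iterable)
    try:
        last = six.next(it)
    except StopIteration:
        return  # empty iterable: yield nothing
    for val in it:
        yield last, False
        last = val
    yield last, True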

I can make a PR with tests and docs if you're ok with this.

What happened with six as a dependency, btw? I see it was added in #11 but is no longer in master.

mahmoud commented 8 years ago

Hey there Tuukka! That's an interesting idea. I haven't needed lookahead myself, but I'm open to it, especially if you can share some usages of it out in the wild (or in your own code).

As for your other question, six was removed because boltons are all self-contained modules depending only on the standard library (not even other boltons). That said, I'm not even sure six is needed here, because next() exists in both Python 2 and 3, based on a quick check.

Thanks again!
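To illustrate the point about next(), here is a minimal sketch of the proposed generator written with only the built-in next(). It is an illustration, not code from boltons. The try/except for empty input is an addition, made on the assumption that an empty iterable should simply produce nothing.

```python
def lookahead(iterable):
    """Yield (item, is_last) pairs; yield nothing for an empty iterable."""
    it = iter(iterable)
    try:
        last = next(it)  # built-in next(), no six required
    except StopIteration:
        return  # assumption: empty input ends quietly
    for val in it:
        yield last, False
        last = val
    yield last, True


# Works for generators too, since nothing calls len().
pairs = list(lookahead(x * x for x in range(3)))
```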

tuukkamustonen commented 8 years ago

Well, it is quite a trivial utility as you can get the same with:

for index, obj in enumerate(iterable):
    last = index == len(iterable) - 1

But that doesn't work with generators:

for index, obj in enumerate(generator):
    last = index == len(generator) - 1   # TypeError

(And also, there's boilerplate.)

So I've been using:

for line, last in lookahead(iterable):
    if last:
        ...

Unfortunately, the use cases are specific; I don't think there are generic examples to share.

There are already a few "lookahead" libraries out there:

These seem to provide the ability to peek at the next/previous values in a somewhat more advanced way.

I, personally, haven't needed that, but rather:

It provides .atend, .atstart and .peek properties. It doesn't allow peeking backwards or at multiple values forward, but it probably fills the "common need". Something like this is what I'd be happy to see in boltons.

(Btw, I haven't used any of these 3 libraries, just googled them up.)
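A rough sketch of what a wrapper with .atstart, .atend and .peek could look like follows. The class name Peekable and the exact semantics are assumptions for illustration (in particular, .atend is taken to mean "no items remain after the one just returned"); it is not the API of any of the libraries mentioned above.

```python
class Peekable:
    """Iterator wrapper with one item of lookahead (hypothetical sketch)."""

    _END = object()  # sentinel marking an exhausted underlying iterator

    def __init__(self, iterable):
        self._it = iter(iterable)
        self._next = next(self._it, self._END)  # buffer one item ahead
        self._started = False

    def __iter__(self):
        return self

    def __next__(self):
        if self._next is self._END:
            raise StopIteration
        item, self._next = self._next, next(self._it, self._END)
        self._started = True
        return item

    @property
    def atstart(self):
        # True until the first item has been consumed.
        return not self._started

    @property
    def atend(self):
        # True when no further items remain.
        return self._next is self._END

    @property
    def peek(self):
        # The upcoming item, without consuming it.
        if self._next is self._END:
            raise ValueError("nothing left to peek at")
        return self._next
```

With this, "for line in p: if p.atend: ..." covers the same "is this the last item?" check as the lookahead() generator, at the cost of a small class.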

And yeah, you're correct about next(): seems like it exists in py2.6+. Cool.

kurtbrose commented 7 years ago

I wonder if "look behind" would be a generic way of implementing the thread-safe counter idiom based on itertools.count().

Here's an example:

import itertools

class Counter(object):
    __slots__ = '_count', '_cur'

    def __init__(self, start=0):
        self._count, self._cur = itertools.count(start), start - 1

    def add(self):
        self._cur = next(self._count)

    def val(self):
        return self._cur + 1

    def __repr__(self):
        return "<Counter {0}>".format(self.val())

If there was a "look-behind" API would it be able to handle this? Kind of "peek at the previous value without advancing the iterator"?
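One way to picture such a "look-behind" API is an iterator wrapper that remembers the last item it produced. The sketch below is hypothetical (the LookBehind name and its initial parameter are made up for illustration), and it does not make any thread-safety guarantees of its own.

```python
import itertools


class LookBehind:
    """Iterator wrapper that remembers the last item produced (hypothetical)."""

    def __init__(self, iterable, initial=None):
        self._it = iter(iterable)
        self.previous = initial  # reported until the first item is consumed

    def __iter__(self):
        return self

    def __next__(self):
        # Advance the underlying iterator and remember what it returned.
        self.previous = next(self._it)
        return self.previous


# Counter-like usage: advance with next(), read without advancing.
counter = LookBehind(itertools.count(1), initial=0)
next(counter)
next(counter)
current = counter.previous  # 2
```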

prestonPD commented 6 years ago

I came here to request the same thing, except also with a "first" flag, and an interface modeled after enumerate(). So my proposed usage would be:

for is_first, is_last, item in demarcate(iterator):
    ...

I've needed something similar on a regular basis. The latest use case is for producing a row of tightly packed, closely related charts in matplotlib. In those cases, adding y axis labels and legends on each individual plot eats up precious space. (Rant: It also encourages the bad practice of using non-uniform y scales and plotting styles, when you know dang well the viewer's eye is going to crave uniform y scales and plotting styles as it scans the row)

So the primary y axis labeling goes only on the leftmost plot (i.e. the left side of the whole row, not each plot), and the secondary y axis labeling and legend go only on the rightmost plot (i.e. the right side of the whole row, not each plot).

import matplotlib.pyplot as plt

for dataset_ind, (is_first, is_last, dataset) in enumerate(demarcate(list_of_datasets)):
    plt.subplot(1, len(list_of_datasets), 1 + dataset_ind)
    plt.plot(dataset)
    if is_first:
        # add primary y axis ticks, tick labels and axis label
        ...
    if is_last:
        # add secondary y axis ticks, tick labels, and axis label
        # add legend
        ...

I will try to recall some other use cases and edit this post as they come back to me.

I know the is_first and is_last flags only save 1 line of code each, if used in conjunction with enumerate() and a collection of known length. But here are my reasons for why it's better to have an iterator wrapper (demarcate) produce them:

  1. even if it's only 2 very simple lines of code to calculate is_first and is_last, it's not only boring boilerplate but also a chance for a clumsy developer (i.e. me) to screw something up
  2. take away the known length condition (i.e. consider the general iterator case) and this saves 5+ lines (which Tuukka put in the first post) plus a nice chunk of cognitive load on the developer
  3. this functionality could generally be implemented on the C side with much better speed & probably zero new variables to store, and moving the functionality into a library is the first step in that direction
  4. it will give wannabe functional programmers (i.e. me) a cheap thrill and yet another way to inch their codebase towards a functional approach
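
For reference, a pure-Python sketch of what demarcate() could look like for the general iterator case. The function does not exist in boltons; the name and the (is_first, is_last, item) ordering are taken from the proposed usage above, and skipping empty input silently is an assumption.

```python
def demarcate(iterable):
    """Yield (is_first, is_last, item) for each item (hypothetical helper)."""
    it = iter(iterable)
    try:
        current = next(it)
    except StopIteration:
        return  # assumption: empty input yields nothing
    is_first = True
    for upcoming in it:
        # Another item exists, so `current` cannot be the last one.
        yield is_first, False, current
        is_first = False
        current = upcoming
    yield is_first, True, current


flags = list(demarcate(c for c in "abc"))
```

A single-item input yields one tuple with both flags set to True.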

jayvdb commented 4 years ago

Another common basic need for a lookahead iterator is wrapping a pagination iterator so that paginator is one page ahead of the currently yielded item. This is needed if the paginator needs the current item id in order to get the next page, as this will break if the yielded item is deleted.
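
A small sketch of that pagination pattern follows. Everything in it is assumed for illustration: fetch_page is a hypothetical callable that takes the id of the last item seen (or None) and returns a list of dicts with an "id" key, with an empty list meaning there are no more pages.

```python
def items_one_page_ahead(fetch_page):
    """Yield items while keeping the paginator one page ahead.

    fetch_page(after_id) is a hypothetical callable returning a list of
    items (dicts with an "id" key); an empty list signals the end.
    """
    page = fetch_page(None)
    while page:
        # Fetch the following page *before* yielding this page's items,
        # so the consumer can delete them without breaking the cursor.
        next_page = fetch_page(page[-1]["id"])
        for item in page:
            yield item
        page = next_page
```

The trade-off is one request made earlier than strictly needed, in exchange for a cursor that no longer depends on items the consumer may already have removed.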