google / guava

Google core libraries for Java
Apache License 2.0
50.19k stars 10.91k forks source link

Sliding window function #1027

Closed gissuebot closed 9 years ago

gissuebot commented 10 years ago

Original issue created by wasserman.louis on 2012-06-11 at 05:54 PM


Interest was expressed in http://stackoverflow.com/questions/10966183/can-you-replicate-clojures-partition-or-scalas-sliding-functions-with-guav/10972448 for a "sliding window" function, for example taking the list [1, 2, 3, 4] and an argument 2 and returning [[1, 2], [2, 3], [3, 4]].

Sample use cases cited include moving averages, bioinformatics (dealing with rolling windows of a DNA sequence), and a few other examples.

The most obvious API seems to be a List<List<E>> slidingWindow(List<E>, int), returning a view.

I guess my biggest concern with this design is that it doesn't seem to match the use cases? These use cases feel like they'd be better served by maintaining their own Deque for the sliding window, which seems more appropriate for operations that need to focus on the front and back of the window rather than the window as a whole.

gissuebot commented 10 years ago

Original comment posted by greg.steffensen on 2012-06-11 at 06:48 PM


To me, the most obvious use for this is with Strings, for techniques like Rabin fingerprinting and content-aware chunking. Which is why it's too bad that CharSequence doesn't extend Iterable, because then you'd be able to have a single method

Iterable<List<E>> slidingWindow(Iterable<E>, int)

that would work with both Strings and Collections. Having said that, an Iterable API still strikes me as more flexible, although I get that this only really makes sense for ordered collections, and the most common way to represent them is List.

One other concern- if this is being performed on very large inputs, could returning a new List (or new anything each time) stress the garbage collector unnecessarily? If that's a concern, would a separate class with an Enumeration-like API be preferable:

public interface SlidingWindow<E> {

  /*     * Get element p within the current window.          * @return the p-th element of the current window     * @throws IndexOutOfBoundsException     */   public E get(int p);

  /*     * Advances the window, if possible.          * @return true if the window was successfully advanced     */   public boolean next(); }

I'd think that the generic types used in practice with this method would be primitives (that would be the case for rolling averages, DNA sequences and content-sensitive chunking, for example), so there may be a lot of opportunity to avoid object instantiation in the common case.

gissuebot commented 10 years ago

Original comment posted by wasserman.louis on 2012-06-11 at 07:13 PM


Yeah, the issue is if it's an Iterable, then you need to do the explicit copy rather than a view. =(

But...as it stands, the non-Guava workaround with ArrayDeque isn't actually all that bad, for these use cases.

gissuebot commented 10 years ago

Original comment posted by j...@nwsnet.de on 2012-06-15 at 09:58 AM


A "sliding window" is useful when generating static HTML pages for items (images, articles, forum threads etc.) with links referencing the previous/next item.

Here is an example in Python (taken from my image gallery generator) that iterates over image objects and connects their previous/next fields (which can then be used in an HTML template to provide browse links) so the result is basically a double-linked list:

from itertools import islice

def link_images(images):
    """Assign the predecessor and successor for every image."""
    for previous_image, current_image, next_image \
            in slide_window([None] + images + [None], 3):
        image.previous_image = previous_image
        image.next_image = next_image

def slide_window(iterable, n):
    """Return a sliding window of width ``n`` over the iterable."""
    it = iter(iterable)
    result = tuple(islice(it, n))
    if len(result) == n:
        yield result
    for elem in it:
        result = result[1:] + (elem,)
        yield result

While Python has a deque implementation (in the stdlib's collections module), too, I didn't find it to be an appropriate choice for this.

I might be wrong, but AFAIK deques require to be modified/rotated to return the desired results (instead of just a volatile view). Modification is something I'd like to avoid in order to eliminate misuse as well as the (supposedly) involved overhead.

gissuebot commented 10 years ago

Original comment posted by kevinb@google.com on 2012-06-19 at 05:49 PM


(No comment entered for this change.)


Status: Acknowledged

gissuebot commented 10 years ago

Original comment posted by wasserman.louis on 2012-06-22 at 02:25 PM


To be clear, the Deque workaround is for basically the case when you know you're only doing a single pass through the sliding window, so you can afford to mutate the deque as you walk through.

gissuebot commented 10 years ago

Original comment posted by kevinb@google.com on 2012-06-22 at 06:16 PM


(No comment entered for this change.)


Status: Research

gissuebot commented 10 years ago

Original comment posted by otis.gospodnetic on 2012-12-05 at 07:55 PM


Is the idea behind this sliding window issue to provide support for apps/libraries like Esper that receive a continuous (and often very fast) stream of numeric values and run functions like sum() or avg() or on that stream using a sliding window that typically represents something like "last 5 minutes" or "last hour". This can then be used applications that create alerting applications.

So is the idea behind this issue to support that sort of sliding window functionality?

gissuebot commented 10 years ago

Original comment posted by lowasser@google.com on 2012-12-05 at 08:43 PM


For that application, it sounds like using a Deque is really what you need to do, and I don't think there's much we could do to make that use case easier.

kevinb9n commented 9 years ago

We have EvictingQueue now, making this even easier.