Run-length encoding example

marcus-voss commented 4 years ago

Thanks for the implementation. I'd be interested to use it in a project on sparse binary sequences. However, I'm not yet sure about your expected format. You provide these two arrays here as examples:

s = np.array([1, 2, 3, -1, 1])
t = np.array([1, -2, 4, 1])

However, I don't get how these match either the run-length encoding of the paper (then I'd expect numbers highlighted somehow to give an idea of the number of zeros) nor the encoding described as in Wikipedia, as then I'd expect both to be of even length, as it would be alternating between the count and the value (i.e., decoded t would be t = np.array([-2, 1,1,1,1]), but s would not be clear due to uneven length).

Can you provide an example of the original time series and the encoded ones?

MichaelSchreier commented 4 years ago

If I'm not mistaken -x represents a run of x consecutive zeros. Hence the two time series in the example decode to:

s = [1, 2, 3, 0, 1]
t = [1, 0, 0, 4, 1]

tmsss commented 3 years ago

I'm facing the same issue, I can't find a run-length enconding algorithm to match the one in the paper. Regarding @MichaelSchreier's answer, I think that we can't decode it like that, in the example in the paper [1 0 0 4 1] encodes to [1 (2) 4 1], therefore the negative numbers don't seem to encode the consecutive zeros.

tmsss commented 3 years ago

I've came up with this solution, I haven't tested it a lot but it seems to be working:

x = [0, 0, 0, 2, 3, 0, 5, 6, 0, 0, 4, 0, 0]
def rle(series):
    # convert to np array
    series = np.array(series)

    # add points to detect inflection on start and end
    series_ = np.concatenate(([1], series, [1]))

    # find zeros and non zeros
    zeros = np.where(series_ == 0)[0] 
    nonzeros = np.where(series_ != 0)[0]

    # detect sequencies
    zeros = zeros[np.where(np.diff(zeros) > 1)[0]]
    nonzeros = nonzeros[np.where(np.diff(nonzeros) > 1)[0]]

    split = np.sort(np.concatenate([zeros, nonzeros]))

    # avoid splitting on first element
    split = split[split > 0]

    series = np.split(series, split)

    # initialize empty array
    rle = []
    rle = np.array(rle, dtype=int)

    # encode zeros
    for s in series:
        if np.sum(s) == 0:
            rle = np.append(rle, [len(s)], axis=0)
        else:
            rle = np.concatenate([rle, s])    

    return rle

rle(x)
[3 2 3 1 5 6 2 4 2]

tmsss commented 3 years ago

After further testing, I've found some use cases where the encoding wasn't correct, so here is an updated version:

def rle(series):
    """ 
    Run length encoding for sparse time series to encode zeros as in needed for awarp calculation 
    (https://ieeexplore.ieee.org/document/7837859 | https://github.com/mclmza/AWarp)

    args
    ----
    series: sparse times series (e.g. x = [0, 0, 0, 2, 3, 0, 5, 6, 0, 0, 4, 0, 0])

    returns
    ---- 
    array with encoded zeros (e.g. [3 2 3 1 5 6 2 4 2]) """

    # convert to np array
    series = np.array(series)

    # add points to detect inflection on start and end
    series_ = np.concatenate(([1], series, [1]))

    # find zeros and non zeros
    zeros = np.where(series_ == 0)[0]

    if len(zeros) > 0:
        nonzeros = np.where(series_ != 0)[0]

        # detect zero sequencies
        split_zeros = np.where(np.diff(zeros) > 1)[0] + 1

        splitted_zeros = np.split(zeros, split_zeros)

        zero_points = []
        zero_points = np.array(zero_points, dtype=int)

        for z in splitted_zeros:
            zero_points = np.append(zero_points, z[-1])

        # detect non-zero sequencies
        nonzero_points = nonzeros[np.where(np.diff(nonzeros) > 1)[0]]

        # concat all splitting points
        split = np.sort(np.concatenate([zero_points, nonzero_points]))

        # avoid splitting on first element
        split = split[split > 0]

        # separate zero sequencies from non-zero sequencies
        splitted_series = np.split(series, split)

        # initialize empty array
        rle = []
        rle = np.array(rle, dtype=int)

        # encode zeros
        for s in splitted_series:
            # if it is a zero sequence enconde the lenght of the sequence
            if np.sum(s) == 0:
                rle = np.append(rle, [len(s)], axis=0)
            else:
                rle = np.concatenate([rle, s])

        # remove zeros in the end
        rle = rle[rle > 0]

        return rle
    else:
        return series

KilianB commented 2 years ago

According to the original source code in c++ where the exact same examples are given negative values indicate zero sequences

https://www.cs.unm.edu/~mueen/Projects/AWarp/

function [d, D] = AWARP(x,y)
%x and y are run-length encoded series where negative integers represent
%length of runs of zeros`

mclmza / AWarp

Run-length encoding example #1