johnmyleswhite / StreamStats.jl

Compute statistics over data streams in pure Julia

Try immutable types for StreamStat objects #10

Open · johnmyleswhite opened this issue 9 years ago

johnmyleswhite commented 9 years ago

We should measure the performance gains from replacing the mutating `update!` with a non-mutating `update` that returns a new immutable object, which would let the core statistics live on the stack rather than the heap.
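To make the proposal concrete, here is a minimal sketch of the two API styles for a running mean, written in current Julia syntax. The type names `MutableMean` and `ImmutableMean` and the helpers `current_mean`, `mean_mutable` and `mean_immutable` are hypothetical and are not StreamStats' API. The sketch assumes a mean is tracked as a running value `m` plus a count `n`.

```julia
# Hypothetical mutable statistic: update! changes the object in place,
# so it is normally heap-allocated and accessed through a reference.
mutable struct MutableMean
    m::Float64   # running mean
    n::Int       # number of observations seen
end
MutableMean() = MutableMean(0.0, 0)

function update!(s::MutableMean, x::Real)
    s.n += 1
    s.m += (x - s.m) / s.n
    return s
end

# Hypothetical immutable statistic: update returns a new value and leaves the
# old one untouched, so the compiler is free to keep it in registers or on the stack.
struct ImmutableMean
    m::Float64
    n::Int
end
ImmutableMean() = ImmutableMean(0.0, 0)

function update(s::ImmutableMean, x::Real)
    n = s.n + 1
    return ImmutableMean(s.m + (x - s.m) / n, n)
end

current_mean(s::Union{MutableMean, ImmutableMean}) = s.m

# Mirrors bench1: one object, mutated on every step
function mean_mutable(xs)
    s = MutableMean()
    for x in xs
        update!(s, x)
    end
    return current_mean(s)
end

# Mirrors bench2: the caller must rebind the returned value on every step
function mean_immutable(xs)
    s = ImmutableMean()
    for x in xs
        s = update(s, x)
    end
    return current_mean(s)
end

xs = [1.0, 2.0, 3.0, 4.0]
println(mean_mutable(xs), " ", mean_immutable(xs))  # prints "2.5 2.5"
```

The rebinding in `mean_immutable` is the API cost being weighed here: every call site has to write `s = update(s, x)` instead of `update!(s, x)`.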

johnmyleswhite commented 9 years ago

So far this doesn't seem to deliver the performance wins I'd need to justify making the API more complicated. Running the following script shows little benefit:

```julia
using Distributions
using StreamStats

# Learn the mean of a single stream
xs = rand(Normal(0, 1), 10_000_000)

# Mutable statistic, updated in place by update!
function bench1(xs)
    stat = StreamStats.Mean()
    for x in xs
        update!(stat, x)
    end
    return mean(stat)
end

# Immutable statistic: update returns a new value, which is rebound to stat
function bench2(xs)
    stat = StreamStats.ImmMean()
    for x in xs
        stat = StreamStats.update(stat, x)
    end
    return mean(stat)
end

@time bench1(xs)
@time bench1(xs)
@time bench1(xs)
@time bench1(xs)
@time bench1(xs)

@time bench2(xs)
@time bench2(xs)
@time bench2(xs)
@time bench2(xs)
@time bench2(xs)

# Learn the marginal means of a stream of vectors:
# each column xs[:, j] is one observation of a 25,000-dimensional vector
xs = rand(Normal(0, 1), (25_000, 10_000))

function bench3(xs)
    p, n = size(xs)
    # Pre-0.6 constructor syntax; on Julia 0.7 and later use Vector{StreamStats.Mean}(undef, p)
    stats = Array(StreamStats.Mean, p)
    for i in 1:p
        stats[i] = StreamStats.Mean()
    end
    # Columns in the outer loop, rows in the inner loop, matching Julia's column-major layout
    for j in 1:n
        for i in 1:p
            update!(stats[i], xs[i, j])
        end
    end
    return Float64[mean(stat) for stat in stats]
end

function bench4(xs)
    p, n = size(xs)
    # Pre-0.6 constructor syntax; on Julia 0.7 and later use Vector{StreamStats.ImmMean}(undef, p)
    stats = Array(StreamStats.ImmMean, p)
    for i in 1:p
        stats[i] = StreamStats.ImmMean()
    end
    for j in 1:n
        for i in 1:p
            # Store the returned value back into the array
            stats[i] = StreamStats.update(stats[i], xs[i, j])
        end
    end
    return Float64[mean(stat) for stat in stats]
end

@time bench3(xs);
@time bench3(xs);
@time bench3(xs);
@time bench3(xs);
@time bench3(xs);

@time bench4(xs);
@time bench4(xs);
@time bench4(xs);
@time bench4(xs);
@time bench4(xs);
```

This gives (the four groups below are bench1, bench2, bench3 and bench4, in that order):

```
elapsed time: 0.117149384 seconds (120 bytes allocated)
elapsed time: 0.115311669 seconds (120 bytes allocated)
elapsed time: 0.107262803 seconds (120 bytes allocated)
elapsed time: 0.108196941 seconds (120 bytes allocated)

elapsed time: 0.114858273 seconds (96 bytes allocated)
elapsed time: 0.113750718 seconds (96 bytes allocated)
elapsed time: 0.121045247 seconds (96 bytes allocated)
elapsed time: 0.113052939 seconds (96 bytes allocated)

elapsed time: 1.207779805 seconds (1000176 bytes allocated)
elapsed time: 1.22641896 seconds (1000176 bytes allocated)
elapsed time: 1.247533453 seconds (1000176 bytes allocated)
elapsed time: 1.197890847 seconds (1000176 bytes allocated)

elapsed time: 1.188194948 seconds (600176 bytes allocated)
elapsed time: 1.189423597 seconds (600176 bytes allocated)
elapsed time: 1.198438345 seconds (600176 bytes allocated)
elapsed time: 1.208701838 seconds (600176 bytes allocated)
```
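The run-to-run spread in these timings (for example, 0.107 s to 0.117 s for bench1) is at least as large as the gap between the mutable and immutable variants. A minimal sketch of a steadier harness, using only Base's `@elapsed`: call the function once so JIT compilation is excluded, then keep the fastest of several runs. The helper `best_time` and the stand-in workload `running_sum` are hypothetical; the BenchmarkTools.jl package provides a more thorough version of this idea.

```julia
# Hypothetical helper: run f(args...) once to trigger compilation,
# then return the minimum wall-clock time in seconds over `trials` runs.
function best_time(f, args...; trials::Int = 5)
    f(args...)                        # warm-up call, includes JIT compilation
    best = Inf
    for _ in 1:trials
        t = @elapsed f(args...)       # seconds for one call
        best = min(best, t)
    end
    return best
end

# Stand-in workload (plain summation), not StreamStats itself
function running_sum(xs)
    s = 0.0
    for x in xs
        s += x
    end
    return s
end

xs = rand(1_000_000)
t = best_time(running_sum, xs)
println("fastest of 5 runs: ", t, " seconds")
```

Taking the minimum rather than the mean filters out noise from other processes and GC pauses, which matters when the effect being measured is only a few percent.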

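bench1 and bench2 both take about 0.11 s, and bench3 and bench4 both take about 1.2 s, so there is no meaningful speedup, although bench4 allocates about 40% less memory than bench3. I'm troubled by the memory allocated in the last two benchmarks, so I won't give up on this just yet, but for now I'd say immutability isn't going to be worth it.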
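One plausible reading of the allocation figures, assuming a 64-bit machine and that each mean is stored as one `Float64` plus one `Int` (16 bytes): in bench4 the `ImmMean` values sit inline in the `stats` array (25,000 × 16 = 400,000 bytes) and the returned `Float64` vector adds 25,000 × 8 = 200,000 bytes, which accounts for the reported 600,176 bytes apart from 176 bytes. In bench3 the array holds 8-byte references (200,000 bytes), the result adds another 200,000 bytes, and the remaining 600,000 bytes would correspond to 24 bytes per heap-allocated `Mean` object. If that reading holds, the allocation is the per-call setup of `stats` and the result vector rather than anything inside the update loop. The sketch below checks only the array-size part of this arithmetic in current Julia, using hypothetical stand-in types `BoxedMean` and `InlineMean`; it does not inspect StreamStats itself.

```julia
# Hypothetical stand-ins for StreamStats.Mean / ImmMean, assuming each stores
# a running mean (Float64) and a count (Int); sizes assume a 64-bit machine.
mutable struct BoxedMean
    m::Float64
    n::Int
end

struct InlineMean
    m::Float64
    n::Int
end

p = 25_000  # number of streams in bench3/bench4

# An immutable isbits element type is stored inline in the array
inline = Vector{InlineMean}(undef, p)
# A mutable element type is stored as one pointer per element
boxed = Vector{BoxedMean}(undef, p)
# The Float64 result vector returned by both benchmarks
result = Vector{Float64}(undef, p)

println(isbitstype(InlineMean), " ", isbitstype(BoxedMean))  # prints "true false"
println(sizeof(inline))   # element data stored inline: 25,000 × 16 bytes
println(sizeof(boxed))    # references only; the objects themselves live elsewhere
println(sizeof(result))   # 25,000 × 8 bytes
```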