johnmyleswhite / StreamStats.jl

Compute statistics over data streams in pure Julia

Make StreamStat objects immutable #18

Open johnmyleswhite opened 9 years ago

johnmyleswhite commented 9 years ago

In an attempt to get some performance wins, I tried making Mean() immutable by creating an ImmMean() type.

Would close #10.

Ken-B commented 9 years ago

How about something like this?

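```julia
immutable immean
    streamsum::Float64  # running sum of the observations
    len::Int            # number of observations seen so far
end
immean() = immean(0., 0)
# update returns a new value instead of mutating the old one
update(sm::immean, term) = immean(sm.streamsum + term, sm.len + 1)
Base.mean(sm::immean) = sm.streamsum / sm.len
```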

This gives me a 10x speedup on the single-stream benchmark in #10 and a 3x speedup on the vector-stream benchmark (with exactly the same memory allocation as your immutable).
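For readers on current Julia: the snippet above uses the pre-1.0 `immutable` keyword, which is now spelled `struct`, and `mean` now lives in the `Statistics` standard library rather than in `Base`. The following is only a sketch of the same sum-and-count idea in Julia 1.x syntax. The names `RunningMean`, `update`, `streammean`, and `mean_of` are made up for illustration and are not part of the StreamStats.jl API.

```julia
# Illustrative only: an immutable running mean that stores the sum and the count.
struct RunningMean
    streamsum::Float64  # running sum of all observations
    len::Int            # number of observations seen so far
end

# Empty stream: zero sum, zero observations.
RunningMean() = RunningMean(0.0, 0)

# Return a new value instead of mutating the old one.
update(sm::RunningMean, term::Real) = RunningMean(sm.streamsum + term, sm.len + 1)

# Current mean; this is NaN for an empty stream (0.0 / 0).
streammean(sm::RunningMean) = sm.streamsum / sm.len

# Hypothetical helper: fold a whole collection through the immutable updates.
function mean_of(xs)
    sm = RunningMean()
    for x in xs
        sm = update(sm, x)  # rebind the local variable to the new value
    end
    return streammean(sm)
end
```

Because both fields are plain bits, a value like this needs no heap allocation, which is presumably where the reported speedup comes from.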

johnmyleswhite commented 9 years ago

Interesting. I haven't had much time to think about this package recently, but my recollection is that you can make things like ImmMean much faster, but that immutables that wrap a pointer-indirected object get slower.

johnmyleswhite commented 9 years ago

To be clear, my concern with that discrepancy is that it would make the API a lot more complicated.

Ken-B commented 9 years ago

I'm afraid I don't understand what you mean by a pointer-indirected object.

The difference is that internally you don't keep the mean value but the sum and the length. The API does not change.

I don't know anything about overflow of the running sum, but my intuition says that a float can get pretty big :)
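A quick way to check that intuition, written in Julia 1.x syntax as a rough sketch: the largest finite `Float64` is about 1.8e308, so overflowing the running sum is unlikely in practice. What tends to matter sooner is precision, because once the sum is large enough, adding a small term no longer changes it. The variable names are illustrative.

```julia
# Largest finite Float64, roughly 1.8e308.
largest = floatmax(Float64)

# Going past it does not wrap around; it saturates to Inf.
overflowed = largest + largest

# Precision runs out long before overflow: at 2^53 the gap between
# adjacent Float64 values is 2.0, so adding 1.0 is rounded away.
s = 2.0^53
gap = eps(s)
absorbed = (s + 1.0 == s)
```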

johnmyleswhite commented 9 years ago

Sorry: the broader context is how to handle statistics that require a field that isn't itself immutable. These immutable wrappers around mutable objects are problematic, which means that some statistics could be immutable but others would have to be mutable.

If you're interested in details, it's useful to explore how the isbits predicate works, since it makes the distinction that I'm concerned about.
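A minimal sketch of that distinction, assuming Julia 1.x, where `isbitstype` takes a type and `isbits` takes a value (in the Julia version current at the time of this thread, `isbits` was applied to the type). Both struct names are hypothetical and exist only to contrast the two cases.

```julia
# All fields are plain bits, so values can be stored inline without pointers.
struct SumAndCount
    streamsum::Float64
    len::Int
end

# Immutable wrapper around a mutable, heap-allocated field:
# the struct itself only holds a reference to the vector.
struct WrapsVector
    buffer::Vector{Float64}
end

plain_is_bits   = isbitstype(SumAndCount)   # true
wrapper_is_bits = isbitstype(WrapsVector)   # false

# The wrapper cannot be rebound to another vector, but its contents can
# still change, so "immutable" here only covers the reference.
w = WrapsVector([1.0])
push!(w.buffer, 2.0)
wrapper_length = length(w.buffer)
```

A statistic that only needs a few numbers can take the first form; one that needs a growing buffer or another mutable field ends up in the second.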