joshday / OnlineStats.jl

⚡ Single-pass algorithms for statistics
https://joshday.github.io/OnlineStats.jl/latest/
MIT License
836 stars 64 forks source link

Code cleanup #6

Closed tbreloff closed 9 years ago

tbreloff commented 9 years ago

There's a bunch of places where there's unnecessary parametric types/signatures and wasteful work . For example... Base.push!{T <: ScalarStat}(df::DataFrame, obj::T) = append!(df, DataFrame(obj)) might be better as: Base.push!(df::DataFrame, obj::ScalarStat) = push!(df, value(obj))

Note that:

I'll make changes like these as I see them... If you have any questions on why I made a certain change, don't hesitate to talk it through here.

joshday commented 9 years ago

It looks like push!(df, array) only works if array is a single row. If this is more efficient than the other version, I'm all for it:

function Base.push!(df::DataFrame, obj::ScalarStat)
    st = state(obj)
    mat = [state_names(obj) st fill(nobs(obj), length(st))]
    for i in 1:length(st)
        push!(df, mat[i, :])
    end
end
tbreloff commented 9 years ago

i just pushed up an implementation that we should discuss. Look at the Mean/Var types, as well as common.jl, weighting.jl, and the mean/var tests.

Note: now that update!(o, y::Vector) just loops through and calls update!(o, f::Float64) for each v in vector, you can't test for equality in some places... need to test for approximate equality. If the "update one at a time" model doesn't have enough numerical stability, we can still implement batch versions using lines like: o.mean = smooth(o.mean, mean(y), weight(o))

joshday commented 9 years ago

Looks great. I think approximate equality will do for now. We can try adding some stability tests. A few random comments below.

if VERSION < v"0.4.0-dev"
    weight(w::EqualWeighting, n1::Int, n2::Int) = n1 > 0 || n2 > 0 ? float64(n2 / (n1 + n2)) : 1.0
else
    weight(w::EqualWeighting, n1::Int, n2::Int) = n1 > 0 || n2 > 0 ? Float64(n2 / (n1 + n2)) : 1.0
end

df_A (for calling Gadfly.plot(df_A, x=:nobs, y=:value, color=:variable))

# variable      |   value    |  nobs
# ---------------------------------
# μ             |   1.0      | 5 
# σ²            |   2.0      | 5
# μ             |   1.5      | 10
# σ²            |   2.1      | 10
# ...

vs. df_B

# μ    |   σ²   |  nobs
# ---------------------
# 1.0  |   2.0  |  5
# 1.5  |   2.1  |  10
# 1.6  |   2.2  |  15

I think version B is more expected behavior and version A is one step away (df_A = melt(df_B, :nobs)), so df_B is what we should go with.

tbreloff commented 9 years ago

In regards to the julia version. Yes I'm using version 0.4.0-dev+4311 at the moment, which is about 11 days old. I think the correct way to handle differences between 0.3 and 0.4 is to use the Compat package. Correct syntax is something like @compat weight(w::EqualWeighting, n1::Int, n2::Int) = n1 > 0 || n2 > 0 ? Float64(n2 / (n1 + n2)) : 1.0 which would automatically do the correct conversion for 0.3.7. I haven't actually had to use Compat yet, but we should probably try to get in good habits. I'll look into it more. The good news is that we can test each others code so it'll naturally work for anyone else that needs to use it, and it'll work for you if you want to switch to 0.4 at some point. :)

joshday commented 9 years ago

I was unaware of Compat. That looks much more slick! I'll try it out.

tbreloff commented 9 years ago

I just checked the github page for compat... it looks like correct syntax might be: weight(w::EqualWeighting, n1::Int, n2::Int) = n1 > 0 || n2 > 0 ? @compat Float64(n2 / (n1 + n2)) : 1.0

joshday commented 9 years ago

Cool, trying it now.

joshday commented 9 years ago

Julia didn't like that. This seems to work:

@compat weight(w::EqualWeighting, n1::Int, n2::Int)...
tbreloff commented 9 years ago

I wonder if @compat(Float64(x)) would work, using the parenthetical version of the macro. Either way if it works correctly at the beginning of the line I think that's easier to read.

On Apr 28, 2015, at 2:27 PM, Josh Day notifications@github.com wrote:

Julia didn't like that. This seems to work:

@compat weight(w::EqualWeighting, n1::Int, n2::Int)... — Reply to this email directly or view it on GitHub.

tbreloff commented 9 years ago

Hey Josh... I was just looking through the commit history to see what you've been doing and prepare for a merge... it looks like you picked up most/all of my recent changes, but gitk doesn't show a merge. Are you calling git merge origin/tom or something similar to bring in my changes, or are you adding them manually? Just want to know if I'm seeing the correct history.

I've been writing out (on actual paper!) the update! function for the FLS algo, and wanted to sync up code before I start implementing.

joshday commented 9 years ago

I've been doing git checkout tom -- path/to/file. What is the preferred way?

tbreloff commented 9 years ago

Ok that makes sense. I think that method is ok for occasional, one-off, small changes... but normally you want to do a git merge to actually pull in my changes in a systematic way without overwriting the whole file. (If you edited it and I never picked up your change, your change would be overwritten)

The flow that you could try:

git commit -am "pre-merge commit"
git fetch
git diff origin/tom
git merge origin/tom
<fix conflicts and commit>
git push

Every once in a while when the code is stable and just after you've done the merge (so that you have all of our changes), you can:

git checkout master
git merge josh
<shouldn't be conflicts?>
git push
git checkout josh

and this should set master to be a snapshot of your branch at that moment (of course assuming you've merged in all changes to that point)

Hopefully this is good enough for 99% of cases... let me know if you have any questions. It might seem complicated right now, but it shouldn't be too bad if we're mostly working on different things.

joshday commented 9 years ago

That is super helpful. Seriously, thanks. That will be my new workflow.

tbreloff commented 9 years ago

Sure thing. I just merged in your latest, and had a bunch of conflicts (many of which weren't really conflicts but just a by-product of overwriting whole files). I fixed them, mostly by taking your version. Do you want to do a merge (as described above)? I'll recreate the tom branch from that commit.

joshday commented 9 years ago

I'll do that as soon as I'm back at my computer (probably 2 hours)

Sent from my iPhone

On Apr 28, 2015, at 5:49 PM, Tom Breloff notifications@github.com wrote:

Sure thing. I just merged in your latest, and had a bunch of conflicts (many of which weren't really conflicts but just a by-product of overwriting whole files). I fixed them, mostly by taking your version. Do you want to do a merge (as described above)? I'll recreate the tom branch from that commit.

— Reply to this email directly or view it on GitHub.

joshday commented 9 years ago

I merged your branch into mine and then mine into master.