tbreloff closed this issue 9 years ago
`ScalarStat` subtypes are for "possibly multiple parameters that are all scalars"; for instance, `Moments` tracks mean, variance, skewness, and kurtosis.

It looks like `push!(df, array)` only works if `array` is a single row. If this is more efficient than the other version, I'm all for it:
```julia
function Base.push!(df::DataFrame, obj::ScalarStat)
    st = state(obj)
    mat = [state_names(obj) st fill(nobs(obj), length(st))]
    for i in 1:length(st)
        push!(df, mat[i, :])
    end
end
```
I just pushed up an implementation that we should discuss. Look at the `Mean`/`Var` types, as well as common.jl, weighting.jl, and the mean/var tests.
Note: now that `update!(o, y::Vector)` just loops through and calls `update!(o, y::Float64)` for each value in the vector, you can't test for exact equality in some places... you need to test for approximate equality. If the "update one at a time" model doesn't have enough numerical stability, we can still implement batch versions using lines like:

```julia
o.mean = smooth(o.mean, mean(y), weight(o))
```
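As a sketch of why one-at-a-time updating only matches the batch result approximately, here's a toy `smooth` (a hypothetical convex-combination helper, assumed from the line above) driving a running mean:

```julia
# Hypothetical sketch: smooth(current, new, w) moves `current` toward `new`
# by weight `w` (a convex combination), as in the line above.
smooth(current, new, w) = (1.0 - w) * current + w * new

# One-at-a-time updating of a mean with equal weighting: observation n
# gets weight 1/n, which reproduces the batch mean up to floating-point error.
function running_mean(y)
    m, n = 0.0, 0
    for v in y
        n += 1
        m = smooth(m, v, 1 / n)
    end
    m
end
```

For example, `running_mean([1.0, 2.0, 3.0])` agrees with the batch mean 2.0 only up to roundoff, which is exactly why the tests need approximate rather than exact equality.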
Looks great. I think approximate equality will do for now. We can try adding some stability tests. A few random comments below.
`float64` instead of `Float64`. You must be using 0.4.0?

```julia
if VERSION < v"0.4.0-dev"
    weight(w::EqualWeighting, n1::Int, n2::Int) = n1 > 0 || n2 > 0 ? float64(n2 / (n1 + n2)) : 1.0
else
    weight(w::EqualWeighting, n1::Int, n2::Int) = n1 > 0 || n2 > 0 ? Float64(n2 / (n1 + n2)) : 1.0
end
```
`DataFrame(o)` was to make something easy to plot, but I think your version is more expected.

df_A (for calling `Gadfly.plot(df_A, x=:nobs, y=:value, color=:variable)`):

```julia
# variable | value | nobs
# ------------------------
# μ        | 1.0   | 5
# σ²       | 2.0   | 5
# μ        | 1.5   | 10
# σ²       | 2.1   | 10
# ...
```
vs. df_B:

```julia
# μ   | σ²  | nobs
# ----------------
# 1.0 | 2.0 | 5
# 1.5 | 2.1 | 10
# 1.6 | 2.2 | 15
```
I think version B is more expected behavior, and version A is only one step away (`df_A = melt(df_B, :nobs)`), so df_B is what we should go with.
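For concreteness, a sketch of the two shapes (column names taken from the tables above; note I'm using the modern `stack`, since the 2015-era `melt(df_B, :nobs)` has since been removed from DataFrames):

```julia
using DataFrames

# df_B: one row per checkpoint, one column per statistic.
df_B = DataFrame(μ = [1.0, 1.5, 1.6], σ² = [2.0, 2.1, 2.2], nobs = [5, 10, 15])

# df_A: long format with (variable, value, nobs) rows, the shape wanted for
# Gadfly.plot(df_A, x=:nobs, y=:value, color=:variable).
# Old API: melt(df_B, :nobs); current API: stack with :nobs as the id column.
df_A = stack(df_B, Not(:nobs))
```

Each of the 3 rows of df_B becomes 2 rows of df_A (one per statistic), so df_A has 6 rows and the columns `:nobs`, `:variable`, `:value`.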
In regards to the Julia version: yes, I'm using version 0.4.0-dev+4311 at the moment, which is about 11 days old. I think the correct way to handle differences between 0.3 and 0.4 is to use the Compat package. Correct syntax is something like

```julia
@compat weight(w::EqualWeighting, n1::Int, n2::Int) = n1 > 0 || n2 > 0 ? Float64(n2 / (n1 + n2)) : 1.0
```

which would automatically do the correct conversion for 0.3.7. I haven't actually had to use Compat yet, but we should probably try to get in good habits. I'll look into it more. The good news is that we can test each other's code, so it'll naturally work for anyone else that needs to use it, and it'll work for you if you want to switch to 0.4 at some point. :)
I was unaware of Compat. That looks much more slick! I'll try it out.
I just checked the GitHub page for Compat... it looks like the correct syntax might be:

```julia
weight(w::EqualWeighting, n1::Int, n2::Int) = n1 > 0 || n2 > 0 ? @compat Float64(n2 / (n1 + n2)) : 1.0
```
Cool, trying it now.
Julia didn't like that. This seems to work:

```julia
@compat weight(w::EqualWeighting, n1::Int, n2::Int)...
```
I wonder if `@compat(Float64(x))` would work, using the parenthetical version of the macro. Either way, if it works correctly at the beginning of the line, I think that's easier to read.
Hey Josh... I was just looking through the commit history to see what you've been doing and prepare for a merge... it looks like you picked up most/all of my recent changes, but gitk doesn't show a merge. Are you calling `git merge origin/tom` or something similar to bring in my changes, or are you adding them manually? Just want to know if I'm seeing the correct history.
I've been writing out (on actual paper!) the `update!` function for the FLS algo, and wanted to sync up code before I start implementing.
I've been doing `git checkout tom -- path/to/file`. What is the preferred way?
Ok, that makes sense. I think that method is ok for occasional, one-off, small changes... but normally you want to do a `git merge` to actually pull in my changes in a systematic way without overwriting the whole file. (If you had edited a file and I never picked up your change, your change would be overwritten.)
The flow that you could try:

```
git commit -am "pre-merge commit"
git fetch
git diff origin/tom
git merge origin/tom
<fix conflicts and commit>
git push
```
- `git commit` commits your local work before the merge.
- `git fetch` pulls the branches from GitHub into your local repo. You'll (likely) have remote branches origin/master, origin/josh, origin/tom, etc.
- `git diff origin/tom` will give you a report of how your currently checked out local branch compares to my branch that I've pushed up to GitHub.
- `git merge origin/tom` pulls in my changes. Any conflicting files will contain `<<<<<<<`/`=======`/`>>>>>>>` sections which wrap the differences from both branches. You have to go through the files and delete/change whatever you need to, then save the file (this is called fixing merge conflicts). After you're done fixing all the conflicts, do another `git commit -am "fixed merge"`.
- `git push` sends your branch to GitHub.

Every once in a while when the code is stable and just after you've done the merge (so that you have all of our changes), you can:
```
git checkout master
git merge josh
<shouldn't be conflicts?>
git push
git checkout josh
```

and this should set master to be a snapshot of your branch at that moment (of course assuming you've merged in all changes to that point).
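When a merge does conflict, the file will contain sections like this (illustrative content, not from the repo):

```
<<<<<<< HEAD
o.mean = smooth(o.mean, mean(y), weight(o))
=======
o.n += length(y)
>>>>>>> origin/tom
```

Everything between `<<<<<<<` and `=======` is your side; everything between `=======` and `>>>>>>>` is the incoming branch. Edit the section down to the code you want, delete the three marker lines, save, and commit.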
Hopefully this is good enough for 99% of cases... let me know if you have any questions. It might seem complicated right now, but it shouldn't be too bad if we're mostly working on different things.
That is super helpful. Seriously, thanks. That will be my new workflow.
Sure thing. I just merged in your latest, and had a bunch of conflicts (many of which weren't really conflicts but just a by-product of overwriting whole files). I fixed them, mostly by taking your version. Do you want to do a merge (as described above)? I'll recreate the tom branch from that commit.
I'll do that as soon as I'm back at my computer (probably 2 hours)
I merged your branch into mine and then mine into master.
There are a bunch of places with unnecessary parametric types/signatures and wasteful work. For example...
```julia
Base.push!{T <: ScalarStat}(df::DataFrame, obj::T) = append!(df, DataFrame(obj))
```

might be better as:

```julia
Base.push!(df::DataFrame, obj::ScalarStat) = push!(df, value(obj))
```

Note that even with the abstract annotation `obj::ScalarStat`, it's still compiled to the same specialized method (i.e. `Base.push!(df::DataFrame, obj::Var) = ...`). You only want the parametric version if you need `T` in the RHS of the assignment, or if you need types to match in the signature (i.e. `add{T<:Integer}(x::T, y::T) = ...` will make sure x and y are the same type of integer), or if you want an abstract as the parameter of another type (i.e. `Base.push!{T<:Integer}(df::DataFrame, obj::Vector{T}) = ...` is appropriate). Also, you can `push!()` any iterable of row values to a DataFrame and it will just `push!()` each of those values to the columns.

I'll make changes like these as I see them... If you have any questions on why I made a certain change, don't hesitate to talk it through here.
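To illustrate the "types must match" case with toy functions (not from the package, and written in modern `where` syntax, since the `add{T<:Integer}(...)` spelling above is the 0.3/0.4-era form):

```julia
# Parametric signature: x and y must share one concrete integer type T.
# 0.3-era spelling: add_same{T <: Integer}(x::T, y::T) = x + y
add_same(x::T, y::T) where {T<:Integer} = x + y

# Abstract annotation, no parameter: mixed integer types are accepted, and
# each concrete combination still compiles to its own specialized method.
add_any(x::Integer, y::Integer) = x + y

add_same(1, 2)      # ok: both are Int
add_any(1, 0x02)    # ok: Int and UInt8, promoted before the addition
# add_same(1, 0x02) # MethodError: no single T matches both Int and UInt8
```

So the parameter buys you the same-type constraint; if you don't need that constraint (or `T` on the right-hand side), the plain abstract annotation compiles to identical code.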