nalimilan / FreqTables.jl

Frequency tables in Julia

marginal values #8

Open diegozea opened 8 years ago

diegozea commented 8 years ago

Would be great to have the ability to show/calculate/store the marginal values of a table, when that is required.

Best,

nalimilan commented 8 years ago

Could you be a bit more specific? Does sum(tab, dims) or mean(tab, dims) do what you need?
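
For reference, the marginal totals mentioned here can be computed on any 2-D array with Base functions alone. A minimal sketch on a plain matrix, with made-up counts; in current Julia the dimension is passed as the keyword `dims = ...`:

```julia
# Plain Base Julia; `tab` stands in for a 2-D table of counts (illustrative values).
tab = [1 2; 3 2]

row_totals  = sum(tab, dims = 2)   # 2×1 matrix: one total per row
col_totals  = sum(tab, dims = 1)   # 1×2 matrix: one total per column
grand_total = sum(tab)             # total over all cells

println(vec(row_totals))
println(vec(col_totals))
println(grand_total)
```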

diegozea commented 8 years ago

Yes, I'm doing that ;)

But I was wondering if freqtable(..., marginal=true) could return a table like this one:

     x1  x2   X
y1    1   2   3
y2    3   2   5
Y     4   4   8

nalimilan commented 8 years ago

In R there's an addmargins function. Given that it's even shorter than adding marginal=true, that could be a better solution. Would you make a PR to add it?

diegozea commented 8 years ago

I'm sorry, I don't have time right now to work on that PR :/

nalimilan commented 8 years ago

Then I'll try to have a look later. Shouldn't be hard.

diegozea commented 8 years ago

Thanks! There is no hurry.

bkamins commented 6 years ago

I have a similar use case, but for an equivalent of prop.table in R. You can do it now using tab ./ sum(tab, dims), but it is such a common operation that maybe it should be handled by the package. I can imagine two options:

  1. additional wrapper function;
  2. keyword argument for margins.

How do you see it?
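
The `tab ./ sum(tab, dims)` idiom mentioned above can be sketched on a plain matrix as follows. This is only an illustration in Base Julia with made-up counts, not the package's implementation; the variable names are assumptions:

```julia
tab = [1 2; 3 4]

# Broadcasting divides every cell by the total of its row (dims = 2),
# the total of its column (dims = 1), or the grand total.
row_props = tab ./ sum(tab, dims = 2)   # each row sums to 1
col_props = tab ./ sum(tab, dims = 1)   # each column sums to 1
all_props = tab ./ sum(tab)             # all cells together sum to 1

println(row_props)
println(col_props)
println(all_props)
```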

nalimilan commented 6 years ago

As I said, I'd rather go with a wrapper function. We could also imagine providing a function, say proptable, which would call freqtable and compute the proportions.

nico202 commented 4 years ago

Hi, has this issue been closed without being fixed? How can I add marginals?

Thanks, Nicolò

bkamins commented 4 years ago

See the referenced PR https://github.com/nalimilan/FreqTables.jl/pull/19. You can use the prop function.

nico202 commented 4 years ago

Thanks, I was using release 0.3.1 where it seems the keyword is not exported.

Btw, maybe I'm doing it wrong.

> table(dat$A, dat$B)

    1 2 3 4
  1 2 2 2 2
  2 2 2 2 2
  3 2 2 2 2

> addmargins(table(dat$A, dat$B))

       1  2  3  4 Sum
  1    2  2  2  2   8
  2    2  2  2  2   8
  3    2  2  2  2   8
  Sum  6  6  6  6  24

> addmargins(table(dat$A, dat$B), 1)

      1 2 3 4
  1   2 2 2 2
  2   2 2 2 2
  3   2 2 2 2
  Sum 6 6 6 6

julia> freqtable(dat, :A, :B)
3×4 Named Array{Int64,2}
A ╲ B │ 1  2  3  4
──────┼───────────
1     │ 2  2  2  2
2     │ 2  2  2  2
3     │ 2  2  2  2

julia> prop(freqtable(dat, :A, :B), margins = 1)
3×4 Named Array{Float64,2}
A ╲ B │    1     2     3     4
──────┼───────────────────────
1     │ 0.25  0.25  0.25  0.25
2     │ 0.25  0.25  0.25  0.25
3     │ 0.25  0.25  0.25  0.25

julia> prop(freqtable(dat, :A, :B), margins = (1,2))
3×4 Named Array{Float64,2}
A ╲ B │   1    2    3    4
──────┼───────────────────
1     │ 1.0  1.0  1.0  1.0
2     │ 1.0  1.0  1.0  1.0
3     │ 1.0  1.0  1.0  1.0

bkamins commented 4 years ago

Have a look at help of prop, this is the way to use it:

julia> prop([1 2; 3 4])
2×2 Array{Float64,2}:
 0.1  0.2
 0.3  0.4

julia> prop([1 2; 3 4], 1)
2×2 Array{Float64,2}:
 0.333333  0.666667
 0.428571  0.571429

julia> prop([1 2; 3 4], 2)
2×2 Array{Float64,2}:
 0.25  0.333333
 0.75  0.666667

julia> prop([1 2; 3 4], 1, 2)
2×2 Array{Float64,2}:
 1.0  1.0
 1.0  1.0

nico202 commented 4 years ago

Thanks, but none of those is similar to what R's addmargins does (which is what's asked here).

nico202 commented 4 years ago

I mean, return this:

x = freqtable(dat, :A, :B)

vcat(hcat(x, sum(x, dims = 2)), hcat(sum(x, dims = 1)..., sum(x)))

4×5 Named Array{Int64,2}
A ╲ hcat │  1   2   3   4   5
─────────┼───────────────────
1        │  2   2   2   2   8
2        │  2   2   2   2   8
3        │  2   2   2   2   8
4        │  6   6   6   6  24

preserving names and so on

nico202 commented 4 years ago

Ugly, but this:

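using NamedArrays  # setnames! comes from NamedArrays, the array type returned by freqtable

function addmargins(tab)
    # Copy the row and column labels as strings so that "Sum" can be appended
    x, y = names(tab)
    x = string.(x)
    y = string.(y)
    push!(x, "Sum")
    push!(y, "Sum")
    # Append the row totals as a column, then the column totals and grand total as a row
    res = vcat(hcat(tab, sum(tab, dims = 2)), hcat(sum(tab, dims = 1)..., sum(tab)))
    setnames!(res, x, 1)
    setnames!(res, y, 2)
    # Restore the dimension names (e.g. A and B) lost during concatenation
    res.dimnames = tab.dimnames
    res
end

julia> addmargins(freqtable(dat, :A, :B))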
4×5 Named Array{Int64,2}
A ╲ B    │   1    2    3    4  Sum
─────────┼────────────────────────
1        │   2    2    2    2    8
2        │   2    2    2    2    8
3        │   2    2    2    2    8
Sum      │   6    6    6    6   24

bkamins commented 4 years ago

Ah - understood. I do not think it is supported.

Out of curiosity - in what situation would you need it (apart from the fact that R provides it)? I am asking because I never needed such functionality (and I use FreqTables.jl on a daily basis), and it is in general unsafe: if you change the contents of such a table, the margins are invalidated, so you lose the consistency of your table.
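
The consistency concern can be illustrated with a small sketch. It uses a plain matrix rather than a named table, and the values and variable names are assumptions made up for the example:

```julia
counts = [2 2; 2 2]

# Store the totals inside the table itself: last column holds row totals,
# last row holds column totals, bottom-right cell holds the grand total.
with_margins = [counts sum(counts, dims = 2); sum(counts, dims = 1) sum(counts)]

with_margins[1, 1] = 5                     # edit one cell after the margins were stored

stored = with_margins[1, end]              # row total as stored before the edit
actual = sum(with_margins[1, 1:end-1])     # row total recomputed from the cells

println((stored, actual))                  # the two no longer agree
```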

nico202 commented 4 years ago

In a report or a journal paper it's a nice way to present some data. In this specific case: I have an experiment with outliers. I want to show how many outliers are present for each condition, the sample size, the number of valid/invalid trials... I care about the proportion of valid/invalid trials, but the raw numbers are more important (25% out of 4 or out of 10000 makes a big difference here).

Those tables summarize it well:

7-8y old

#+call: outlier-frequency-by-age[:exports results](age="7-8y")

#+RESULTS:
| condoutlier \ cond | auditory | haptic | visual | crossmodal | Sum |
|--------------------+----------+--------+--------+------------+-----|
| false              |       18 |     37 |     38 |         35 | 128 |
| true               |       20 |      1 |      0 |          3 |  24 |
| Sum                |       38 |     38 |     38 |         38 | 152 |

10-11y old

#+call: outlier-frequency-by-age[:exports results](age="10-11y")

#+RESULTS:
| condoutlier \ cond | auditory | haptic | visual | crossmodal | Sum |
|--------------------+----------+--------+--------+------------+-----|
| false              |       33 |     46 |     46 |         46 | 171 |
| true               |       13 |      0 |      0 |          0 |  13 |
| Sum                |       46 |     46 |     46 |         46 | 184 |

adults

#+call: outlier-frequency-by-age[:exports results](age="adults")

#+RESULTS:
| condoutlier \ cond | auditory | haptic | visual | crossmodal | Sum |
|--------------------+----------+--------+--------+------------+-----|
| false              |       15 |     16 |     16 |         16 |  63 |
| true               |        1 |      0 |      0 |          0 |   1 |
| Sum                |       16 |     16 |     16 |         16 |  64 |

(the syntax here is Emacs Org mode; the Julia code that's called is: addmargins(freqtable(data, :condoutlier, :cond, subset = data.agegroup .== age)))

For the three age groups you see the N of subjects, n of trials, n of outliers by condition... Quick and simple (although R is simpler still, because there you can call the equivalent of freqtable(data, :A, :B, :C) and get many tables at once; in the example above I have to run the function 3 times).

nico202 commented 4 years ago

Also, maybe conversion to string can be replaced by something like Union{eltype(x),AbstractString}?

bkamins commented 4 years ago

I agree with this use-case, but I would rather create a custom display function for this (that could e.g. automatically also use MIME-type to output HTML, LaTeX etc.) so that you have a separate Model from View.
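
One way to picture this separation is a display helper that computes the totals only while printing. The sketch below is hypothetical: `show_with_margins` is not part of FreqTables.jl, it works on a plain matrix with labels passed separately, and it emits tab-separated text rather than HTML or LaTeX:

```julia
# Hypothetical helper: prints a count matrix with a "Sum" row and column
# computed at display time, leaving `tab` itself untouched.
function show_with_margins(io::IO, tab::AbstractMatrix, rownames, colnames)
    # Header: empty corner cell, the column labels, then "Sum"
    println(io, join(["", string.(colnames)..., "Sum"], '\t'))
    for i in axes(tab, 1)
        row = tab[i, :]
        println(io, join([string(rownames[i]), string.(row)..., string(sum(row))], '\t'))
    end
    # Final row: column totals followed by the grand total
    totals = vec(sum(tab, dims = 1))
    println(io, join(["Sum", string.(totals)..., string(sum(tab))], '\t'))
end

tab = [18 37; 20 1]
show_with_margins(stdout, tab, ["false", "true"], ["auditory", "haptic"])
```

Because the totals exist only in the printed output, later edits to `tab` cannot leave stale margins behind.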

nico202 commented 4 years ago

It might make sense, but you don't always want to display it with the marginals. So I don't know which is the best way to organize this. Any idea?

nalimilan commented 4 years ago

I agree something like addmargins can be useful. It should also allow specifying specific margins to which totals must be added.
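
A minimal sketch of what such a selectable-margins function could look like, on a plain matrix without names. The function name and the `dims` keyword are assumptions for illustration; the meaning of `dims` follows R's convention, where margin 1 appends a row of column totals:

```julia
# Hypothetical signature, not an existing FreqTables.jl function.
# dims = 1 appends a row of column totals, dims = 2 appends a column of row totals.
function addmargins_sketch(tab::AbstractMatrix; dims = (1, 2))
    res = tab
    if 1 in dims
        res = vcat(res, sum(res, dims = 1))
    end
    if 2 in dims
        # When the totals row was added first, its own sum is the grand total.
        res = hcat(res, sum(res, dims = 2))
    end
    return res
end

x = fill(2, 3, 4)
println(addmargins_sketch(x))             # both margins
println(addmargins_sketch(x, dims = 1))   # only the row of column totals
```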

Something which is annoying in R is when you want to add margins to a table of proportions: addmargins(prop.table(table(...), 1)) gives correct row sums (equal to 1) but meaningless column sums (equal to sums of row proportions) and grand total (equal to 2). So maybe we should try to find a more convenient API? For example, instead of a function we could add a keyword argument to freqtable and prop. Or maybe introduce addmargins, but also a keyword argument to prop since that's where the problem arises (for raw counts addmargins is OK).
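
The problem described above can be reproduced on a plain matrix. The sketch below uses Base Julia only, with made-up counts; the alternative bottom row at the end is just one possible convention, shown as an assumption rather than a proposed API:

```julia
tab = [1 2; 3 4]
props = tab ./ sum(tab, dims = 2)   # row proportions, like prop.table(tab, 1) in R

# Naively adding both margins to the proportions:
naive = [props sum(props, dims = 2); sum(props, dims = 1) sum(props)]

println(naive[1:2, end])   # row sums: meaningful, both equal to 1
println(naive[end, 1:2])   # column sums: sums of row proportions, hard to interpret
println(naive[end, end])   # grand total: equals the number of rows

# One possible alternative (an assumption): derive the bottom row from the raw
# counts, so that it shows the overall column distribution and sums to 1.
overall = sum(tab, dims = 1) ./ sum(tab)
println(overall)
```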