nalimilan / FreqTables.jl

Frequency tables in Julia

Entry with zero values for CategoricalArrays #72

Open mwsohn opened 3 months ago

mwsohn commented 3 months ago

The following code explains the problem:

using DataFrames, CategoricalArrays, FreqTables
df = DataFrame(id = 1:9, race = repeat(collect(1:3), 3))
df.race = categorical(df.race)

freqtable(df,:race)

3-element Named Vector{Int64}
race  │
──────┼──
1     │ 3
2     │ 3
3     │ 3

df2 = filter(x -> x.race != 3, df)
freqtable(df2, :race, skipmissing = true)

3-element Named Vector{Int64}
race  │
──────┼──
1     │ 3
2     │ 3
3     │ 0

As you can see, df2 has no rows with race == 3, but the frequency table still reports that level with a count of zero. The skipmissing option does not affect the output at all.

bkamins commented 3 months ago

I think the current behavior is useful, but an option to list only the levels actually present (rather than all potential levels) would make sense. Maybe a droplevels kwarg?

For now you can do:

julia> freqtable(droplevels!(copy(df2.race)))
2-element Named Vector{Int64}
Dim1  │
──────┼──
1     │ 3
2     │ 3
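
The workaround above can be wrapped in a small helper. This is only a sketch: observed_freqtable is a hypothetical name, not part of FreqTables.jl, and it assumes the input is a categorical vector. It also illustrates why skipmissing has no effect here: filtering leaves level 3 in the set of levels, and an unused level is not a missing value.

```julia
using CategoricalArrays, FreqTables

# Hypothetical helper (not part of FreqTables.jl): tabulate only the levels
# that actually occur. Works on a copy so the caller's array is not modified.
observed_freqtable(x::CategoricalArray) = freqtable(droplevels!(copy(x)))

race = categorical(repeat(1:3, 3))   # levels 1, 2, 3, each occurring 3 times
kept = race[race .!= 3]              # drop the rows, but level 3 is still declared

levels(kept)              # still lists three levels
any(ismissing, kept)      # false, so skipmissing has nothing to skip

freqtable(kept)           # three entries, the last with a count of zero
observed_freqtable(kept)  # two entries, one per level that occurs
```

With a data frame, the same call would be observed_freqtable(df2.race). The resulting table is labeled Dim1 rather than race, as in the output above.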