nalimilan / FreqTables.jl

Frequency tables in Julia

Strip type information for CategoricalValues #66

Closed andreasnoack closed 3 years ago

andreasnoack commented 3 years ago

The type information in the row labels is very noisy and can easily cause the output to be truncated:

julia> display(freqtable(categorical(["A", "B", "A", "B"]), categorical(["ONE", "ONE", "TWO", "TWO"])))
2×2 Named Matrix{Int64}
                         Dim1 ╲ Dim2 │   …
─────────────────────────────────────┼──────
CategoricalValue{String, UInt32} "A" │   …
CategoricalValue{String, UInt32} "B" │   …

andreasnoack commented 3 years ago

Hm. I guess this might actually be a NamedArrays or CategoricalArrays issue.

andreasnoack commented 3 years ago

Well. I guess this could potentially be handled here, since these dicts could simply strip the type information:

julia> tmp.dicts
(OrderedCollections.OrderedDict{CategoricalValue{String, UInt32}, Int64}("A" => 1, "B" => 2), OrderedCollections.OrderedDict{CategoricalValue{String, UInt32}, Int64}("ONE" => 1, "TWO" => 2))

so I'll reopen to discuss the options.

nalimilan commented 3 years ago

Yeah, this should probably be fixed in CategoricalArrays. The only thing we could change here is to unwrap CategoricalValue values, i.e. in the example convert them to String. IIRC we used to do that, but it's a bit weird to special-case CategoricalValue.

See https://github.com/JuliaData/CategoricalArrays.jl/pull/371.
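
For illustration, the unwrapping idea can be tried on the caller's side with the `unwrap` function from CategoricalArrays. This is only a minimal sketch of a workaround, not what FreqTables does internally; it assumes a CategoricalArrays version that exports `unwrap`, and the variable names are made up for the example.

```julia
using CategoricalArrays, FreqTables

# Same data as in the report above.
x = categorical(["A", "B", "A", "B"])
y = categorical(["ONE", "ONE", "TWO", "TWO"])

# Broadcasting `unwrap` turns each CategoricalValue into the plain
# String it wraps, so the table is built from ordinary strings.
xs = unwrap.(x)
ys = unwrap.(y)

# Assumption: with plain String inputs the row and column names are
# Strings, so no CategoricalValue type prefix appears in the display.
tbl = freqtable(xs, ys)
display(tbl)
```

Note that this drops the categorical metadata: the table is built from the values that actually occur, so a custom level order or unused levels of the original arrays would not carry over.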