Open alyst opened 7 years ago
Indeed Array{CategoricalValue}
isn't supposed to be used. It's kind of annoying that we don't have a way to specify that a given element type should always be stored in a given container type.
I have some minimal solution to this, by it still involves moving+extending Enumerable
abstract type definitions from QueryOperators.jl
to TableTraits.jl
and using them from IterableTables.jl
. So it's pretty invasive.
I can open PRs, but I suppose the core JuliaData contributors already have some strategy how to fix this and the other design problems (e.g. IterableTables.jl being dependent on every data-related package).
The proper way to implement this would be to make sure that at collection into a DataFrame
any column of type CategoricalValue
uses the proper container type. Essentially just another elseif
case here. Having said that, I think anything like that would require a bit more thinking about larger implications. I was a couple of times close to just adding this, but in the end I'm just not yet sure whether I want this in the iterable tables interface or not (main tradeoff is complexity, I'm trying to keep iterable tables as simple as possible, and I'm not sure this would still fall into that category).
It's kind of annoying that we don't have a way to specify that a given element type should always be stored in a given container type.
YES, I completely agree. I think it would be absolutely fantastic if there was a standard way in base to say "for element type T, the default array type should be X", and if things like [a,b,c]
(where a
, b
and c
are instances of T
) constructed an X
, and and fill
etc. all used that. I think a simple function like getdefaultarraytype(::Type{T}) where T = Array{T}
in base would probably do it, where custom T
s could add a method and the various base array creation methods would rely on it.
I suggested something like that in https://github.com/JuliaLang/julia/issues/22630, but probably doomed all chances of success of that proposal by embedding it in a crazy rename idea ;) I think a much less ambitious proposal along the lines of the above paragraph would still be a great idea.
Maybe we should just override similar(::Array, ::Type{<:CategoricalValue})
?
Where would that be picked up?
I guess I was so far mostly worrying about [a,b,c]
when a
, b
and c
are of type CategoricalValue
(or in my case DataValue
) and how one can intercept that. I think we can already intercept CategoricalValue[a,b,c]
, and this line seems key to intercepting all cases.
I kind of feel that if someone is asking for an Array
specifically, he/she should get that specific type, right? Not sure...
I kind of feel that if someone is asking for an Array specifically, he/she should get that specific type, right? Not sure...
In this case, the caller wouldn't require an Array
, just an object "similar" to the input Array
, so I'd say it wouldn't be absurd to return a CategoricalArray
if the element type is a categorical value.
Julia v0.6, DataFrames, CategoricalArray and Query masters
IIUC
Array{CategoricalValue}
is suboptimal in terms of storage. Also CategoricalArrays API likelevels(dff[:a])
is not available.