xiaodaigh / JDF.jl

Julia DataFrames serialization format
MIT License

Missing values in categorical arrays turn into #undef #73

Closed: timbp closed this issue 2 years ago

timbp commented 2 years ago
julia> using DataFrames, CategoricalArrays

julia> using JDF

julia> df1 = DataFrame(sex=["Male", missing, "Female"])
3×1 DataFrame  
 Row │ sex     
     │ String? 
─────┼─────────
   1 │ Male    
   2 │ missing 
   3 │ Female  

julia> df2 = DataFrame(sex=categorical(["Male", missing, "Female"]))
3×1 DataFrame  
 Row │ sex     
     │ Cat…?   
─────┼─────────
   1 │ Male    
   2 │ missing 
   3 │ Female  

julia> JDF.save("df1.jdf", df1)
JDFFile{String}("df1.jdf")

julia> JDF.save("df2.jdf", df2)
JDFFile{String}("df2.jdf")

julia> JDF.load("df1.jdf")
JDF.Table((sex = Union{Missing, String}["Male", missing, "Female"],))

julia> JDF.load("df2.jdf")
JDF.Table((sex = CategoricalValue{String, UInt32}["Male", #undef, "Female"],))

xiaodaigh commented 2 years ago

Thanks for the bug report. I can replicate this. Let me see what's happening.

I think the issue is that the code currently handles Vector{Union{T, Missing}} but not CategoricalArray{Union{Missing, String}}.

Looking into the code to get a fix done.
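
Until a fix lands, one possible workaround is to avoid handing JDF a categorical column that contains missing values at all. The sketch below is not the fix in JDF itself, and it assumes a CategoricalArrays version that exports unwrap. It strips the categorical wrapper to get a plain Union{Missing, String} vector (the column type that round-trips correctly in the df1 example above), then rebuilds the categorical column from that vector, as you would after loading. The variable names are only illustrative.

```julia
using CategoricalArrays

# The column from the report: a categorical vector with a missing entry.
x = categorical(["Male", missing, "Female"])

# Strip the categorical wrapper element by element. `unwrap` returns the
# underlying String for a CategoricalValue and passes `missing` through,
# so `plain` is an ordinary vector of strings and missings.
plain = unwrap.(x)

# `plain` is the kind of column that saved and loaded correctly in the
# report (df1). After loading, the categorical column can be rebuilt:
restored = categorical(plain)

# Check that the missing entry survived the conversion both ways.
println(ismissing(plain[2]))
println(ismissing(restored[2]))
```

Note that categorical(plain) rebuilds the levels from the data, so a custom level order or an ordered flag on the original column would need to be stored and reapplied separately.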