xiaodaigh / JDF.jl

Julia DataFrames serialization format
MIT License

Odd issue ambiguous methoderror #48

Closed ym-han closed 4 years ago

ym-han commented 4 years ago

I got the following error while trying to save some files; I'll upload an MWE later.

ERROR: TaskFailedException:
MethodError: compress_then_write(::Array{Any,1}, ::BufferedStreams.BufferedOutputStream{IOStream}) is ambiguous. Candidates:
  compress_then_write(b::Array{Union{Missing, T},1}, io) where T in JDF at /users/yh31/.julia/packages/JDF/jDvZp/src/type-writer-loader/Missing.jl:7
  compress_then_write(b::Array{Union{Nothing, T},1}, io) where T in JDF at /users/yh31/.julia/packages/JDF/jDvZp/src/type-writer-loader/Nothing.jl:10
To resolve the ambiguity, try making one of the methods more specific, or adding a new method more specific than any of the existing applicable methods.
Stacktrace:
 [1] macro expansion at /users/yh31/.julia/packages/JDF/jDvZp/src/savejdf.jl:69 [inlined]
 [2] (::JDF.var"#49#52"{String,DataFrame,String,Int64})() at ./threadingconstructs.jl:169
Stacktrace:
 [1] wait at ./task.jl:267 [inlined]
 [2] fetch(::Task) at ./task.jl:282
 [3] _broadcast_getindex_evalf at ./broadcast.jl:648 [inlined]
 [4] _broadcast_getindex at ./broadcast.jl:621 [inlined]
 [5] getindex at ./broadcast.jl:575 [inlined]
 [6] copyto_nonleaf!(::Array{NamedTuple,1}, ::Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{1},Tuple{Base.OneTo{Int64}},typeof(fetch),Tuple{Base.Broadcast.Extruded{Array{Any,1},Tuple{Bool},Tuple{Int64}}}}, ::Base.OneTo{Int64}, ::Int64, ::Int64) at ./broadcast.jl:1026
 [7] restart_copyto_nonleaf!(::Array{NamedTuple,1}, ::Array{NamedTuple{(:string_compressed_bytes, :string_len_bytes, :rle_bytes, :rle_len, :type, :len),Tuple{Int64,Int64,Int64,Int64,DataType,Int64}},1}, ::Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{1},Tuple{Base.OneTo{Int64}},typeof(fetch),Tuple{Base.Broadcast.Extruded{Array{Any,1},Tuple{Bool},Tuple{Int64}}}}, ::NamedTuple{(:len, :type),Tuple{Int64,DataType}}, ::Int64, ::Base.OneTo{Int64}, ::Int64, ::Int64) at ./broadcast.jl:1017
 [8] copyto_nonleaf!(::Array{NamedTuple{(:string_compressed_bytes, :string_len_bytes, :rle_bytes, :rle_len, :type, :len),Tuple{Int64,Int64,Int64,Int64,DataType,Int64}},1}, ::Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{1},Tuple{Base.OneTo{Int64}},typeof(fetch),Tuple{Base.Broadcast.Extruded{Array{Any,1},Tuple{Bool},Tuple{Int64}}}}, ::Base.OneTo{Int64}, ::Int64, ::Int64) at ./broadcast.jl:1033
 [9] copy at ./broadcast.jl:880 [inlined]
 [10] materialize at ./broadcast.jl:837 [inlined]
 [11] savejdf(::String, ::DataFrame; verbose::Bool) at /users/yh31/.julia/packages/JDF/jDvZp/src/savejdf.jl:76
 [12] savejdf(::String, ::DataFrame) at /users/yh31/.julia/packages/JDF/jDvZp/src/savejdf.jl:48
 [13] (::var"#51#52")(::File) at ./REPL[18]:2
 [14] (::FileTrees.var"#saver#89"{var"#51#52"})(::File, ::DataFrame) at /users/yh31/.julia/packages/FileTrees/sx5xd/src/values.jl:121
 [15] (::Dagger.var"#47#48"{FileTrees.var"#saver#89"{var"#51#52"},Tuple{File,DataFrame}})() at ./threadingconstructs.jl:169
wait at ./task.jl:267 [inlined]
fetch at ./task.jl:282 [inlined]
execute!(::Dagger.ThreadProc, ::Function, ::File, ::Vararg{Any,N} where N) at /users/yh31/.julia/packages/Dagger/U857J/src/processor.jl:222
do_task(::Dagger.Context, ::Dagger.OSProc, ::Int64, ::Function, ::Tuple{File,Dagger.Chunk{Any,MemPool.DRef,Dagger.ThreadProc}}, ::Bool, ::Bool, ::Bool, ::Dagger.Sch.ThunkOptions) at /users/yh31/.julia/packages/Dagger/U857J/src/scheduler.jl:340
#137 at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.5/Distributed/src/remotecall.jl:354 [inlined]
run_work_thunk(::Distributed.var"#137#138"{typeof(Dagger.Sch.do_task),Tuple{Dagger.Context,Dagger.OSProc,Int64,FileTrees.var"#saver#89"{var"#51#52"},Tuple{File,Dagger.Chunk{Any,MemPool.DRef,Dagger.ThreadProc}},Bool,Bool,Bool,Dagger.Sch.ThunkOptions},Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}}, ::Bool) at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.5/Distributed/src/process_messages.jl:79
remotecall_fetch(::Function, ::Distributed.LocalProcess, ::Dagger.Context, ::Vararg{Any,N} where N; kwargs::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}) at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.5/Distributed/src/remotecall.jl:379
remotecall_fetch(::Function, ::Distributed.LocalProcess, ::Dagger.Context, ::Vararg{Any,N} where N) at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.5/Distributed/src/remotecall.jl:379
remotecall_fetch(::Function, ::Int64, ::Dagger.Context, ::Vararg{Any,N} where N; kwargs::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}) at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.5/Distributed/src/remotecall.jl:421
remotecall_fetch at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.5/Distributed/src/remotecall.jl:421 [inlined]
macro expansion at /users/yh31/.julia/packages/Dagger/U857J/src/scheduler.jl:353 [inlined]
(::Dagger.Sch.var"#26#27"{Dagger.Context,Dagger.OSProc,Int64,FileTrees.var"#saver#89"{var"#51#52"},Tuple{File,Dagger.Chunk{Any,MemPool.DRef,Dagger.ThreadProc}},Channel{Any},Bool,Bool,Bool,Dagger.Sch.ThunkOptions})() at ./task.jl:356
Stacktrace:
 [1] compute_dag(::Dagger.Context, ::Dagger.Thunk; options::Nothing) at /users/yh31/.julia/packages/Dagger/U857J/src/scheduler.jl:137
 [2] compute(::Dagger.Context, ::Dagger.Thunk; options::Nothing) at /users/yh31/.julia/packages/Dagger/U857J/src/compute.jl:32
 [3] #compute#70 at /users/yh31/.julia/packages/Dagger/U857J/src/compute.jl:5 [inlined]
 [4] compute at /users/yh31/.julia/packages/Dagger/U857J/src/compute.jl:5 [inlined]
 [5] exec(::Dagger.Thunk) at /users/yh31/.julia/packages/FileTrees/sx5xd/src/parallelism.jl:68
 [6] save(::var"#51#52", ::FileTree; lazy::Nothing, exec::Bool) at /users/yh31/.julia/packages/FileTrees/sx5xd/src/values.jl:128
 [7] save(::Function, ::FileTree) at /users/yh31/.julia/packages/FileTrees/sx5xd/src/values.jl:111
 [8] top-level scope at REPL[18]:1
xiaodaigh commented 4 years ago
using DataFrames, JDF; b = DataFrame(a = rand([missing, nothing], 100))
JDF.save("c:/scratch/plsdel.jdf", b)

You might have Nothing and Missing in one column?

This is not supported. Can you describe your use case? I can add support if needed.
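
A minimal sketch of why the call in the error message is ambiguous. The function `write_col` is hypothetical; its two signatures only mirror the two candidates listed in the error, and the bodies are placeholders rather than JDF's code. A `Vector{Any}` matches both signatures (with `T = Any`), and so does a `Vector{Union{Missing,Nothing}}`, so Julia cannot pick one.

```julia
# Hypothetical stand-ins for the two candidate methods named in the error.
# Only the argument types mirror the error message; the bodies are placeholders.
write_col(b::Vector{Union{Missing,T}}) where {T} = :missing_method
write_col(b::Vector{Union{Nothing,T}}) where {T} = :nothing_method

# A column with only one kind of sentinel dispatches cleanly.
r1 = write_col(Union{Missing,Int}[1, missing])
r2 = write_col(Union{Nothing,Int}[1, nothing])

# Capture the error (if any) instead of letting it propagate.
function try_write(col)
    try
        write_col(col)
        return nothing
    catch err
        return err
    end
end

# Vector{Any} fits both signatures (T = Any), so the call is ambiguous.
err_any = try_write(Any[1, "a"])

# A column mixing missing and nothing fits both as well
# (T = Nothing for the first method, T = Missing for the second).
err_mixed = try_write(Union{Missing,Nothing}[missing, nothing])

println(r1, " ", r2)
println(typeof(err_any), " ", typeof(err_mixed))
```

Julia reports an ambiguous call as a `MethodError`, which is what appears at the top of the trace above.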

Can you print the types of all your columns? For example:

println.(typeof.(eachcol(df)))
xiaodaigh commented 4 years ago

If the issue is that the column contains both nothing and missing, then JDF does not support that, and I don't see a good reason to support it.

If you have a good use-case for it, then please reopen and describe the use-case. Otherwise, I will close for now.

ym-han commented 4 years ago

These are the dfs I tried out. I added the file tree code just in case I had done something wrong there:

`julia> df1 = FileTrees.get(files(test)[1])
1×11 DataFrame. Omitted printing of 11 columns

julia> df3 = FileTrees.get(files(test)[3])
14×11 DataFrame. Omitted printing of 11 columns

julia> df2 = FileTrees.get(files(test)[2])
8×11 DataFrame. Omitted printing of 11 columns

julia> println.(typeof.(eachcol(df1)))
Array{String,1}
Array{String,1}
Array{String,1}
Array{String,1}
Array{String,1}
Array{Int64,1}
Array{Int64,1}
Array{Float64,1}
Array{String,1}
Array{String,1}
Array{String,1}
11-element Array{Nothing,1}:
 nothing
 nothing
 nothing
 nothing
 nothing
 nothing
 nothing
 nothing
 nothing
 nothing
 nothing

julia> println.(typeof.(eachcol(df2)))
Array{String,1}
Array{String,1}
Array{String,1}
Array{String,1}
Array{String,1}
Array{Int64,1}
Array{Int64,1}
Array{Float64,1}
Array{Any,1}
Array{String,1}
Array{String,1}
11-element Array{Nothing,1}:
 nothing
 nothing
 nothing
 nothing
 nothing
 nothing
 nothing
 nothing
 nothing
 nothing
 nothing

julia> println.(typeof.(eachcol(df3)))
Array{String,1}
Array{String,1}
Array{String,1}
Array{String,1}
Array{String,1}
Array{Int64,1}
Array{Int64,1}
Array{Float64,1}
Array{Array{Set{Tuple{String,String}},1},1}
Array{String,1}
Array{String,1}
11-element Array{Nothing,1}:
 nothing
 nothing
 nothing
 nothing
 nothing
 nothing
 nothing
 nothing
 nothing
 nothing
 nothing`

ym-han commented 4 years ago

OK after some testing, I'm pretty sure it's the col with the type Array{Array{Set{Tuple{String,String}},1},1} that's causing the issue.
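
One way this kind of testing could be done is to pass each column to the writer on its own and record which ones throw. The sketch below is self-contained and makes several assumptions: `failing_columns` and `toy_writer` are hypothetical names, `toy_writer` only stands in for a real save routine, and a `NamedTuple` of vectors stands in for the DataFrame.

```julia
# Hypothetical helper: call `writer` on each column and collect the names
# of the columns for which it throws.
function failing_columns(writer, cols)
    failed = Symbol[]
    for (name, col) in pairs(cols)
        try
            writer(col)
        catch
            push!(failed, name)
        end
    end
    return failed
end

# Stand-in writer (an assumption, not JDF's API): it only accepts columns of
# String, Int64 or Float64 and raises a MethodError for anything else.
toy_writer(col::Vector{<:Union{String,Int64,Float64}}) = length(col)

# A NamedTuple of vectors standing in for a DataFrame.
cols = (
    name = ["a", "b"],
    n    = [1, 2],
    tags = [[Set([("k", "v")])], [Set([("x", "y")])]],
)

bad = failing_columns(toy_writer, cols)
println(bad)
```

With a real DataFrame, `pairs(eachcol(df))` could play the role of `pairs(cols)`.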

xiaodaigh commented 4 years ago

Array{Array{Set{Tuple{String,String}},1},1}

It's not really possible for JDF to support all types and still be a cross-language format, so that type won't be supported.

For now, you need to convert such columns to a type supported by JDF. You can find a list of supported types on the front page of this repo.

I need to work on better error messages (#49) and better instructions for dealing with these cases (#50).
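
A rough sketch of one possible conversion for the nested column above: encode each cell as a plain `String` before saving and decode it after loading. The helpers `encode_cell` and `decode_cell` are hypothetical, not part of JDF, and the text layout is an arbitrary choice. The sketch assumes that the strings never contain `=`, `,` or `;`, that no set is empty, and that `String` columns are acceptable to JDF (check the supported-types list to confirm).

```julia
# Hypothetical helpers, not part of JDF.
# Layout: tuples become "a=b", tuples within a set are joined with ",",
# and sets within a cell are joined with ";".

# Encode one cell (a vector of sets of string pairs) as a single String.
# Entries are sorted so the output does not depend on Set iteration order.
function encode_cell(cell)
    return join((join(sort!(["$a=$b" for (a, b) in s]), ",") for s in cell), ";")
end

# Decode the text produced by encode_cell back into a vector of sets.
function decode_cell(text::AbstractString)
    cell = Set{Tuple{String,String}}[]
    isempty(text) && return cell
    for chunk in split(text, ';')
        s = Set{Tuple{String,String}}()
        for pair in split(chunk, ',')
            a, b = split(pair, '=')
            push!(s, (String(a), String(b)))
        end
        push!(cell, s)
    end
    return cell
end

# Example column with the same shape as the problematic one.
col = [
    [Set([("k1", "v1"), ("k2", "v2")])],
    [Set([("a", "b")]), Set([("c", "d")])],
]

encoded = encode_cell.(col)      # a Vector{String}
decoded = decode_cell.(encoded)  # back to the nested structure

println(encoded)
println(decoded == col)
```

The encoded column can replace the original in the DataFrame before saving, and `decode_cell` can be broadcast over it after loading.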