xiaodaigh / JDF.jl

Julia DataFrames serialization format
MIT License

TaskFailedException #29

Open Mirage10 opened 5 years ago

Mirage10 commented 5 years ago

Hello,

It seems that JDF does not cope well with large data. When running the following snippet:

```julia
a = DataFrame(:a => 1:1000000000, :b => rand(1:5, 1000000000))
metadatas = savejdf("iris.jdf", a)
```

it raises the exception:

```
ERROR: TaskFailedException: ArgumentError: data > 2147483631 bytes is not supported by Blosc
Stacktrace:
 [1] compress!(::Array{UInt8,1}, ::Ptr{Int64}, ::Int64; level::Int64, shuffle::Bool, itemsize::Int64) at /home/user/.julia/packages/Blosc/lzFr0/src/Blosc.jl:86
 [2] compress! at /home/user/.julia/packages/Blosc/lzFr0/src/Blosc.jl:75 [inlined]
 [3] compress(::Ptr{Int64}, ::Int64; kws::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}) at /home/user/.julia/packages/Blosc/lzFr0/src/Blosc.jl:103
 [4] compress at /home/user/.julia/packages/Blosc/lzFr0/src/Blosc.jl:102 [inlined]
 [5] macro expansion at ./gcutils.jl:105 [inlined]
 [6] #compress#7 at /home/user/.julia/packages/Blosc/lzFr0/src/Blosc.jl:109 [inlined]
 [7] compress at /home/user/.julia/packages/Blosc/lzFr0/src/Blosc.jl:108 [inlined]
 [8] compress_then_write(::Array{Int64,1}, ::BufferedStreams.BufferedOutputStream{IOStream}) at /home/user/.julia/packages/JDF/BMdXX/src/compress_then_write.jl:13
 [9] macro expansion at /home/user/.julia/packages/JDF/BMdXX/src/savejdf.jl:69 [inlined]
 [10] (::JDF.var"#47#50"{String,DataFrame,Symbol,Int64})() at ./threadingconstructs.jl:113
```
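For reference (my own back-of-the-envelope check, not part of the original report), a single column in this example is already far over the limit mentioned in the error:

```julia
nrows = 1_000_000_000
nrows * sizeof(Int64)                  # 8_000_000_000 bytes in the :a column alone
nrows * sizeof(Int64) > 2147483631     # true: exceeds Blosc's per-buffer limit
```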

Maybe it would be a good idea to hand data to the Blosc compressor in chunks no larger than its maximum buffer size, so that JDF is robust against large data (see the sketch below).
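A minimal sketch of that chunking idea, assuming the Blosc.jl `compress`/`decompress` API; this is not JDF's actual code, `compress_chunked`/`decompress_chunked` are hypothetical helpers, and the byte limit is taken from the error above:

```julia
using Blosc

# Per-buffer limit taken from the error message above (2^31 - 17 bytes).
const BLOSC_MAX_BYTES = 2_147_483_631

# Compress `v` piecewise so each Blosc call stays under the limit;
# returns the compressed chunks in order.
function compress_chunked(v::Vector{T}) where {T}
    max_elems = BLOSC_MAX_BYTES ÷ sizeof(T)   # largest element count per chunk
    chunks = Vector{Vector{UInt8}}()
    for start in 1:max_elems:length(v)
        stop = min(start + max_elems - 1, length(v))
        push!(chunks, Blosc.compress(v[start:stop]))
    end
    return chunks
end

# Reassemble the original vector from the compressed chunks.
decompress_chunked(::Type{T}, chunks) where {T} =
    reduce(vcat, (Blosc.decompress(T, c) for c in chunks))
```

In a real fix the chunk boundaries would presumably also need to be stored in the JDF metadata so that loading can reassemble each column.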

Have a great day ahead!

xiaodaigh commented 5 years ago

Thanks for the report!! This is indeed a problem! I will work on a fix.

Mirage10 commented 5 years ago

Super, no need to hurry. It was just an academic test :-)

xiaodaigh commented 5 years ago

Thanks. I need to resolve this sooner or later, as Blosc doesn't support compressing anything larger than about 2 GB in a single call.