queryverse / FeatherLib.jl

Low level Julia library for reading feather files

what do we do about `Int32` overflows? #2

Closed ExpandingMan closed 6 years ago

ExpandingMan commented 6 years ago

This is something I'm still worried about; recall this issue on Feather.jl. No, I haven't had any communication with the Arrow community yet.

Confusingly, Feather seems to violate the Arrow format in several places. From what I could gather, it almost looks like Feather files really do contain arrays with length greater than `typemax(Int32)`, and they rely on the ability of the C++ arrow package to pull data in chunks to get around that somehow.

Note that I have "silently" changed all of the array length values in Arrow.jl to be `Int` rather than `Int32`. This would make it very easy for us to cheat in some cases, since lengths can then exceed `typemax(Int32)`, but of course offsets are still `Int32`, so they can still overflow.
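To make the offsets concern concrete, here is a minimal sketch in plain Julia. It does not use Arrow.jl or FeatherLib.jl, and `checked_offsets` is a hypothetical helper, not part of either package. It builds Arrow-style `Int32` offsets from a list of element lengths, accumulating in `Int64` so that an overflow is reported instead of silently wrapping around.

```julia
# Hypothetical helper (not part of Arrow.jl or FeatherLib.jl):
# build Int32 offsets from element lengths, failing loudly on overflow.
function checked_offsets(lengths::AbstractVector{<:Integer})
    offsets = Vector{Int32}(undef, length(lengths) + 1)
    offsets[1] = 0
    total = Int64(0)  # accumulate in 64 bits so the running sum cannot wrap
    for (i, len) in enumerate(lengths)
        total += len
        if total > typemax(Int32)
            throw(OverflowError("cumulative length $total does not fit in Int32"))
        end
        offsets[i + 1] = total  # safe: converts Int64 to Int32 after the check
    end
    return offsets
end

# Plain Int32 arithmetic wraps around silently, which is the failure mode at issue.
wrapped = typemax(Int32) + Int32(1)  # equals typemin(Int32)
```

For example, `checked_offsets([3, 0, 5])` returns `Int32[0, 3, 8]`, while `checked_offsets([typemax(Int32), 1])` throws an `OverflowError`. Whether raising an error is the right behavior here, as opposed to chunking the data, is exactly the open question.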

So I don't know. This is a major concern, but I haven't had time to reach out to the Arrow community about it yet.

See related issues here and here.

davidanthoff commented 6 years ago

I've just pinged Wes about this on the Arrow.jl issue; maybe he can chime in and help us.