Closed snthot closed 4 years ago
Merging #80 into master will increase coverage by
61.7%
. The diff coverage is100%
.
@@ Coverage Diff @@
## master #80 +/- ##
=========================================
+ Coverage 23.7% 85.5% +61.7%
=========================================
Files 11 6 -5
Lines 2563 221 -2342
=========================================
- Hits 609 189 -420
+ Misses 1954 32 -1922
Continue to review full report at Codecov.
Legend - Click here to learn more
Δ = absolute <relative> (impact)
,ø = not affected
,? = missing data
Powered by Codecov. Last update ab11d57...0842885. Read the comment docs.
Thanks for checking out the package @snthot! Can you provide a brief example that demonstrates the error you saw? Prior to this fix, the test suite passed with this functionality, so I want to make sure I'm accounting for any edge cases that might be present.
oh sure, here is the case: I have a DateTime column in a DataFrame which filled with now() function. I got error when trying to convert the column to TColumn. Please see below.
julia> using DataFrames
julia> using OmniSci
julia> using Dates
julia> df1 = DataFrame(A=[1,2], B=[DateTime(2013,7,1,12,30,59), DateTime(2013,7,1,12,30,59)])
2×2 DataFrame
│ Row │ A │ B │
│ │ Int64 │ DateTime │
├─────┼───────┼─────────────────────┤
│ 1 │ 1 │ 2013-07-01T12:30:59 │
│ 2 │ 2 │ 2013-07-01T12:30:59 │
julia> df2 = DataFrame(A=[1,2], B=[DateTime(2013,7,1,12,30,59), now()])
2×2 DataFrame
│ Row │ A │ B │
│ │ Int64 │ DateTime │
├─────┼───────┼─────────────────────────┤
│ 1 │ 1 │ 2013-07-01T12:30:59 │
│ 2 │ 2 │ 2019-10-30T16:06:57.018 │
julia> TColumn.(eachcol(df1))
2-element Array{TColumn,1}:
TColumn(OmniSci.TColumnData([1, 2], #undef, #undef, #undef), Bool[false, false])
TColumn(OmniSci.TColumnData([1372681859, 1372681859], #undef, #undef, #undef), Bool[false, false])
julia> TColumn.(eachcol(df2))
ERROR: InexactError: Int64(1.572451617018e9)
Stacktrace:
[1] Type at ./float.jl:703 [inlined]
[2] myInt64 at /home/santi/.julia/packages/OmniSci/OmRhl/src/constructors.jl:38 [inlined]
[3] _broadcast_getindex_evalf at ./broadcast.jl:578 [inlined]
[4] _broadcast_getindex at ./broadcast.jl:551 [inlined]
[5] getindex at ./broadcast.jl:511 [inlined]
[6] macro expansion at ./broadcast.jl:843 [inlined]
[7] macro expansion at ./simdloop.jl:73 [inlined]
[8] copyto! at ./broadcast.jl:842 [inlined]
[9] copyto! at ./broadcast.jl:797 [inlined]
[10] copy at ./broadcast.jl:773 [inlined]
[11] materialize at ./broadcast.jl:753 [inlined]
[12] TColumn(::Array{DateTime,1}) at /home/santi/.julia/packages/OmniSci/OmRhl/src/constructors.jl:325
[13] _broadcast_getindex_evalf at ./broadcast.jl:578 [inlined]
[14] _broadcast_getindex at ./broadcast.jl:551 [inlined]
[15] getindex at ./broadcast.jl:511 [inlined]
[16] copyto_nonleaf!(::Array{TColumn,1}, ::Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{1},Tuple{Base.OneTo{Int64}},Type{TColumn},Tuple{Base.Broadcast.Extruded{DataFrames.DataFrameColumns{DataFrame,AbstractArray{T,1} where T},Tuple{Bool},Tuple{Int64}}}}, ::Base.OneTo{Int64}, ::Int64, ::Int64) at ./broadcast.jl:928
[17] copy at ./broadcast.jl:791 [inlined]
[18] materialize(::Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{1},Nothing,Type{TColumn},Tuple{DataFrames.DataFrameColumns{DataFrame,AbstractArray{T,1} where T}}}) at ./broadcast.jl:753
[19] top-level scope at none:0
I noticed that your test pass because the DateTime data are defined down to second(s) level but the now() function normally go down to millisecond level so I got the trouble. And this prevented me from using 'load_table_binary_columnar' function. Once I fixed this problem by applying 'floor' I found that I can convert high precision datetime to TColumn (with losses) and load the data to OmniSci correctly.
I am not sure whether there is a better way to fix this. But it looks like Omnisci supports high precision timestamp so you may go with Float64 instead of Int64 ? Please advise.
@snthot Thanks for the example, yes that's an oversight on my part. Your solution fixes the timestamp at the seconds level (which is what OmniSci supported when I implemented this), but it's a good question about how to implement higher-resolution timestamps
I confirmed internally that we do want to cast to Int.
Could you make your change mydatetime2unix(x) = datetime2unix(floor(x, Dates.Second))
so that the operation here is explicit in rounding to the second? Also, if you could switch one of the values in the tests to use now()
like it caused your error, that will help to not make this mistake in the future.
Separately, I'll open up an issue about supporting high-resolution timestamps.
Thanks!
great! I absolutely agree! thanks!
I worked on Julia Version 1.2.0 (2019-08-20) with omnisci/core-os-cpu docker.
I bumped into trouble when applying TColumn on Datetime column. I think we need "floor" before converting into Int64 and it works well.
No obligation, just want to have it worked well :-)