queryverse / IterableTables.jl

Implementations of the TableTraits.jl interface for various packages
Other
79 stars 9 forks source link

Convert DataFrames.DataFrame <-> TimeSeries.TimeArray #46

Closed femtotrader closed 7 years ago

femtotrader commented 7 years ago

Hello,

I'm looking for a way to convert DataFrames.DataFrame to (and from) TimeSeries.TimeArray using IterableTables.jl. but it doesn't seems to work as I was expecting

TimeSeries.TimeArray -> DataFrames.DataFrame

Create TimeSeries.TimeArray

julia> using TimeSeries
julia> dat = "Date,Open,High,Low,Close,Volume
2012-01-03,11.0,11.25,10.99,11.13,45709900
2012-01-04,11.15,11.53,11.07,11.3,79725200
2012-01-05,11.33,11.63,11.24,11.59,67877500
2012-01-06,11.74,11.8,11.52,11.71,59840700
2012-01-09,11.83,11.95,11.7,11.8,53981500
2012-01-10,12.0,12.05,11.63,11.8,121750600
2012-01-11,11.74,12.18,11.65,12.07,63806000"
julia> io2 = IOBuffer(dat)
julia> ta2 = readtimearray(io2)

Convert to DataFrame

julia> using IterableTables
julia> using DataFrames
julia> DataFrame(ta2)
7×6 DataFrames.DataFrame
│ Row │ timestamp  │ Open  │ High  │ Low   │ Close │ Volume    │
├─────┼────────────┼───────┼───────┼───────┼───────┼───────────┤
│ 1   │ 2012-01-03 │ 11.0  │ 11.25 │ 10.99 │ 11.13 │ 4.57099e7 │
│ 2   │ 2012-01-04 │ 11.15 │ 11.53 │ 11.07 │ 11.3  │ 7.97252e7 │
│ 3   │ 2012-01-05 │ 11.33 │ 11.63 │ 11.24 │ 11.59 │ 6.78775e7 │
│ 4   │ 2012-01-06 │ 11.74 │ 11.8  │ 11.52 │ 11.71 │ 5.98407e7 │
│ 5   │ 2012-01-09 │ 11.83 │ 11.95 │ 11.7  │ 11.8  │ 5.39815e7 │
│ 6   │ 2012-01-10 │ 12.0  │ 12.05 │ 11.63 │ 11.8  │ 1.21751e8 │
│ 7   │ 2012-01-11 │ 11.74 │ 12.18 │ 11.65 │ 12.07 │ 6.3806e7  │

works fine but...

DataFrames.DataFrame -> TimeSeries.TimeArray

Create DataFrames.DataFrame

julia> using DataFrames
julia> dat = "Date,Open,High,Low,Close,Volume
2012-01-03,11.0,11.25,10.99,11.13,45709900
2012-01-04,11.15,11.53,11.07,11.3,79725200
2012-01-05,11.33,11.63,11.24,11.59,67877500
2012-01-06,11.74,11.8,11.52,11.71,59840700
2012-01-09,11.83,11.95,11.7,11.8,53981500
2012-01-10,12.0,12.05,11.63,11.8,121750600
2012-01-11,11.74,12.18,11.65,12.07,63806000"
julia> io = IOBuffer(dat)
julia> df = readtable(io)
julia> df[:Date] = Date(df[:Date])

Convert to TimeSeries.TimeArray

julia> using IterableTables
julia> using TimeSeries
julia> TimeArray(df[[:Date, :Open, :High, :Low, :Close]], timestamp_column=:Date)
ERROR: MethodError: no method matching TimeSeries.TimeArray(::DataFrames.DataFrame; timestamp_column=:Date)

Any idea?

Kind regards

PS: Following https://github.com/JuliaStats/TimeSeries.jl/issues/290 and https://github.com/femtotrader/TimeSeriesIO.jl/issues/19

davidanthoff commented 7 years ago

Ah, this is a bug! Should be fixed in #47 and I'll tag a new release asap. Thanks for reporting!

femtotrader commented 7 years ago

Thanks @davidanthoff
You can add theses examples to your test bases to catch this kind of issue.

davidanthoff commented 7 years ago

You can add theses examples to your test bases to catch this kind of issue.

Yes, that would be very useful: now tracked in #48.

femtotrader commented 7 years ago

That's why I opened https://github.com/JuliaStats/TimeSeries.jl/issues/297 it's very convenient for tests. All Julia tabular data structures with CSV like input (DataFrame, TimeArray...) should accept IOBuffer (not only filename)

davidanthoff commented 7 years ago

All Julia tabular data structure with CSV like input (DataFrame, TimeArray...) should accept IOBuffer (not only filename)

There is probably a way to achieve this via the new CSVFiles.jl stuff. Essentially the idea would be that it would be enough to accept an iterable table in a constructor and use the stuff from CSVFiles.jl to convert a stream in CSV format into an iterable table. I'm tracking this in davidanthoff/CSVFiles.jl#5.

davidanthoff commented 7 years ago

Once davidanthoff/CSVFiles.jl#6 lands, one should be able to do this:

using FileIO, CSVFiles, DataFrame

dat = """Date,Open,High,Low,Close,Volume
2012-01-03,11.0,11.25,10.99,11.13,45709900
2012-01-04,11.15,11.53,11.07,11.3,79725200
2012-01-05,11.33,11.63,11.24,11.59,67877500
2012-01-06,11.74,11.8,11.52,11.71,59840700
2012-01-09,11.83,11.95,11.7,11.8,53981500
2012-01-10,12.0,12.05,11.63,11.8,121750600
2012-01-11,11.74,12.18,11.65,12.07,63806000"""

io = IOBuffer(dat)

df = DataFrame(load(Stream(format"CSV", io)))
# Or using pipe syntax, once I merge one more thing
df = Stream(format"CSV", io) |> load() |> DataFrame

You should be able to substitute any of the IterableTables.jl sinks in for DataFrame and it should all work.

davidanthoff commented 7 years ago

Alright, bug fix is released and tagged, so a Pkg.update() should be enough to make this work. Thanks again for reporting this.

femtotrader commented 7 years ago

All right!

julia> TimeArray(df[[:Date, :Open, :High, :Low, :Close]], timestamp_column=:Date)
(WARNINGS)
7x4 TimeSeries.TimeArray{Float64,2,Date,Array{Float64,2}} 2012-01-03 to 2012-01-11

             Open     High     Low      Close
2012-01-03 | 11.0     11.25    10.99    11.13
2012-01-04 | 11.15    11.53    11.07    11.3
2012-01-05 | 11.33    11.63    11.24    11.59
2012-01-06 | 11.74    11.8     11.52    11.71
2012-01-09 | 11.83    11.95    11.7     11.8
2012-01-10 | 12.0     12.05    11.63    11.8
2012-01-11 | 11.74    12.18    11.65    12.07

Thanks @davidanthoff