tanmaykm / JuliaTS.jl

TArray constructor from DataFrame (DataFrames.jl) and from TimeArray (TimeSeries.jl) #4

Closed femtotrader closed 8 years ago

femtotrader commented 8 years ago

Hello,

It would be nice if you could tell me how to construct a TArray from a DataFrame (DataFrames.jl) and from a TimeArray (TimeSeries.jl), because I'm considering adding JuliaTS.jl support to TALib.jl (see https://github.com/femtotrader/TALib.jl/issues/6).

Here is some code to get a sample DataFrame. The sample data can be downloaded from https://github.com/femtotrader/TALib.jl/blob/master/test/ford_2012.csv

using DataFrames
filename = "test/ford_2012.csv"
dfOHLCV = readtable(filename)
dfOHLCV[:Date] = Date(dfOHLCV[:Date])

And for a sample TimeArray:

using TimeSeries
taOHLCV = readtimearray(filename)

Maybe such constructors could be added to JuliaTS.jl (without adding these packages as dependencies)?

Kind regards

tanmaykm commented 8 years ago

Constructing TArray from DataFrame or TimeArray would be something like:

using DataFrames
using JuliaTS
using TimeSeries

# read as dataframe
dfOHLCV = readtable("ford_2012.csv");
dfOHLCV[:Date] = Date(dfOHLCV[:Date]);

# read as timeseries
tsOHLCV = readtimearray("ford_2012.csv");

# dataframe to TArray
ta = TArray((:Date,), [n=>dfOHLCV[n] for n in names(dfOHLCV)]...)

# timeseries to TArray
ta = TArray((:Date,), :Date=>tsOHLCV.timestamp, [symbol(n)=>tsOHLCV[n].values for n in colnames(tsOHLCV)]...)

Maybe Requires.jl would help add such conversion functions without explicit package dependencies.

tanmaykm commented 8 years ago

@femtotrader, what do you think of an alternate interface for timeseries as in this notebook here: https://github.com/tanmaykm/notebooks/blob/master/stocks/demo2.ipynb ?

It is somewhat similar to Python's xarray. The backing array can be made to support NDSparseData. Is this a more convenient way to explore data?

The implementation is in my fork here: https://github.com/tanmaykm/AxisArrays.jl/tree/tan

femtotrader commented 8 years ago

Thanks for suggesting the Requires.jl package. I didn't know about it.

I don't feel comfortable enough with JuliaTS / AxisArrays yet, so I can't help with API usage for now, but I will once I understand them better.

Python's xarray (formerly xray) is a very interesting package, and having a Julia alternative would be a great feature.

A 3D (Panel-like) data structure is a great feature to have. I will use it in https://github.com/femtotrader/DataReaders.jl (to store, for example, OHLCV values for several stocks). https://github.com/femtotrader/TALib.jl might also be able to support this kind of structure and apply the same indicator to several stocks at once.

Maybe a function to read CSV (and XLS, XLSX) files should be added? For now I don't see any way other than first reading into a DataFrame (or a TimeArray) and then converting to a TArray.

femtotrader commented 8 years ago

julia> ta = TArray((:Date,), [n=>dfOHLCV[n] for n in names(dfOHLCV)]...)
julia> ta
TArray 250x6 Tuple{Date} => Tuple{Float64,Float64,Int64,Float64,Float64}
 (:Date,) => (:Close,:High,:Volume,:Low,:Open)
 (2012-01-03,) => (11.13,11.25,45709900,10.99,11.0)
 (2012-01-04,) => (11.3,11.53,79725200,11.07,11.15)
 (2012-01-05,) => (11.59,11.63,67877500,11.24,11.33)
 (2012-01-06,) => (11.71,11.8,59840700,11.52,11.74)
 (2012-01-09,) => (11.8,11.95,53981500,11.7,11.83)
 (2012-01-10,) => (11.8,12.05,121750600,11.63,12.0)
 (2012-01-11,) => (12.07,12.18,63806000,11.65,11.74)
 (2012-01-12,) => (12.14,12.18,48687700,11.89,12.16)
 (2012-01-13,) => (12.04,12.08,46366700,11.84,12.01)
 (2012-01-17,) => (12.02,12.26,44398400,11.96,12.2)
 ⋮
 (2012-12-17,) => (11.39,11.41,46983300,11.14,11.16)
 (2012-12-18,) => (11.67,11.68,61810400,11.4,11.48)
 (2012-12-19,) => (11.73,11.85,54884700,11.62,11.79)
 (2012-12-20,) => (11.77,11.8,47750100,11.58,11.74)
 (2012-12-21,) => (11.86,11.86,94489300,11.47,11.55)
 (2012-12-24,) => (12.4,12.4,91734900,11.67,11.67)
 (2012-12-26,) => (12.79,12.79,140331900,12.31,12.31)
 (2012-12-27,) => (12.76,12.81,108315100,12.36,12.79)
 (2012-12-28,) => (12.87,12.88,95668600,12.52,12.55)
 (2012-12-31,) => (12.95,13.08,106908900,12.76,12.88)

julia> ta = TArray((:Date,), :Date=>tsOHLCV.timestamp, [symbol(n)=>tsOHLCV[n].values for n in colnames(tsOHLCV)]...)
TArray 250x6 Tuple{Date} => Tuple{Float64,Float64,Float64,Float64,Float64}
 (:Date,) => (:Close,:High,:Volume,:Low,:Open)
 (2012-01-03,) => (11.13,11.25,4.57099e7,10.99,11.0)
 (2012-01-04,) => (11.3,11.53,7.97252e7,11.07,11.15)
 (2012-01-05,) => (11.59,11.63,6.78775e7,11.24,11.33)
 (2012-01-06,) => (11.71,11.8,5.98407e7,11.52,11.74)
 (2012-01-09,) => (11.8,11.95,5.39815e7,11.7,11.83)
 (2012-01-10,) => (11.8,12.05,1.217506e8,11.63,12.0)
 (2012-01-11,) => (12.07,12.18,6.3806e7,11.65,11.74)
 (2012-01-12,) => (12.14,12.18,4.86877e7,11.89,12.16)
 (2012-01-13,) => (12.04,12.08,4.63667e7,11.84,12.01)
 (2012-01-17,) => (12.02,12.26,4.43984e7,11.96,12.2)
 ⋮
 (2012-12-17,) => (11.39,11.41,4.69833e7,11.14,11.16)
 (2012-12-18,) => (11.67,11.68,6.18104e7,11.4,11.48)
 (2012-12-19,) => (11.73,11.85,5.48847e7,11.62,11.79)
 (2012-12-20,) => (11.77,11.8,4.77501e7,11.58,11.74)
 (2012-12-21,) => (11.86,11.86,9.44893e7,11.47,11.55)
 (2012-12-24,) => (12.4,12.4,9.17349e7,11.67,11.67)
 (2012-12-26,) => (12.79,12.79,1.403319e8,12.31,12.31)
 (2012-12-27,) => (12.76,12.81,1.083151e8,12.36,12.79)
 (2012-12-28,) => (12.87,12.88,9.56686e7,12.52,12.55)
 (2012-12-31,) => (12.95,13.08,1.069089e8,12.76,12.88)

Column order is not preserved.

julia> names(dfOHLCV)
6-element Array{Symbol,1}:
 :Date
 :Open
 :High
 :Low
 :Close
 :Volume

julia> colnames(tsOHLCV)
5-element Array{UTF8String,1}:
 "Open"
 "High"
 "Low"
 "Close"
 "Volume"

but

julia> ta.valnames
(:Close,:High,:Volume,:Low,:Open)

Maybe an OrderedDict could be used; see a similar issue here: https://github.com/JuliaStats/DataFrames.jl/issues/950

I also noticed that the Volume type (Int64) is not preserved. Volume seems to be converted to Float64 when going from a TimeArray (TimeSeries.jl) to a TArray.

tanmaykm commented 8 years ago

TimeArray stores all columns in the same array, so it promotes the Int64 Volume column to Float64. A DataFrame, on the other hand, can hold differently typed columns.
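
As a small illustration of that promotion, here is a sketch in plain Julia (current syntax, no TimeSeries.jl involved). The variable names are only for illustration, and the sample values are the first three rows of the output above. Putting a Float64 column and an Int64 column into one matrix forces a common element type:

```julia
# Two columns with different element types (first three rows of the sample data)
closes  = [11.13, 11.3, 11.59]            # Float64 column
volumes = [45709900, 79725200, 67877500]  # Int64 column

# A single matrix can only have one element type, so the Int64 values
# are promoted to Float64 when the columns are concatenated.
mat = hcat(closes, volumes)

println(eltype(mat))                   # Float64
println(promote_type(Float64, Int64))  # Float64
```

The values themselves are unchanged here (these volumes are exactly representable as Float64); only the element type and the printed form differ.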

I think column order is not preserved because of the use of setdiff here: https://github.com/tanmaykm/JuliaTS.jl/blob/3595d41404c984209bfb0d11dad154f09a5a1e3a/src/ts.jl#L29. Will push a fix. Thanks for pointing it out.
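
One order-preserving way to separate the key columns from the value columns is to filter the original name list rather than go through an unordered collection. The following is only a sketch in current Julia syntax: `split_columns` is a hypothetical helper, not a function in JuliaTS.jl, and not necessarily how the fix will be written.

```julia
# Hypothetical helper (not part of JuliaTS.jl): split column names into
# key columns and value columns, keeping the order they have in `allnames`.
function split_columns(allnames::Vector{Symbol}, keynames)
    keycols = [n for n in allnames if n in keynames]
    valcols = [n for n in allnames if !(n in keynames)]
    return keycols, valcols
end

allnames = [:Date, :Open, :High, :Low, :Close, :Volume]
keycols, valcols = split_columns(allnames, (:Date,))

println(keycols)
println(valcols)
```

With the names from the DataFrame above, the value columns come out as Open, High, Low, Close, Volume, matching `names(dfOHLCV)` minus the key.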

femtotrader commented 8 years ago

Thanks @tanmaykm for your help.