Closed femtotrader closed 8 years ago
Constructing a TArray from a DataFrame or a TimeArray would be something like:
using DataFrames
using JuliaTS
using TimeSeries
# read as dataframe
dfOHLCV = readtable("ford_2012.csv");
dfOHLCV[:Date] = Date(dfOHLCV[:Date]);
# read as timeseries
tsOHLCV = readtimearray("ford_2012.csv");
# dataframe to TArray
ta = TArray((:Date,), [n=>dfOHLCV[n] for n in names(dfOHLCV)]...)
# timeseries to TArray
ta = TArray((:Date,), :Date=>tsOHLCV.timestamp, [symbol(n)=>tsOHLCV[n].values for n in colnames(tsOHLCV)]...)
Maybe Requires.jl will help with adding such conversion functions without explicit package dependencies.
@femtotrader, what do you think of an alternate interface for timeseries as in this notebook here: https://github.com/tanmaykm/notebooks/blob/master/stocks/demo2.ipynb ?
It is somewhat similar to Python xarray. The backing array can be made to support NDSparseData. Is this a more convenient way for exploring data?
The implementation is in my fork here: https://github.com/tanmaykm/AxisArrays.jl/tree/tan
Thanks for the Requires.jl package suggestion. I didn't know about it.
I don't feel comfortable enough with JuliaTS / AxisArray yet, so I can't help with API usage for now, but I will once I understand it better.
Python xarray (formerly xray) is a very interesting package, and having a Julia alternative would be a great feature.
A 3D (Panel-like) data structure is a great feature to have. I will use it in https://github.com/femtotrader/DataReaders.jl (to store, for example, OHLCV values for several stocks). https://github.com/femtotrader/TALib.jl might also be able to support this kind of structure and apply the same indicator to several stocks at once.
Maybe a function to read CSV (and XLS, XLSX) files should be added? For now I don't see any other method than reading into a DataFrame (or a TimeArray) first and then converting to a TArray.
julia> ta = TArray((:Date,), [n=>dfOHLCV[n] for n in names(dfOHLCV)]...)
julia> ta
TArray 250x6 Tuple{Date} => Tuple{Float64,Float64,Int64,Float64,Float64}
(:Date,) => (:Close,:High,:Volume,:Low,:Open)
(2012-01-03,) => (11.13,11.25,45709900,10.99,11.0)
(2012-01-04,) => (11.3,11.53,79725200,11.07,11.15)
(2012-01-05,) => (11.59,11.63,67877500,11.24,11.33)
(2012-01-06,) => (11.71,11.8,59840700,11.52,11.74)
(2012-01-09,) => (11.8,11.95,53981500,11.7,11.83)
(2012-01-10,) => (11.8,12.05,121750600,11.63,12.0)
(2012-01-11,) => (12.07,12.18,63806000,11.65,11.74)
(2012-01-12,) => (12.14,12.18,48687700,11.89,12.16)
(2012-01-13,) => (12.04,12.08,46366700,11.84,12.01)
(2012-01-17,) => (12.02,12.26,44398400,11.96,12.2)
⋮
(2012-12-17,) => (11.39,11.41,46983300,11.14,11.16)
(2012-12-18,) => (11.67,11.68,61810400,11.4,11.48)
(2012-12-19,) => (11.73,11.85,54884700,11.62,11.79)
(2012-12-20,) => (11.77,11.8,47750100,11.58,11.74)
(2012-12-21,) => (11.86,11.86,94489300,11.47,11.55)
(2012-12-24,) => (12.4,12.4,91734900,11.67,11.67)
(2012-12-26,) => (12.79,12.79,140331900,12.31,12.31)
(2012-12-27,) => (12.76,12.81,108315100,12.36,12.79)
(2012-12-28,) => (12.87,12.88,95668600,12.52,12.55)
(2012-12-31,) => (12.95,13.08,106908900,12.76,12.88)
julia> ta = TArray((:Date,), :Date=>tsOHLCV.timestamp, [symbol(n)=>tsOHLCV[n].values for n in colnames(tsOHLCV)]...)
TArray 250x6 Tuple{Date} => Tuple{Float64,Float64,Float64,Float64,Float64}
(:Date,) => (:Close,:High,:Volume,:Low,:Open)
(2012-01-03,) => (11.13,11.25,4.57099e7,10.99,11.0)
(2012-01-04,) => (11.3,11.53,7.97252e7,11.07,11.15)
(2012-01-05,) => (11.59,11.63,6.78775e7,11.24,11.33)
(2012-01-06,) => (11.71,11.8,5.98407e7,11.52,11.74)
(2012-01-09,) => (11.8,11.95,5.39815e7,11.7,11.83)
(2012-01-10,) => (11.8,12.05,1.217506e8,11.63,12.0)
(2012-01-11,) => (12.07,12.18,6.3806e7,11.65,11.74)
(2012-01-12,) => (12.14,12.18,4.86877e7,11.89,12.16)
(2012-01-13,) => (12.04,12.08,4.63667e7,11.84,12.01)
(2012-01-17,) => (12.02,12.26,4.43984e7,11.96,12.2)
⋮
(2012-12-17,) => (11.39,11.41,4.69833e7,11.14,11.16)
(2012-12-18,) => (11.67,11.68,6.18104e7,11.4,11.48)
(2012-12-19,) => (11.73,11.85,5.48847e7,11.62,11.79)
(2012-12-20,) => (11.77,11.8,4.77501e7,11.58,11.74)
(2012-12-21,) => (11.86,11.86,9.44893e7,11.47,11.55)
(2012-12-24,) => (12.4,12.4,9.17349e7,11.67,11.67)
(2012-12-26,) => (12.79,12.79,1.403319e8,12.31,12.31)
(2012-12-27,) => (12.76,12.81,1.083151e8,12.36,12.79)
(2012-12-28,) => (12.87,12.88,9.56686e7,12.52,12.55)
(2012-12-31,) => (12.95,13.08,1.069089e8,12.76,12.88)
Column order is not preserved.
julia> names(dfOHLCV)
6-element Array{Symbol,1}:
:Date
:Open
:High
:Low
:Close
:Volume
julia> colnames(tsOHLCV)
5-element Array{UTF8String,1}:
"Open"
"High"
"Low"
"Close"
"Volume"
but
julia> ta.valnames
(:Close,:High,:Volume,:Low,:Open)
Maybe an OrderedDict could be used; see a similar issue here: https://github.com/JuliaStats/DataFrames.jl/issues/950
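To illustrate the OrderedDict idea, here is a minimal sketch. It assumes the OrderedCollections.jl package (which provides OrderedDict) is installed; the variable names and the integer values are made up for the example, and this is not how JuliaTS stores its columns.

```julia
using OrderedCollections  # provides OrderedDict

# Column names in the order they appear in the CSV file.
valnames = [:Open, :High, :Low, :Close, :Volume]

# Fill an OrderedDict one key at a time; the integer values are placeholders
# standing in for the actual column data.
od = OrderedDict{Symbol,Int}()
for (i, name) in enumerate(valnames)
    od[name] = i
end

# Keys are returned in insertion order, so the column order survives.
println(collect(keys(od)))

# A plain Dict makes no promise about key order.
d = Dict(name => i for (i, name) in enumerate(valnames))
println(collect(keys(d)))
```

The same lookups work on both containers; only the iteration order guarantee differs.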
I also noticed that the Volume type (Int64) is not preserved. Volume seems to be converted to Float64 when going from a TimeArray (from TimeSeries.jl) to a TArray.
TimeArray stores all columns in the same array. It promoted the Int64 volume column to Float64. DataFrame can handle differently typed columns though.
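The promotion can be reproduced with plain Julia arrays, without TimeSeries.jl. This is only a sketch of the general mechanism (one matrix has one element type); the variable names are made up and the numbers are the first three rows of the output above.

```julia
# Two columns with different element types, as a DataFrame would keep them.
closes  = [11.13, 11.3, 11.59]            # Vector{Float64}
volumes = [45709900, 79725200, 67877500]  # Vector{Int64}

# Putting both into one matrix forces a single common element type.
combined = hcat(closes, volumes)

println(eltype(volumes))                # element type of the original column
println(eltype(combined))               # element type after combining
println(promote_type(Float64, Int64))   # the common type Julia picks
```

Keeping the columns as separate vectors, as in the DataFrame path, avoids the conversion.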
I think column order is not preserved because of the use of setdiff here: https://github.com/tanmaykm/JuliaTS.jl/blob/3595d41404c984209bfb0d11dad154f09a5a1e3a/src/ts.jl#L29. Will push a fix. Thanks for pointing it out.
Thanks @tanmaykm for your help
Hello,
it would be nice if you could tell me how to construct a TArray from a DataFrame (DataFrames.jl) and from a TimeArray (TimeSeries.jl), because I'm considering adding JuliaTS.jl support to TALib.jl: https://github.com/femtotrader/TALib.jl/issues/6
Here is some code to get a sample DataFrame. Download sample data: https://github.com/femtotrader/TALib.jl/blob/master/test/ford_2012.csv
and for a sample TimeArray
Maybe such constructors could be added to JuliaTS.jl (without adding these packages as dependencies)?
Kind regards