quandl / quandl-python

MIT License
1.38k stars 338 forks source link

Return 3D datastructure when a list of several symbols is given #74

Closed femtotrader closed 8 years ago

femtotrader commented 8 years ago

Hello,

import Quandl
Quandl.get(["CHRIS/CME_ES1", "CHRIS/CME_NQ1"])

returns a DataFrame

Isn't there a way to get a Pandas Panel or an xarray.DataArray

An other approach could be to simply return an ordered dict with DataFrame as "value" and ticker as key.

User will be responsible for converting to 3D datastructure.

Kind regards

femtotrader commented 8 years ago

An example with Pandas DataReader

In [1]: import pandas_datareader as pdr
In [2]: panel = pdr.DataReader(["GOOG", "IBM"], "yahoo")
In [3]: panel
Out[3]:
<class 'pandas.core.panel.Panel'>
Dimensions: 6 (items) x 1630 (major_axis) x 2 (minor_axis)
Items axis: Open to Adj Close
Major_axis axis: 2010-01-04 00:00:00 to 2016-06-23 00:00:00
Minor_axis axis: GOOG to IBM

In [6]: import pandas as pd

In [7]: pd.set_option("max_rows", 10)

In [8]: panel.loc["Close"]
Out[8]:
                  GOOG         IBM
Date
2010-01-04  626.751061  132.449997
2010-01-05  623.991055  130.850006
2010-01-06  608.261023  130.000000
2010-01-07  594.101005  129.550003
2010-01-08  602.021036  130.850006
...                ...         ...
2016-06-17  691.719971  151.990005
2016-06-20  693.710022  153.610001
2016-06-21  695.940002  154.050003
2016-06-22  697.460022  152.919998
2016-06-23  701.869995  155.350006

[1630 rows x 2 columns]
In [11]: panel.loc[:,:,"GOOG"]
Out[11]:
                  Open        High         Low       Close      Volume  \
Date
2010-01-04  626.951088  629.511067  624.241073  626.751061   3927000.0
2010-01-05  627.181073  627.841071  621.541045  623.991055   6031900.0
2010-01-06  625.861078  625.861078  606.361042  608.261023   7987100.0
2010-01-07  609.401025  610.001045  592.651008  594.101005  12876600.0
2010-01-08  592.000997  603.251034  589.110988  602.021036   9483900.0
...                ...         ...         ...         ...         ...
2016-06-17  708.650024  708.820007  688.452026  691.719971   3396800.0
2016-06-20  698.770020  702.479980  693.409973  693.710022   2071400.0
2016-06-21  698.400024  702.770020  692.010010  695.940002   1464300.0
2016-06-22  699.059998  700.859985  693.081970  697.460022   1179300.0
2016-06-23  697.450012  701.950012  687.000000  701.869995   2157400.0

             Adj Close
Date
2010-01-04  313.062468
2010-01-05  311.683844
2010-01-06  303.826685
2010-01-07  296.753749
2010-01-08  300.709808
...                ...
2016-06-17  691.719971
2016-06-20  693.710022
2016-06-21  695.940002
2016-06-22  697.460022
2016-06-23  701.869995

[1630 rows x 6 columns]
femtotrader commented 8 years ago

Here is a quick idea :


import six
import collections
import Quandl

def myget(datasets, **kwargs):
    if isinstance(datasets, six.string_types):
        return Quandl.get(datasets, **kwargs)
    else:
        d = collections.OrderedDict()
        for dataset in datasets:
            d[dataset] = Quandl.get(dataset, **kwargs)
            columns = list(d[dataset].columns)
        return d  # returns OrderedDict of DataFrame

In [20]: d = myget(["CHRIS/CME_ES1", "CHRIS/CME_NQ1"])
In [21]: panel = pd.Panel.from_dict(d).transpose(2,1,0)
In [22]: panel
Out[22]:
<class 'pandas.core.panel.Panel'>
Dimensions: 8 (items) x 4813 (major_axis) x 2 (minor_axis)
Items axis: Open to Open Interest
Major_axis axis: 1997-09-09 00:00:00 to 2016-06-23 00:00:00
Minor_axis axis: CHRIS/CME_ES1 to CHRIS/CME_NQ1
In [23]: panel["Open"]
Out[23]:
            CHRIS/CME_ES1  CHRIS/CME_NQ1
Date
1997-09-09         934.00            NaN
1997-09-10         934.00            NaN
1997-09-11         916.00            NaN
1997-09-12         908.00            NaN
1997-09-15         925.00            NaN
...                   ...            ...
2016-06-17        2077.75        4423.25
2016-06-20        2071.00        4383.00
2016-06-21        2079.00        4398.50
2016-06-22        2079.25        4397.25
2016-06-23        2083.00        4409.75

[4813 rows x 2 columns]

In [24]: panel[:,:,"CHRIS/CME_ES1"]
Out[24]:
               Open     High      Low     Last  Change   Settle     Volume  \
Date
1997-09-09   934.00   942.00   933.00   934.00     NaN   934.00     7034.0
1997-09-10   934.00   935.00   915.00   915.00     NaN   915.00    11387.0
1997-09-11   916.00   918.00   900.00   908.00     NaN   908.00     2523.0
1997-09-12   908.00   926.00   904.00   924.00     NaN   924.00      928.0
1997-09-15   925.00   930.00   920.00   922.00     NaN   922.00      208.0
...             ...      ...      ...      ...     ...      ...        ...
2016-06-17  2077.75  2083.00  2073.75  2078.50     NaN  2079.12    52547.0
2016-06-20  2071.00  2092.50  2070.25  2082.75   15.25  2074.25  1592550.0
2016-06-21  2079.00  2086.00  2074.00  2079.25    6.25  2080.50  1394626.0
2016-06-22  2079.25  2091.25  2075.25  2077.00    3.75  2076.75  1402829.0
2016-06-23  2083.00  2113.25  2083.00  2113.25   29.00  2105.75  1308321.0

            Open Interest
Date
1997-09-09         1109.0
1997-09-10         2325.0
1997-09-11         2549.0
1997-09-12         2163.0
1997-09-15         2107.0
...                   ...
2016-06-17       607345.0
2016-06-20      2774441.0
2016-06-21      2781532.0
2016-06-22      2769327.0
2016-06-23      2779028.0

[4813 rows x 8 columns]

In [30]: ds = xarray.Dataset(d).to_array()
In [31]: ds = ds.rename({'variable': 'Dataset'})
In [32]: 
Out[31]:
<xarray.DataArray (Dataset: 2, Date: 4813, dim_1: 8)>
array([[[  9.34000000e+02,   9.42000000e+02,   9.33000000e+02, ...,
           9.34000000e+02,   7.03400000e+03,   1.10900000e+03],
        [  9.34000000e+02,   9.35000000e+02,   9.15000000e+02, ...,
           9.15000000e+02,   1.13870000e+04,   2.32500000e+03],
        [  9.16000000e+02,   9.18000000e+02,   9.00000000e+02, ...,
           9.08000000e+02,   2.52300000e+03,   2.54900000e+03],
        ...,
        [  2.07900000e+03,   2.08600000e+03,   2.07400000e+03, ...,
           2.08050000e+03,   1.39462600e+06,   2.78153200e+06],
        [  2.07925000e+03,   2.09125000e+03,   2.07525000e+03, ...,
           2.07675000e+03,   1.40282900e+06,   2.76932700e+06],
        [  2.08300000e+03,   2.11325000e+03,   2.08300000e+03, ...,
           2.10575000e+03,   1.30832100e+06,   2.77902800e+06]],

       [[             nan,              nan,              nan, ...,
                      nan,              nan,              nan],
        [             nan,              nan,              nan, ...,
                      nan,              nan,              nan],
        [             nan,              nan,              nan, ...,
                      nan,              nan,              nan],
        ...,
        [  4.39850000e+03,   4.41575000e+03,   4.38825000e+03, ...,
           4.40025000e+03,   1.93001000e+05,   2.21390000e+05],
        [  4.39725000e+03,   4.43400000e+03,   4.39100000e+03, ...,
           4.39450000e+03,   1.88817000e+05,   2.18161000e+05],
        [  4.40975000e+03,   4.47925000e+03,   4.40800000e+03, ...,
           4.46250000e+03,   1.94779000e+05,   2.13504000e+05]]])
Coordinates:
  * Date     (Date) datetime64[ns] 1997-09-09 1997-09-10 1997-09-11 ...
  * dim_1    (dim_1) object 'Open' 'High' 'Low' 'Last' 'Change' 'Settle' ...
  * Dataset  (Dataset) <U13 'CHRIS/CME_ES1' 'CHRIS/CME_NQ1'
junos commented 8 years ago

Hi @femtotrader,

Thanks for your suggestions. This is an interesting feature suggestion. This is the first time we have seen this type of feature request. For now we will tag it on the next major milestone and decide whether there is a high demand for this type of feature when planning our next major release.