neka-nat / python-forexconnect

Python binding of the ForexConnect API

Another solution of adding bid price #18

Closed. neka-nat closed this issue 6 years ago.

JamesKBowler commented 6 years ago

Hi @neka-nat, thank you for coming back to me.

I ran some tests calling the API for data and can see no real difference in performance between the two implementations.
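Roughly, the helper I timed looks like the sketch below; the instrument, dates, and time frame are illustrative, and fx_session is assumed to be an already-connected session exposing get_historical_prices as in the code further down.

    import datetime

    def get_data():
        # Hypothetical benchmark helper: fetch roughly a week of daily
        # GBP/USD candles from an already-connected session.
        to_date = datetime.datetime(2017, 10, 20)
        fm_date = to_date - datetime.timedelta(days=7)
        return fx_session.get_historical_prices(
            "GBP/USD", fm_date, to_date, "D1")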

#17

In [7]: get_data()
Out[7]:
[<'date': 2017-Oct-19 21:00:00, 'askopen': 1.31622, 'askhigh': 1.32008, 'asklow': 1.3088, 'askclose': 1.3193, 'bidopen': 1.31556, 'bidhigh': 1.32007, 'bidlow': 1.30878, 'bidclose': 1.31856, 'volume': 329395>,
 <'date': 2017-Oct-18 21:00:00, 'askopen': 1.32061, 'askhigh': 1.32287, 'asklow': 1.3135, 'askclose': 1.31622, 'bidopen': 1.32048, 'bidhigh': 1.32285, 'bidlow': 1.31258, 'bidclose': 1.31556, 'volume': 618826>,
 <'date': 2017-Oct-17 21:00:00, 'askopen': 1.31912, 'askhigh': 1.3217, 'asklow': 1.31404, 'askclose': 1.32061, 'bidopen': 1.31886, 'bidhigh': 1.32115, 'bidlow': 1.31399, 'bidclose': 1.32048, 'volume': 273625>,
 <'date': 2017-Oct-16 21:00:00, 'askopen': 1.32526, 'askhigh': 1.32876, 'asklow': 1.31548, 'askclose': 1.31912, 'bidopen': 1.32499, 'bidhigh': 1.32875, 'bidlow': 1.31547, 'bidclose': 1.31886, 'volume': 297564>,
 <'date': 2017-Oct-15 21:00:00, 'askopen': 1.33061, 'askhigh': 1.33127, 'asklow': 1.32256, 'askclose': 1.32526, 'bidopen': 1.33042, 'bidhigh': 1.33118, 'bidlow': 1.32251, 'bidclose': 1.32499, 'volume': 231480>]

In [8]: %timeit -n 10 get_data()
10 loops, best of 3: 682 ms per loop

#18

In [12]: get_data()
Out[12]:
[(<'date': 2017-Oct-19 22:00:00, 'open': 1.31622, 'high': 1.32008, 'low': 1.3088, 'close': 1.3193, 'volume': 329395>,
  <'date': 2017-Oct-19 22:00:00, 'open': 1.31556, 'high': 1.32007, 'low': 1.30878, 'close': 1.31856, 'volume': 329395>),
 (<'date': 2017-Oct-18 22:00:00, 'open': 1.32061, 'high': 1.32287, 'low': 1.3135, 'close': 1.31622, 'volume': 618826>,
  <'date': 2017-Oct-18 22:00:00, 'open': 1.32048, 'high': 1.32285, 'low': 1.31258, 'close': 1.31556, 'volume': 618826>),
 (<'date': 2017-Oct-17 22:00:00, 'open': 1.31912, 'high': 1.3217, 'low': 1.31404, 'close': 1.32061, 'volume': 273625>,
  <'date': 2017-Oct-17 22:00:00, 'open': 1.31886, 'high': 1.32115, 'low': 1.31399, 'close': 1.32048, 'volume': 273625>),
 (<'date': 2017-Oct-16 22:00:00, 'open': 1.32526, 'high': 1.32876, 'low': 1.31548, 'close': 1.31912, 'volume': 297564>,
  <'date': 2017-Oct-16 22:00:00, 'open': 1.32499, 'high': 1.32875, 'low': 1.31547, 'close': 1.31886, 'volume': 297564>),
 (<'date': 2017-Oct-15 22:00:00, 'open': 1.33061, 'high': 1.33127, 'low': 1.32256, 'close': 1.32526, 'volume': 231480>,
  <'date': 2017-Oct-15 22:00:00, 'open': 1.33042, 'high': 1.33118, 'low': 1.32251, 'close': 1.32499, 'volume': 231480>)]

In [15]: %timeit -n 10 get_data()
10 loops, best of 3: 685 ms per loop
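A note on the shapes being compared: with #18 each bar comes back as a separate (ask, bid) pair, so flattening it into one row per bar takes an extra pass in Python. A rough sketch of that extra pass, assuming the #18 candle objects expose the same __getinitargs__() helper as the current bindings:

    # Sketch only (assumes #18 candles expose __getinitargs__): recombine
    # the (ask, bid) pairs into flat (date, ask OHLC, bid OHLC, volume) rows.
    rows = []
    for ask, bid in get_data():
        a_date, a_open, a_high, a_low, a_close, a_vol = ask.__getinitargs__()
        b_date, b_open, b_high, b_low, b_close, b_vol = bid.__getinitargs__()
        rows.append((a_date, a_open, a_high, a_low, a_close,
                     b_open, b_high, b_low, b_close, a_vol))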

The #18 implementation would therefore require more iterations on the Python side, at least for my approach :) With the current bindings I use __getinitargs__(), which in my testing is much faster than __getstate__(), and then convert the result straight into a NumPy array instead of using pandas:

    import numpy as np  # module-level import in the actual class file

    def _get_data(
        self, fx_session, instrument, fm_date,
        to_date, time_frame, dt
    ):
        """
        Calls FXCM for a given offer and time frame,
        collects the data, then returns a structured NumPy array.
        """
        values = fx_session.get_historical_prices(
            instrument, fm_date,
            to_date, time_frame)
        # Each candle's __getinitargs__() yields a plain tuple of its fields,
        # which maps directly onto the structured dtype dt.
        return np.array(
            [v.__getinitargs__() for v in values], dtype=dt)
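
Here dt is just a structured dtype describing one candle. Defining it and calling the helper might look like this (a sketch; the field names match the array below, while the instrument, dates, and time frame are placeholders):

    # Hypothetical call site for _get_data; the dtype mirrors the array below.
    dt = np.dtype([
        ('date', '<M8[us]'),
        ('askopen', '<f8'), ('askhigh', '<f8'),
        ('asklow', '<f8'), ('askclose', '<f8'),
        ('bidopen', '<f8'), ('bidhigh', '<f8'),
        ('bidlow', '<f8'), ('bidclose', '<f8'),
        ('volume', '<i8'),
    ])
    arr = self._get_data(fx_session, 'GBP/USD', fm_date, to_date, 'm1', dt)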

This will return a structured NumPy array like so:

        arr = np.array([
            ('2017-09-13T11:05:00.000000',  1.32685, 1.32682,  1.32704,   1.32686,  1.32684,  1.32702,  1.32679,  1.32683, 222),
            ('2017-09-13T11:04:00.000000',  1.32683,  1.32686,  1.32682,  1.32685,  1.32682,  1.32684,  1.3268 ,  1.32684,  97),
            ('2017-09-13T11:03:00.000000',  1.32664,  1.32684,  1.32663,  1.32683,  1.32664,  1.32683,  1.32661,  1.32682, 268),
            ('2017-09-13T11:02:00.000000',  1.3268 ,  1.32692,  1.3266 ,  1.32664,  1.32678,  1.32689,  1.32658,  1.32664, 299),
            ('2017-09-13T11:01:00.000000',  1.32648,  1.32682,  1.32648,  1.3268 ,  1.32647,  1.32682,  1.32647,  1.32678, 322),
            ('2017-09-13T11:00:00.000000',  1.32647,  1.32649,  1.32628,  1.32648,  1.32644,  1.32651,  1.32626,  1.32647, 285)],
          dtype=[('date', '<M8[us]'), ('askopen', '<f8'), ('askhigh', '<f8'), ('asklow', '<f8'), ('askclose', '<f8'),
                 ('bidopen', '<f8'), ('bidhigh', '<f8'), ('bidlow', '<f8'), ('bidclose', '<f8'), ('volume', '<i8')])     

The data can then be manipulated about 17-20 times faster than with pandas. For example, this basic integrity check runs directly on the structured array:

    def _integrity_check(self, a):
        """
        Basic data integrity checking of all FXCM price data.
        """
        # Drop any candle whose ask or bid OHLC values are internally
        # inconsistent (high below low, or open/close outside the
        # high-low range) or whose volume is negative.
        a = a[a['askhigh'] >= a['asklow']]
        a = a[a['askhigh'] >= a['askopen']]
        a = a[a['asklow'] <= a['askopen']]
        a = a[a['askhigh'] >= a['askclose']]
        a = a[a['asklow'] <= a['askclose']]
        a = a[a['bidhigh'] >= a['bidlow']]
        a = a[a['bidhigh'] >= a['bidopen']]
        a = a[a['bidlow'] <= a['bidopen']]
        a = a[a['bidhigh'] >= a['bidclose']]
        a = a[a['bidlow'] <= a['bidclose']]
        a = a[a['volume'] >= 0]
        # Remove duplicate dates, keeping each date's last occurrence, and
        # return the rows ordered newest-first.
        idx = np.unique(a['date'][::-1], return_index=True)[1]
        return a[::-1][idx][::-1]
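
To see the difference on your own data, you can time a simple column-wise operation on the structured array against the same operation on a pandas DataFrame built from it; something along these lines (the spread calculation is just an example, and timings will vary with data size and hardware):

    import timeit

    import pandas as pd

    # Same data, two containers: arr is the structured array defined above.
    df = pd.DataFrame(arr)

    # Time an example column-wise operation (the ask-bid close spread)
    # on the structured array and on the DataFrame.
    np_time = timeit.timeit(lambda: arr['askclose'] - arr['bidclose'], number=10000)
    pd_time = timeit.timeit(lambda: df['askclose'] - df['bidclose'], number=10000)
    print(np_time, pd_time)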

Best Regards

James