khrapovs / vix

Compute VIX and related volatility indices

KeyError while running replicate vix code #1

Open mwahal opened 8 years ago

mwahal commented 8 years ago

Hi

I am getting a KeyError while running the replicate-VIX code. I have created both yield.csv and options.csv from the source file, but when I run the script I get a KeyError for 'P' (puts, I guess).

Thanks Mudit

Traceback (most recent call last):
  File "../reproduce_vix.py", line 424, in <module>
    vixvalue = whitepaper()
  File "../reproduce_vix.py", line 387, in whitepaper
    options2 = put_call_parity(options2)
  File "../reproduce_vix.py", line 175, in put_call_parity
    options['CPdiff'] = (options['C'] - options['P']).abs()
  File "/usr/local/lib/python2.7/dist-packages/pandas/core/frame.py", line 1969, in __getitem__
    return self._getitem_column(key)
  File "/usr/local/lib/python2.7/dist-packages/pandas/core/frame.py", line 1976, in _getitem_column
    return self._get_item_cache(key)
  File "/usr/local/lib/python2.7/dist-packages/pandas/core/generic.py", line 1091, in _get_item_cache
    values = self._data.get(item)
  File "/usr/local/lib/python2.7/dist-packages/pandas/core/internals.py", line 3211, in get
    loc = self.items.get_loc(item)
  File "/usr/local/lib/python2.7/dist-packages/pandas/core/index.py", line 1759, in get_loc
    return self._engine.get_loc(key)
  File "pandas/index.pyx", line 137, in pandas.index.IndexEngine.get_loc (pandas/index.c:3979)
  File "pandas/index.pyx", line 157, in pandas.index.IndexEngine.get_loc (pandas/index.c:3843)
  File "hashtable.pyx", line 668, in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:11463)
  File "hashtable.pyx", line 676, in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:11416)
KeyError: 'P'

mwahal commented 8 years ago

Here is some debugging I did. It looks like you are dropping all rows with a zero bid, which removes every put price from the list.

/home/xxyyzz/stk/db/cboe/reproduce_vix.py(165)bid_ask_average()
-> options = options[options['Bid'] > 0]['Premium'].unstack('CP')
(Pdb) p options
                             Bid     Ask  Premium
Date       Days CP Strike
2009-01-01 9    C  200     717.6  722.80  720.200
                   250     667.6  672.90  670.250
                   300     617.9  622.90  620.400
                   350     567.9  572.90  570.400
                P  200       0.0    0.05    0.025
                   250       0.0    0.05    0.025
                   300       0.0    0.05    0.025
                   350       0.0    0.05    0.025
(Pdb) n
/home/xxyyzz/stk/db/cboe/reproduce_vix.py(166)bid_ask_average()
-> return options
(Pdb) p options
CP                           C
Date       Days Strike
2009-01-01 9    200     720.20
                250     670.25
                300     620.40
                350     570.40
(Pdb) n
--Return--
/home/xxyyzz/stk/db/cboe/reproduce_vix.py(166)bid_ask_average()->CP ... 570.40
-> return options
(Pdb) n
/home/xxyyzz/stk/db/cboe/reproduce_vix.py(387)whitepaper()
-> options2 = put_call_parity(options2)
(Pdb) s
--Call--
/home/xxyyzz/stk/db/cboe/reproduce_vix.py(169)put_call_parity()
-> def put_call_parity(options):
(Pdb) n
/home/xxyyzz/stk/db/cboe/reproduce_vix.py(175)put_call_parity()
-> options['CPdiff'] = (options['C'] - options['P']).abs()
(Pdb) n
KeyError: KeyError('P',)
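The failure in the session above can be reproduced in a few lines. This is a minimal sketch, assuming a toy frame that mirrors the first two strikes of the pdb printout; the column and index names come from that printout, while the final `reindex` guard is only a suggestion and is not part of reproduce_vix.py.

```python
import pandas as pd

# Toy quotes mirroring the pdb printout: every put has a zero bid.
rows = [
    ("2009-01-01", 9, "C", 200, 717.6, 722.80),
    ("2009-01-01", 9, "C", 250, 667.6, 672.90),
    ("2009-01-01", 9, "P", 200, 0.0, 0.05),
    ("2009-01-01", 9, "P", 250, 0.0, 0.05),
]
options = pd.DataFrame(rows, columns=["Date", "Days", "CP", "Strike", "Bid", "Ask"])
options["Premium"] = (options["Bid"] + options["Ask"]) / 2
options = options.set_index(["Date", "Days", "CP", "Strike"])

# Same expression as line 165 in the session above: rows with a zero bid
# are dropped first, then calls and puts are spread into columns.
wide = options[options["Bid"] > 0]["Premium"].unstack("CP")
print(list(wide.columns))  # no 'P' column is left, so wide['P'] raises KeyError

# Hypothetical guard (an assumption, not the author's fix): force both
# columns to exist so a missing side shows up as NaN instead of a KeyError.
guarded = wide.reindex(columns=["C", "P"])
print(guarded)
```

With the guard in place, the call-put difference for such a date would come out as NaN rather than stopping the script, which may or may not be the behaviour you want.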

mwahal commented 8 years ago

There are more problems further on, after changing the put bids to non-zero values. I am not sure whether this code has been tested, or whether something changed in the Python packages in the two years since it was written.

mwahal commented 8 years ago

This is the last error, after making the put bids non-zero:

Traceback (most recent call last):
  File "../reproduce_vix.py", line 424, in <module>
    vixvalue = whitepaper()
  File "../reproduce_vix.py", line 414, in whitepaper
    two_sigmas = interpolate_vol(sigma2)
  File "../reproduce_vix.py", line 348, in interpolate_vol
    two_sigmas = sigma2.reset_index().groupby('Date').apply(near_next_term).groupby(level = 'Date').first()
  File "/usr/local/lib/python2.7/dist-packages/pandas/core/groupby.py", line 715, in apply
    return self._python_apply_general(f)
  File "/usr/local/lib/python2.7/dist-packages/pandas/core/groupby.py", line 719, in _python_apply_general
    self.axis)
  File "/usr/local/lib/python2.7/dist-packages/pandas/core/groupby.py", line 1406, in apply
    res = f(group)
  File "/usr/local/lib/python2.7/dist-packages/pandas/core/groupby.py", line 711, in f
    return func(g, *args, **kwargs)
  File "../reproduce_vix.py", line 336, in near_next_term
    T2 = days[days > T1].min()
  File "/usr/local/lib/python2.7/dist-packages/numpy/core/_methods.py", line 29, in _amin
    return umr_minimum(a, axis, None, out, keepdims)
ValueError: zero-size array to reduction operation minimum which has no identity

khrapovs commented 8 years ago

Dear @mwahal. Sorry for the late response. I have just run the code. It works perfectly well. But, I use Python 3.4. You are on Python 2.7. Unfortunately, it is not my priority at this moment to maintain this code for Python 2.7. Although I am sure there should be only a few minor tweaks to make it work. Btw, I just pushed a few cosmetic changes. Not relevant to your problem, but still...

mwahal commented 8 years ago

Thanks, let me try it on Python 3.4. How about the data files? Are those still the same as shown in the sample code?

Thanks Mudit

mwahal commented 8 years ago

I downloaded your latest script and I am now running it under Python 3.4, but I am still facing issues. Maybe it has something to do with Python 2.7 and 3.4 coexisting?

Traceback (most recent call last):
  File "reproduce_vix.py", line 442, in <module>
    vixvalue = whitepaper()
  File "reproduce_vix.py", line 432, in whitepaper
    two_sigmas = interpolate_vol(sigma2)
  File "reproduce_vix.py", line 365, in interpolate_vol
    two_sigmas = grouped.apply(near_next_term).groupby(level='Date').first()
  File "/usr/local/lib/python3.4/dist-packages/pandas/core/groupby.py", line 640, in apply
    return self._python_apply_general(f)
  File "/usr/local/lib/python3.4/dist-packages/pandas/core/groupby.py", line 644, in _python_apply_general
    self.axis)
  File "/usr/local/lib/python3.4/dist-packages/pandas/core/groupby.py", line 1524, in apply
    res = f(group)
  File "/usr/local/lib/python3.4/dist-packages/pandas/core/groupby.py", line 636, in f
    return func(g, *args, **kwargs)
  File "reproduce_vix.py", line 351, in near_next_term
    T2 = days[days > T1].min()
  File "/usr/local/lib/python3.4/dist-packages/numpy/core/_methods.py", line 29, in _amin
    return umr_minimum(a, axis, None, out, keepdims)
ValueError: zero-size array to reduction operation minimum which has no identity

mwahal commented 8 years ago

My data files

Expiration,Days,Strike,Call Bid,Call Ask,Put Bid,Put Ask
20090110,9,200,717.6,722.8,0.10,0.15
20090110,9,250,667.6,672.9,0.10,0.15
20090110,9,300,617.9,622.9,0.10,0.15
20090110,9,350,567.9,572.9,0.10,0.15

Date,Days,Rate
20090101,9,0.38
20090101,37,0.38

khrapovs commented 8 years ago

Are you using exactly the data that you printed in your last comment? If so, then these two data sets do not have common dates and the merge will produce an empty dataframe. Try the code with the data in the 'data' directory.
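To illustrate the point about the merge, here is a minimal sketch. It assumes the two frames are joined on `Date` and `Days` columns and that the options frame carries the posted Expiration value (20090110) as its date; both are assumptions for illustration, since the actual join keys live inside reproduce_vix.py. The emptiness check at the end is hypothetical and not part of the script.

```python
import pandas as pd

# Toy frames built from the CSV snippets posted above. The 'Date' values
# differ between the two frames, so no key pair matches.
options = pd.DataFrame({
    "Date": [20090110, 20090110],
    "Days": [9, 9],
    "Strike": [200, 250],
})
yields = pd.DataFrame({
    "Date": [20090101, 20090101],
    "Days": [9, 37],
    "Rate": [0.38, 0.38],
})

# An inner merge keeps only rows whose keys appear in both frames.
merged = pd.merge(options, yields, on=["Date", "Days"], how="inner")
print(len(merged))

# Hypothetical early check: report the real cause here instead of letting
# a later reduction fail on an empty array.
if merged.empty:
    print("options and yields share no (Date, Days) keys")
```

An empty frame at this stage would explain a later "zero-size array" error, because there are no expirations left to take a minimum over.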

mwahal commented 8 years ago

I am not sure the data directory is on GitHub. I was using the data from the example in the source.

khrapovs commented 8 years ago

Again, my apologies. The data (csv files) were excluded from git commits. I pushed the data to the repo just now. Sorry...

mwahal commented 8 years ago

Thanks for uploading the data. Here is my result; please let me know if it is correct.

python3 reproduce_vix.py
                  VIX
Date
2009-01-01  61.217999

mwahal commented 8 years ago

On a side note, your options.csv has 9-day and 37-day expirations. But as per the VIX white paper, the near-term and next-term options should have between 23 and 37 days to expiration.

http://www.cboe.com/micro/vix/vixwhite.pdf

The VIX calculation measures 30-day expected volatility of the S&P 500 Index. The components of the VIX calculation are near- and next-term put and call options with more than 23 days and less than 37 days to expiration.

mwahal commented 8 years ago

Using the SPX index options for May 20 and May 27, which have 24 and 31 days to expiration respectively, and 0.01 for the yield, I get a VIX value of 13.94, which is very close to the 13.96 shown by the exchange.

mwahal commented 8 years ago

Hi, sorry to bother you again, but I think I may have found one issue today. I am trying to compute today's VIX using SPX options for May 20 and May 27. The May 27 option has EXACTLY 30 days to expiration, so technically it is both near term and next term. In the data.csv file, I have both the May 20 and May 27 options. The code fails in the function near_next_term because it looks for an option expiring in more than 30 days and there is none. I can't change "days > T1" to "days >= T1", since that returns NaN for the VIX value.

[EDIT: The fix is to use May 27 and June 3 options, then the code will not fail, though the VIX value is off by somewhat]

python3 reproduce_vix.py
Traceback (most recent call last):
  File "reproduce_vix.py", line 444, in <module>
    vixvalue = whitepaper()
  File "reproduce_vix.py", line 434, in whitepaper
    two_sigmas = interpolate_vol(sigma2)
  File "reproduce_vix.py", line 365, in interpolate_vol
    two_sigmas = grouped.apply(near_next_term).groupby(level='Date').first()
  File "/usr/local/lib/python3.4/dist-packages/pandas/core/groupby.py", line 640, in apply
    return self._python_apply_general(f)
  File "/usr/local/lib/python3.4/dist-packages/pandas/core/groupby.py", line 644, in _python_apply_general
    self.axis)
  File "/usr/local/lib/python3.4/dist-packages/pandas/core/groupby.py", line 1524, in apply
    res = f(group)
  File "/usr/local/lib/python3.4/dist-packages/pandas/core/groupby.py", line 636, in f
    return func(g, *args, **kwargs)
  File "reproduce_vix.py", line 351, in near_next_term
    T2 = days[days > T1].min()
  File "/usr/local/lib/python3.4/dist-packages/numpy/core/_methods.py", line 29, in _amin
    return umr_minimum(a, axis, None, out, keepdims)
ValueError: zero-size array to reduction operation minimum which has no identity

mwahal commented 8 years ago

Another issue arises when the absolute call-put difference (CPdiff) within one expiration is the same for more than one strike. Here is what I encountered: for Days=79 there are two entries flagged as the minimum (min=1), which later causes an exception in pandas when assigning sigma2['Forward'] = forward. I am sorry, I don't know pandas well enough; I am just googling and using pdb.

-> df = pd.merge(df, yields.reset_index(), how = 'left')
(Pdb) p df
CP        Date  Days  Strike      C      P  CPdiff  min
0   2016-04-27    79    2080  55.95  50.95    5.00    1
1   2016-04-27    79    2090  49.70  54.70    5.00    1
2   2016-04-27   114    2080  67.70  66.45    1.25    1

(Pdb) p df
CP        Date  Days  Strike      C      P  CPdiff  min  Rate      Forward
0   2016-04-27    79    2080  55.95  50.95    5.00    1  0.01  2085.000108
1   2016-04-27    79    2090  49.70  54.70    5.00    1  0.01  2095.000108
2   2016-04-27   114    2080  67.70  66.45    1.25    1  0.01  2081.250039

(Pdb) p forward
CP                   Forward
Date       Days
2016-04-27 79    2085.000108
           79    2095.000108
           114   2081.250039

(Pdb) p mid_strike
                 Mid Strike
Date       Days
2016-04-27 79          2090
           114         2080

==== later on ===

sigma2['Forward'] = forward
(Pdb) p sigma2
CP                 sigma2  Mid Strike
Date       Days
2016-04-27 79    0.005799        2090
           114   0.009654        2080

(Pdb) p forward
CP                   Forward
Date       Days
2016-04-27 79    2085.000108
           79    2095.000108
           114   2081.250039
(Pdb) n
Exception: Exceptio...index!',)

Traceback (most recent call last):
  File "reproduce_vix.py", line 474, in <module>
    vixvalue = whitepaper()
  File "reproduce_vix.py", line 436, in whitepaper
    sigma2 = each_period_vol(contrib, mid_strike, forward)
  File "reproduce_vix.py", line 331, in each_period_vol
    sigma2['Forward'] = forward
  File "/usr/local/lib/python2.7/dist-packages/pandas/core/frame.py", line 2299, in __setitem__
    self._set_item(key, value)
  File "/usr/local/lib/python2.7/dist-packages/pandas/core/frame.py", line 2366, in _set_item
    value = self._sanitize_column(key, value)
  File "/usr/local/lib/python2.7/dist-packages/pandas/core/frame.py", line 2515, in _sanitize_column
    value = reindexer(value).T
  File "/usr/local/lib/python2.7/dist-packages/pandas/core/frame.py", line 2495, in reindexer
    raise e
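One way to avoid the duplicated (Date, Days) entries is to keep exactly one row per expiration when several strikes tie for the smallest CPdiff. The sketch below uses the three rows from the pdb output above; taking the first tied row via `idxmin` is an arbitrary tie-break chosen for illustration, not something the white paper or the author's commits prescribe, and the rounding to two decimals is an assumption so that floating-point noise does not hide the tie.

```python
import pandas as pd

# The three rows printed in the pdb session above.
df = pd.DataFrame({
    "Date": ["2016-04-27"] * 3,
    "Days": [79, 79, 114],
    "Strike": [2080, 2090, 2080],
    "C": [55.95, 49.70, 67.70],
    "P": [50.95, 54.70, 66.45],
})
# Round so that the two 5.00 differences compare equal.
df["CPdiff"] = (df["C"] - df["P"]).abs().round(2)

# Flagging every row that equals the group minimum keeps both tied strikes,
# which later produces a duplicated (Date, Days) index.
group_min = df.groupby(["Date", "Days"])["CPdiff"].transform("min")
ties = df[df["CPdiff"] == group_min]
print(len(ties))

# idxmin returns a single row label per group (the first one on a tie),
# so the resulting index is unique.
first_min = df.groupby(["Date", "Days"])["CPdiff"].idxmin()
unique_rows = df.loc[first_min.values].set_index(["Date", "Days"])
print(unique_rows[["Strike", "CPdiff"]])
```

A Series built from `unique_rows` has one value per (Date, Days) pair, so it can be aligned with a frame such as sigma2 that is indexed the same way.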

khrapovs commented 8 years ago

@mwahal Thanks for raising all these issues! Although I have plenty of data to test this code on, I have not done so yet. Please see the last two commits. They should address your two latest points. Please let me know if you find anything else.

mwahal commented 8 years ago

I have modified the code a bit to accept arguments for a few things, listed below. Also, to take care of identical CPdiff values, I preprocess the options file and adjust one of the call ask prices by 1 cent. But I will look at your changes. I am attaching my file after incorporating your changes. By default, it will run with your configuration.

  1. option data file
  2. yield file
  3. number of days (still the default is 30).

On a side note, I am calculating VIX (30-day) and VXV (90-day) from the SPX options in real time (as close as possible). From my observations, the calculated VIX is usually higher by 0.5%, but the calculated 90-day value is usually lower than the actual index value.

reproduce_vix.py.txt

mwahal commented 8 years ago

The latest code still doesn't fix the issue when the near-term or next-term expiration is the same number of days away as the calculation period. So, if the calculation period is 30 days and one of the series also expires in exactly 30 days, the code throws an exception.

python3 reproduce_vix.py
Traceback (most recent call last):
  File "reproduce_vix.py", line 457, in <module>
    vixvalue = whitepaper()
  File "reproduce_vix.py", line 447, in whitepaper
    two_sigmas = interpolate_vol(sigma2)
  File "reproduce_vix.py", line 368, in interpolate_vol
    two_sigmas = grouped.apply(near_next_term).groupby(level='Date').first()
  File "/usr/local/lib/python3.4/dist-packages/pandas/core/groupby.py", line 640, in apply
    return self._python_apply_general(f)
  File "/usr/local/lib/python3.4/dist-packages/pandas/core/groupby.py", line 644, in _python_apply_general
    self.axis)
  File "/usr/local/lib/python3.4/dist-packages/pandas/core/groupby.py", line 1524, in apply
    res = f(group)
  File "/usr/local/lib/python3.4/dist-packages/pandas/core/groupby.py", line 636, in f
    return func(g, *args, **kwargs)
  File "reproduce_vix.py", line 349, in near_next_term
    T2 = days[days > T1].min()
  File "/usr/local/lib/python3.4/dist-packages/numpy/core/_methods.py", line 29, in _amin
    return umr_minimum(a, axis, None, out, keepdims)
ValueError: zero-size array to reduction operation minimum which has no identity
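The exact-30-day case can be handled when choosing the two expirations, before any minimum is taken over a possibly empty selection. The following is a minimal sketch in plain Python; `pick_terms` is a hypothetical helper and not the `near_next_term` function from reproduce_vix.py, and the fallback rule it applies is an assumption about what a sensible bracket looks like.

```python
def pick_terms(days, target=30):
    """Return (near, next) expirations in days, with near < next.

    Hypothetical helper for illustration. The pair brackets `target`,
    and either end is allowed to sit exactly on it.
    """
    unique_days = sorted(set(days))
    at_or_below = [d for d in unique_days if d <= target]
    above = [d for d in unique_days if d > target]

    # Usual case: one expiration on each side of the target.
    if at_or_below and above:
        return at_or_below[-1], above[0]

    # Nothing expires beyond the target, but one series expires exactly
    # on it: use it as the next term and the one before it as near term.
    if len(at_or_below) >= 2 and at_or_below[-1] == target:
        return at_or_below[-2], at_or_below[-1]

    raise ValueError(
        "expirations %r do not bracket %d days" % (unique_days, target)
    )


print(pick_terms([23, 30]))  # a series expiring exactly on the target
print(pick_terms([30, 37]))
print(pick_terms([9, 37]))
```

Because the two returned values are never equal, a later interpolation step cannot divide by zero, and an unusable set of expirations fails with an explicit message.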

mwahal commented 8 years ago

I made a mistake when merging your new code into my script. The script doesn't crash anymore when the days to expiration equal 30. BUT the VIX calculated is wrong when the days to expiration are exactly 30, so I have removed that code for now.
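For reference, the white paper interpolates the near-term and next-term variances to a 30-day horizon, and that formula gives all the weight to a term that expires exactly on the target. The sketch below is a simplified version that measures time in whole days (the white paper counts minutes); the function and parameter names are hypothetical and not taken from reproduce_vix.py, and the variance inputs in the usage lines are made-up round numbers.

```python
import math


def interpolate_vix(sigma2_near, days_near, sigma2_next, days_next, target=30):
    """Interpolate two annualized variances to `target` days (simplified).

    Hypothetical helper: time is measured in days, not minutes.
    """
    if not (days_near < days_next and days_near <= target <= days_next):
        raise ValueError("terms must bracket the target with near < next")

    year = 365.0
    t_near = days_near / year
    t_next = days_next / year

    # Linear weights in calendar time; they always sum to one.
    w_near = (days_next - target) / (days_next - days_near)
    w_next = (target - days_near) / (days_next - days_near)

    total_variance = t_near * sigma2_near * w_near + t_next * sigma2_next * w_next
    return 100.0 * math.sqrt(total_variance * year / target)


# Made-up variances: 0.04 (20% vol) and 0.09 (30% vol).
print(interpolate_vix(0.04, 30, 0.09, 37))  # near term sits on the target
print(interpolate_vix(0.04, 23, 0.09, 30))  # next term sits on the target
```

When one term expires exactly on the target, its weight is one and the result reduces to that term's own volatility, so a value that differs from this in the exact-30-day case points at the term selection rather than at the interpolation.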

Also, I have made an effort to port your script to C for speed. Since I am calling it every minute on several option chains and also need to back test on a few months of 1 min option chains, the speed is of utmost importance.

mwahal commented 8 years ago

I found one more issue, in the logic that excludes options with zero bids AND stops the chain once two consecutive zero bids are found.

Here is an example from debugging your code in the function remove_crazy_quotes(). The last column is the running count of zero bids that you keep. When you see the first zero bid at 215.50, you increment the zero-bid count to 1. At 216.0 the bid is non-zero, yet you don't reset the count. At 216.50 you increment the count to 2, and that is where you end the option chain. But as per the VIX white paper, you only stop the chain when you get TWO CONSECUTIVE zero bids. That doesn't happen until 227.50, which makes 222.50 the last strike to be included. See the white paper excerpt at the bottom.

       8     C   213.0  0.20  0.21         209         0               0
       8     C   213.5  0.14  0.15         209         0               0
       8     C   214.0  0.10  0.11         209         0               0
       8     C   214.5  0.08  0.09         209         0               0
       8     C   215.0  0.06  0.07         209         0               0
       8     C   215.5  0.00  0.00         209         1               1
       8     C   216.0  0.04  0.05         209         0               1
       8     C   216.5  0.00  0.00         209         1               2
       8     C   217.0  0.02  0.03         209         0               2
       8     C   217.5  0.02  0.03         209         0               2
       8     C   218.0  0.02  0.03         209         0               2
       8     C   219.0  0.02  0.03         209         0               2
       8     C   220.0  0.01  0.02         209         0               2
       8     C   222.5  0.01  0.02         209         0               2
       8     C   225.0  0.00  0.01         209         1               3
       8     C   227.5  0.00  0.01         209         1               4

From the whitepaper - page 7.

Next, select out-of-the-money call options with strike prices > K0. Start with the call strike immediately higher than K0 and move to successively higher strike prices, excluding call options that have a bid price of zero. As with the puts, once two consecutive call options are found to have zero bid prices, no calls with higher strikes are considered. (Note that the 2225 call option is not included despite having a non-zero bid price.)
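The rule quoted above can be sketched as a counter that resets on every non-zero bid. This is a minimal illustration in plain Python, not the code from remove_crazy_quotes(); the function name is hypothetical, and the quotes are the (strike, bid) pairs from the debug output above, starting at strike 215.0.

```python
def strikes_to_keep(quotes):
    """Return the strikes to include, given (strike, bid) pairs.

    `quotes` must be ordered moving away from K0. Zero-bid options are
    skipped, and the scan stops at the second zero bid in a row.
    """
    kept = []
    consecutive_zero_bids = 0
    for strike, bid in quotes:
        if bid == 0:
            consecutive_zero_bids += 1
            if consecutive_zero_bids == 2:
                break  # two zero bids in a row: ignore everything further out
        else:
            consecutive_zero_bids = 0  # a non-zero bid resets the run
            kept.append(strike)
    return kept


# (strike, bid) pairs taken from the debug output above.
calls = [
    (215.0, 0.06), (215.5, 0.00), (216.0, 0.04), (216.5, 0.00),
    (217.0, 0.02), (217.5, 0.02), (218.0, 0.02), (219.0, 0.02),
    (220.0, 0.01), (222.5, 0.01), (225.0, 0.00), (227.5, 0.00),
]
print(strikes_to_keep(calls))
```

With the reset in place, the isolated zero bids at 215.5 and 216.5 are skipped without ending the chain, and the last strike kept is 222.5.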