hudson-and-thames / mlfinlab

MlFinLab helps portfolio managers and traders who want to leverage the power of machine learning by providing reproducible, interpretable, and easy-to-use tools.

Garman-Klass Volatility Estimator Returns Empty Series for Valid OHLC Data in mlfinlab 2.3.0 #539

Open shrikantad opened 10 months ago

shrikantad commented 10 months ago

Description

When using the garman_klass function from mlfinlab 2.3.0 on a dataset with 31 OHLC rows, I expected a non-empty Series of volatility estimates (exactly one value, to be precise). Instead, the function returned an empty Series and issued a RuntimeWarning about an invalid value encountered in a square root operation. This suggests a bug either in how the function handles the input data or in the computation itself.

To Reproduce

  1. Install mlfinlab via pip (pip install mlfinlab==2.3.0).
  2. Load the OHLC data from the attached CSV file.
  3. Execute the garman_klass function with the DataFrame and a window size of 30.
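
```python
import pandas as pd
from mlfinlab.features.volatility_estimators import garman_klass

# Load the attached OHLC data (31 rows)
ohlc = pd.read_csv("data/ohlc_data.csv")  # Replace with the actual path to the CSV

# Expected: a Series with one volatility estimate; observed: an empty Series
print(garman_klass(ohlc, window=30))
```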

Expected behavior

The garman_klass function should compute and return a pandas Series containing at least one volatility estimate for the provided OHLC data.

Actual behavior

The function returns an empty pandas Series and emits the following warning:

```text
/home/shrk/micromamba/envs/qc/lib/python3.9/site-packages/pandas/core/arraylike.py:396: RuntimeWarning: invalid value encountered in sqrt
  result = getattr(ufunc, method)(*inputs, **kwargs)
Series([], dtype: float64)
```
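The warning itself is NumPy's standard floating-point warning for taking the square root of a negative number; pandas forwards np.sqrt to NumPy and returns NaN for that element. A minimal sketch with made-up window means (not values from the attached data):

```python
import warnings

import numpy as np
import pandas as pd

# Hypothetical rolling-mean variances; the second one has gone negative
window_means = pd.Series([4.0e-4, -1.0e-5])

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    vol = np.sqrt(window_means)  # negative input -> NaN plus a RuntimeWarning

messages = [str(w.message) for w in caught]
print(vol)       # second entry is NaN
print(messages)
```

So any window whose averaged variance term is negative turns into NaN at the square-root step.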

Environment

- Operating system: Windows 11 (Version 23H2, OS Build 22631.3085)
- Python version: 3.9.18
- mlfinlab version: 2.3.0
- pandas version: 2.0.0

Attachments

ohlc_data.csv (attached): the dataset used when encountering the issue. I obtained this data from QuantConnect (basic S&P 500 ETF TradeBar data for 31 days in 2016).

Additional context

The attached CSV file contains the OHLC data that reproduces the issue: 31 rows of OHLCV data, which should be enough for garman_klass to compute at least one value with a window size of 30.
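Having enough rows is only half of it; each bar also has to be internally consistent for a high/low-based estimator to behave. A minimal sanity check, assuming lowercase column names open, high, low, close (hypothetical; rename the CSV's actual headers to match). check_ohlc is a hypothetical helper, not part of mlfinlab:

```python
import pandas as pd

PRICE_COLS = ["open", "high", "low", "close"]  # assumed column names


def check_ohlc(df: pd.DataFrame) -> dict:
    """Count bars that look inconsistent for an OHLC volatility estimator."""
    prices = df[PRICE_COLS]
    op, hi, lo, cl = df["open"], df["high"], df["low"], df["close"]
    return {
        "rows": len(df),
        # A missing price propagates NaN through the log terms
        "rows_with_nan": int(prices.isna().any(axis=1).sum()),
        # log() of a zero or negative price is undefined
        "non_positive_prices": int((prices <= 0).any(axis=1).sum()),
        "high_below_low": int((hi < lo).sum()),
        # Open or close outside [low, high] means the bar contradicts itself
        "open_or_close_outside_range": int(
            ((op > hi) | (op < lo) | (cl > hi) | (cl < lo)).sum()
        ),
    }


# Synthetic example: the third bar opens above its own high
bars = pd.DataFrame({
    "open":  [100.0, 101.0, 102.5],
    "high":  [101.5, 102.0, 102.0],
    "low":   [99.5, 100.5, 101.0],
    "close": [101.0, 101.5, 101.8],
})
report = check_ohlc(bars)
print(report)
```

On the attached file this would be check_ohlc(pd.read_csv("data/ohlc_data.csv")) after any column renaming.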

sorensenj50 commented 7 months ago

It might be a problem with the data. While testing my own implementation of the GK estimator on a futures dataset that included the EuroBond, I got NaN results because the H / L term was smaller than the C / C term; the resulting negative values became NaN when passed through the square root. The same issue came up in the NK, Z, and G contracts. Looking at the formula, nothing prevents it from breaking when the H / L spread is small enough.
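For reference, the textbook Garman-Klass estimator (assumed here; mlfinlab's exact implementation is not shown in this thread) averages a per-bar term over a window of N bars:

$$
\sigma^2_{GK} = \frac{1}{N}\sum_{i=1}^{N}\left[\frac{1}{2}\left(\ln\frac{H_i}{L_i}\right)^2 - (2\ln 2 - 1)\left(\ln\frac{C_i}{O_i}\right)^2\right]
$$

If a bar is internally consistent ($L_i \le O_i \le H_i$ and $L_i \le C_i \le H_i$), then $|\ln(C_i/O_i)| \le \ln(H_i/L_i)$, and since $2\ln 2 - 1 \approx 0.386 < 0.5$, each bracket is at least $\left(\tfrac{1}{2} - (2\ln 2 - 1)\right)\left(\ln\frac{H_i}{L_i}\right)^2 \ge 0$. Under this form, a negative window mean therefore requires bars whose open or close lies outside their own high-low range. If the second term uses a close-to-close return $\ln(C_i/C_{i-1})$ instead, as the "C / C" wording above suggests, a gap larger than the bar's range can make the bracket negative even for clean data.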

Implement the formula yourself and test to see if this is the problem.
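A minimal pandas sketch of that test, using the textbook ln(C/O) form. gk_terms and gk_vol are hypothetical helpers, not mlfinlab's API, and the column names open, high, low, close are assumed. It exposes the per-bar terms so you can see which rows pull a window's mean below zero:

```python
import numpy as np
import pandas as pd

# Coefficient on the open-to-close term in the textbook Garman-Klass estimator
K = 2.0 * np.log(2.0) - 1.0  # about 0.386


def gk_terms(df: pd.DataFrame) -> pd.Series:
    """Per-bar Garman-Klass variance terms, using ln(C/O) for the second term."""
    hl = np.log(df["high"] / df["low"])
    co = np.log(df["close"] / df["open"])
    return 0.5 * hl**2 - K * co**2


def gk_vol(df: pd.DataFrame, window: int) -> pd.Series:
    """Rolling GK volatility; windows with a negative mean variance become NaN
    without triggering the sqrt RuntimeWarning."""
    mean_var = gk_terms(df).rolling(window).mean()
    return np.sqrt(mean_var.where(mean_var >= 0))


# Synthetic bars; the third opens above its high and closes below its low
bars = pd.DataFrame({
    "open":  [100.0, 101.0, 103.0],
    "high":  [101.5, 102.0, 102.0],
    "low":   [99.5, 100.5, 101.0],
    "close": [101.0, 101.5, 100.0],
})

terms = gk_terms(bars)
print(terms[terms < 0])        # bars with a negative variance term
print(gk_vol(bars, window=2))  # NaN wherever the window mean is negative
```

Masking negative means with where() before np.sqrt keeps the NaN outcome but makes it explicit; whether such a window should be NaN, zero, or an error is a design choice for the caller.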