kaizen-ai / kaizenflow

KaizenFlow is a framework for Bayesian reasoning and AI/ML stream computing
GNU General Public License v3.0

Predict intraday trading volatility #7

Status: Open. Opened by gpsaggese 1 year ago.

gpsaggese commented 1 year ago

Specs at https://docs.google.com/document/d/1ELLDf7dg3nli6nLYMpQ9IxuTW5dYdN15nluNCZbZmD4/edit#heading=h.6a527al82waq

Assigned to @zli00185

gpsaggese commented 1 year ago

@zli00185 I've updated the specs and broken them down into more specific tasks.

Let's make some progress first, and then we can start collaborating more closely.

zli00185 commented 1 year ago

Just finished setting up the Docker environment.

zli00185 commented 1 year ago

The Jupyter notebook dies when converting the dataset to a pandas DataFrame: "The kernel appears to have died. It will restart automatically."

gpsaggese commented 1 year ago

Often things need to be done differently from the "simplest" way to get around memory and performance issues. For example, you can process the data in chunks with pandas, use Dask to partition the data properly, or use our ML framework to represent the computation.
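As a minimal sketch of the chunked pandas approach, assuming the data is available as a CSV file with `timestamp` and `close` columns (the file layout, the column names, and the helper `load_in_chunks` are all hypothetical): read only the columns that are needed, one chunk at a time, and downcast floats to `float32` before concatenating. The reduced frame may fit in memory even when the full dataset does not.

```python
import io

import numpy as np
import pandas as pd


def load_in_chunks(source, columns=("timestamp", "close"), chunksize=100_000):
    """Hypothetical helper: read a CSV chunk by chunk, keeping only `columns`
    and downcasting float64 columns to float32 to roughly halve their memory."""
    parts = []
    for chunk in pd.read_csv(source, usecols=list(columns), chunksize=chunksize):
        float_cols = chunk.select_dtypes("float64").columns
        # astype with a dict replaces only the listed columns.
        chunk = chunk.astype({col: np.float32 for col in float_cols})
        parts.append(chunk)
    return pd.concat(parts, ignore_index=True)


# Usage on a tiny in-memory CSV that stands in for the real file.
csv_text = "timestamp,open,close\n1,10.0,10.5\n2,10.5,11.0\n3,11.0,10.0\n"
df = load_in_chunks(io.StringIO(csv_text), chunksize=2)
```

If the computation itself is too large even after this reduction, per-chunk aggregation or Dask, as suggested in the comment, would be the next step.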

To make progress, you can either check in the notebook and point me to where it crashes, or we can talk about it during office hours.

zli00185 commented 1 year ago

Progress report: [plot image]

This is the plot of volatility vs. timestamp for USDT AVAX and BNB in January 2021. It took me some time to figure out how the timestamps work in this dataset. This is how I calculate the volatility (std of close-to-close returns):

    rets = time['close'].pct_change()
    volatility = rets.rolling(60).std()

I use a window of 60 because the task asks for 1-minute returns and the data were recorded every second.
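A 60-row window over per-second data gives the dispersion of 1-second returns over one minute, which is not the same quantity as the volatility of 1-minute returns. If the spec means the latter, one option is to resample to 1-minute closes first. A minimal sketch contrasting the two, assuming `close` is a price Series with a DatetimeIndex (both function names are hypothetical, and the prices are a synthetic random walk):

```python
import numpy as np
import pandas as pd


def rolling_vol_per_row(close: pd.Series, window: int = 60) -> pd.Series:
    """Rolling std of row-to-row returns (the approach in the comment).
    On per-second data: std of 1-second returns over 60 seconds."""
    # Same as close.pct_change() when there are no missing prices.
    rets = close / close.shift(1) - 1.0
    return rets.rolling(window).std()


def rolling_vol_1min(close: pd.Series, window: int = 60) -> pd.Series:
    """Resample to 1-minute closes, then take the rolling std of
    1-minute returns over `window` minutes."""
    close_1min = close.resample("1min").last()
    rets_1min = close_1min / close_1min.shift(1) - 1.0
    return rets_1min.rolling(window).std()


# Usage on 3 hours of synthetic per-second prices.
idx = pd.date_range("2021-01-01", periods=3 * 3600, freq="s")
rng = np.random.default_rng(0)
close = pd.Series(100 * np.exp(np.cumsum(rng.normal(0, 1e-4, len(idx)))), index=idx)
vol_1s = rolling_vol_per_row(close)
vol_1m = rolling_vol_1min(close)
```

The two series live on different grids (one value per second vs. one per minute) and different scales, so they should not be plotted against each other without noting which one the spec asks for.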

Question:

  1. When I plot the intraday volatility of a given USDT pair over a one-year period, it always shows an extreme peak somewhere in the middle. When I plot the same pair over a shorter period, that peak disappears. What could cause that? [plot images]

  2. The beginning of the plot sometimes shows an unusual peak as well. Is this caused by the volatility calculation or by the plot generation? [plot image]
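One thing worth checking for both questions is whether the spikes line up with gaps or glitches in the data: a return computed across a missing stretch of timestamps spans much more than one second and can dominate a rolling std. A minimal diagnostic sketch, assuming `close` is a per-second price Series with a DatetimeIndex (the function name `find_suspicious_returns` and the 1-second expected step are assumptions, and the data below is synthetic):

```python
import numpy as np
import pandas as pd


def find_suspicious_returns(close: pd.Series, expected_step: str = "1s",
                            top_k: int = 5) -> pd.DataFrame:
    """Return the `top_k` largest absolute returns and flag those that
    follow a gap longer than `expected_step` in the timestamps."""
    rets = close / close.shift(1) - 1.0
    # Time elapsed since the previous row (NaT for the first row).
    step = close.index.to_series().diff()
    report = pd.DataFrame({"ret": rets, "abs_ret": rets.abs(), "step": step})
    report["after_gap"] = report["step"] > pd.Timedelta(expected_step)
    return report.nlargest(top_k, "abs_ret")


# Usage: 100 seconds of data, a 20-minute hole, then a jump in price.
idx = pd.date_range("2021-01-01 00:00:00", periods=100, freq="s").append(
    pd.date_range("2021-01-01 00:20:00", periods=100, freq="s"))
close = pd.Series(np.r_[np.full(100, 10.0), np.full(100, 12.0)], index=idx)
report = find_suspicious_returns(close, top_k=1)
```

If the largest returns fall right after gaps, the spike is likely a data artifact rather than real volatility; plotting the same window over a shorter range that excludes that timestamp would also make it disappear.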

Please let me know if I got anything wrong. Thank you!

zli00185 commented 1 year ago

Progress report: [plot images]

Both plots show the volatility of each USDT pair from 01/01/2022 to 01/02/2022. The upper one was calculated from close-to-close returns and the lower one was generated using the Garman-Klass model. This is the code for calculating the Garman-Klass volatility:

    time['part1'] = 0.5 * np.log(time['high'] / time['low']) ** 2
    time['part2'] = (2 * np.log(2) - 1) * np.log(time['close'] / time['open']) ** 2
    N = 60
    time['GK_Volatility'] = np.sqrt((time['part1'].rolling(N).sum() - time['part2'].rolling(N).sum()) / N)
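Written out, the Garman-Klass estimator computed in this comment is the following, where $O_i$, $H_i$, $L_i$, $C_i$ are the open, high, low, and close of bar $i$ and $N$ is the window length (60 here):

```latex
\sigma_{GK} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left[ \frac{1}{2} \left( \ln \frac{H_i}{L_i} \right)^2 - (2 \ln 2 - 1) \left( \ln \frac{C_i}{O_i} \right)^2 \right]}
```

Since $|\ln(C_i/O_i)| \le \ln(H_i/L_i)$ for valid bars and $2 \ln 2 - 1 \approx 0.386 < 0.5$, each bracketed term is non-negative, so the square root is always defined on clean data.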

The overall trend of the Garman-Klass plot is similar to the close-to-close one, with slightly smaller values on average. Here is a single plot of volatility vs. timestamp for USDT_BUSD computed with both methods: [plot images]
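To compare the two estimators on exactly the same bars, both can be wrapped as functions over an OHLC DataFrame with `open`, `high`, `low`, `close` columns, as in the code in the previous comment. A minimal sketch; the function names are hypothetical and `simulate_ohlc` only builds synthetic bars for illustration, it does not reproduce the dataset:

```python
import numpy as np
import pandas as pd


def garman_klass_vol(ohlc: pd.DataFrame, n: int = 60) -> pd.Series:
    """Rolling Garman-Klass volatility over `n` bars."""
    log_hl = np.log(ohlc["high"] / ohlc["low"])
    log_co = np.log(ohlc["close"] / ohlc["open"])
    term = 0.5 * log_hl**2 - (2 * np.log(2) - 1) * log_co**2
    # A rolling mean over a full window equals rolling sum / n, as in the comment.
    return np.sqrt(term.rolling(n).mean())


def close_to_close_vol(ohlc: pd.DataFrame, n: int = 60) -> pd.Series:
    """Rolling std of bar-to-bar simple returns of the close."""
    rets = ohlc["close"] / ohlc["close"].shift(1) - 1.0
    return rets.rolling(n).std()


def simulate_ohlc(n_bars: int = 500, steps: int = 60, sigma: float = 1e-3,
                  seed: int = 0) -> pd.DataFrame:
    """Hypothetical helper: OHLC bars from a random walk sampled `steps` times per bar."""
    rng = np.random.default_rng(seed)
    log_p = np.cumsum(rng.normal(0.0, sigma, n_bars * steps)).reshape(n_bars, steps)
    prices = 100.0 * np.exp(log_p)
    return pd.DataFrame({
        "open": prices[:, 0],
        "high": prices.max(axis=1),
        "low": prices.min(axis=1),
        "close": prices[:, -1],
    })


bars = simulate_ohlc()
gk = garman_klass_vol(bars)
cc = close_to_close_vol(bars)
ratio = (gk / cc).mean()  # average GK / close-to-close ratio over valid windows
```

Because the sampled high and low can only understate the true intrabar range, range-based estimators such as Garman-Klass tend to be biased downward on discretely sampled data; this is one possible contributor to the slightly smaller values, not a confirmed explanation.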

We plan to move on to task 4 later this week. Please let us know if we got anything wrong. Thank you!

gpsaggese commented 1 year ago

Interesting. It seems that the Garman-Klass estimator does a good job, although it should also be compared to other methods.
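Two other common range-based estimators that could join the comparison are Parkinson (uses only the high/low range) and Rogers-Satchell (which, unlike Garman-Klass, does not assume zero drift). A minimal sketch using the same column names and a rolling window as the Garman-Klass code; these functions are illustrations, not part of KaizenFlow:

```python
import numpy as np
import pandas as pd


def parkinson_vol(ohlc: pd.DataFrame, n: int = 60) -> pd.Series:
    """Rolling Parkinson volatility: sqrt(mean(ln(H/L)^2) / (4 ln 2))."""
    log_hl = np.log(ohlc["high"] / ohlc["low"])
    return np.sqrt((log_hl**2).rolling(n).mean() / (4 * np.log(2)))


def rogers_satchell_vol(ohlc: pd.DataFrame, n: int = 60) -> pd.Series:
    """Rolling Rogers-Satchell volatility:
    sqrt(mean(ln(H/C) ln(H/O) + ln(L/C) ln(L/O)))."""
    hi, lo, op, cl = (np.log(ohlc[k]) for k in ("high", "low", "open", "close"))
    # Each term is >= 0 because H >= max(O, C) and L <= min(O, C).
    term = (hi - cl) * (hi - op) + (lo - cl) * (lo - op)
    return np.sqrt(term.rolling(n).mean())


# Usage on ten identical toy bars with a 5-bar window.
toy = pd.DataFrame({"open": [100.0] * 10, "high": [101.0] * 10,
                    "low": [99.0] * 10, "close": [100.0] * 10})
pv = parkinson_vol(toy, n=5)
rs = rogers_satchell_vol(toy, n=5)
```

Plotting these alongside the close-to-close and Garman-Klass series over the same bars would show whether the Garman-Klass behavior is specific to that estimator.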

DanilYachmenev commented 1 year ago

probably obsolete, moving to P1 for now