gpsaggese opened 1 year ago
@zli00185 I've updated the specs and broken them down into more specific tasks.
Let's start making some progress and then we can start collaborating more closely.
Just finished setting up Docker.
The Jupyter notebook dies when converting the dataset to a pandas DataFrame: "The kernel appears to have died. It will restart automatically."
Many times things need to be done differently than the "simplest" way to get around memory / performance issues. E.g., you can process the data in chunks with pandas, use Dask to partition the data properly, or use our ML framework to represent the computation; see the sketch below.
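As an illustration of the chunked approach, here is a minimal sketch. The file name `data.csv` and the `timestamp` / `close` columns are placeholders for the example, not the actual dataset layout.

```python
import pandas as pd

# Hypothetical input file; replace with the actual dataset path / loader.
CSV_PATH = "data.csv"

# Read the file in chunks so that only a slice of the data is in memory at a time.
chunks = []
for chunk in pd.read_csv(CSV_PATH, chunksize=1_000_000, parse_dates=["timestamp"]):
    # Reduce each chunk before keeping it: drop unneeded columns and downcast
    # numeric types to save memory.
    chunk = chunk[["timestamp", "close"]]
    chunk["close"] = pd.to_numeric(chunk["close"], downcast="float")
    chunks.append(chunk)

df = pd.concat(chunks, ignore_index=True)

# The same idea with Dask (lazy partitions instead of explicit chunks):
# import dask.dataframe as dd
# ddf = dd.read_csv(CSV_PATH, parse_dates=["timestamp"])
```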
To make progress, you can either check in the notebook and point me to where it's crashing or we can talk about it in the office hours.
Progress report: this is the plot of volatility vs. timestamp for USDT AVAX and BNB from January 2021. It took me some time to figure out how the timestamp works in this dataset. This is how I calculate the volatility (std of close-to-close returns):
`rets = time['close'].pct_change()`
`volatility = rets.rolling(60).std()`
I use 60 as the window size since the task asks for 1-minute returns and the data were recorded per second.
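As a self-contained illustration of that calculation, here is a minimal sketch on synthetic data; the per-second frequency, the index, and the price series are assumptions made only to keep the example runnable, not the actual dataset schema.

```python
import numpy as np
import pandas as pd

# Synthetic per-second close prices, only to make the example runnable.
idx = pd.date_range("2021-01-01", periods=3_600, freq="s")
close = pd.Series(
    100 * np.exp(np.cumsum(np.random.normal(0, 1e-4, len(idx)))), index=idx
)

# Close-to-close returns and their rolling standard deviation over a 60-observation window.
rets = close.pct_change()
volatility = rets.rolling(60).std()

# The first 59 values are NaN because the rolling window is not yet full.
print(volatility.dropna().head())
```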
Question:
When I plot an intraday plot for a certain USDT pair over a one-year period, it always shows an extreme peak somewhere in the middle. But when I plot the same pair over a shorter period, that peak disappears. I wonder what could cause that?
The beginning of the plot sometimes shows an unusual peak as well. I wonder whether this is caused by the volatility calculation or by the plot generation.
Please tell me if I got any part wrong. Thank you!
Progress report: both plots show the volatility of each USDT pair from 01/01/2022 to 01/02/2022. The upper one was calculated from close-to-close returns and the lower one was generated using the Garman-Klass model. This is the code for calculating the Garman-Klass volatility:
`time['part1'] = 0.5 * (np.log(time['high'] / time['low']))**2`
`time['part2'] = (2 * np.log(2) - 1) * (np.log(time['close'] / time['open']))**2`
`N = 60`
`time['GK_Volatility'] = np.sqrt((time['part1'].rolling(N).sum() - time['part2'].rolling(N).sum()) / N)`
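For reference, the rolling estimator this snippet implements is the standard Garman-Klass volatility over a window of $N$ bars, with $H_i$, $L_i$, $O_i$, $C_i$ the high, low, open, and close of bar $i$:

$$
\hat{\sigma}_{GK} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left[\frac{1}{2}\left(\ln\frac{H_i}{L_i}\right)^2 - (2\ln 2 - 1)\left(\ln\frac{C_i}{O_i}\right)^2\right]}
$$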
The overall trend of the plot generated by the Garman-Klass model is similar to the one generated from close-to-close returns, with slightly smaller values on average. Here is a single plot of volatility vs. timestamp for USDT_BUSD generated by the two different methods:
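A minimal sketch of how such a comparison plot could be produced, assuming the two series from the snippets above (`volatility` and `time['GK_Volatility']`) share the same timestamp index; the labels and figure size are placeholders.

```python
import matplotlib.pyplot as plt

# Overlay the two volatility estimates on one axis for direct comparison.
fig, ax = plt.subplots(figsize=(12, 4))
ax.plot(volatility, label="close-to-close (rolling std)")
ax.plot(time["GK_Volatility"], label="Garman-Klass")
ax.set_xlabel("timestamp")
ax.set_ylabel("volatility")
ax.set_title("USDT_BUSD volatility: close-to-close vs. Garman-Klass")
ax.legend()
plt.show()
```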
We plan to move on to task 4 later this week. Please tell us if we got any part wrong. Thank you!
Interesting. It seems that the Garman-Klass estimator does a good job, although one should compare it to other methods too.
probably obsolete, moving to P1 for now
Specs at https://docs.google.com/document/d/1ELLDf7dg3nli6nLYMpQ9IxuTW5dYdN15nluNCZbZmD4/edit#heading=h.6a527al82waq
Assigned to @zli00185