matplotlib / mplfinance

Financial Markets Data Visualization using Matplotlib
https://pypi.org/project/mplfinance/

I always get a line plot no matter what plot type I choose. Am I missing something, or is it a bug? #307

Closed ddtuan99 closed 3 years ago

ddtuan99 commented 3 years ago

Describe the bug: On Jupyter Notebook, I try to plot a stock dataframe which has a DatetimeIndex as its index and 4 columns: Open, High, Low, Close. But no matter what type I choose, I always get a line plot. I've noticed that type is highlighted in the notebook editor as the name of Python's builtin function (which returns the type of an object).

Screenshots and To Reproduce: [two screenshots, not preserved]

Expected behavior: Get a candlestick plot

ddtuan99 commented 3 years ago

I actually do get candlesticks, but my dataframe has a lot of records, so the graph looks like a line plot. Is there a way to make the candlesticks more visible? Really appreciate your time!

DanielGoldfarb commented 3 years ago

As you noticed, you actually do have candlesticks, but you have so many of them, in a small space, that they are difficult to see. This is a limitation that you will encounter with every graphics or plotting package, but there are a few things you can do, depending on what your goals are.

First, understand you are plotting (what looks to me to be) about 20 years' worth of daily candlesticks, which means that you are trying to view over 5000 candlesticks on a chart maybe 20 centimeters wide. That means each candlestick is at best 0.04 millimeters wide, which is less than the resolution limit of the human eye. That's assuming your monitor has an even higher resolution, which is likely not the case. Consider my own monitor: the resolution is 1920x1080, which gives less than half a pixel per candlestick even if my plot is full screen. There is no way I would be able to distinguish that many individual candlesticks at that resolution.
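The arithmetic above can be checked with a short sketch. The figures are the ones quoted in this comment; the function name `width_per_candle` is only an illustrative helper, not part of mplfinance.

```python
# Rough width available per candlestick, using the figures quoted above.
def width_per_candle(total_width, n_candles):
    """Return the horizontal space available to each candle, in the units of total_width."""
    return total_width / n_candles

n_candles = 5000          # roughly 20 years of daily bars
chart_width_mm = 200      # a chart about 20 cm wide
screen_width_px = 1920    # a full-screen plot on a 1920x1080 monitor

mm_each = width_per_candle(chart_width_mm, n_candles)
px_each = width_per_candle(screen_width_px, n_candles)

print(f"{mm_each:.3f} mm per candle")   # 0.040 mm
print(f"{px_each:.3f} px per candle")   # 0.384 px
```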

So what can you do? Here are a few ideas:

  1. First you can try making the figure larger (with the figscale= kwarg), and plot the figure on a high resolution monitor or print it with a high resolution printer. Given the amount of data you are trying to plot, it is unlikely that this will help much.
  2. Plot less data, or zoom in on some specific period of time that is of interest to you. Do you really need to see 20 years' worth of data? Or would 3 to 6 months of daily data be adequate for what you are trying to accomplish?
  3. Alternatively, if you really do want to examine 20 years of data, ask whether daily OHLC values are significant in that context. Probably not. When viewing such a long time period, daily fluctuations are usually not so relevant. Consider resampling your data so that each candle represents the Open, High, Low, and Close for a single month, or even a single quarter. For 20 years of data, quarterly candlesticks still means some 80 candles on the screen to eyeball. That's a lot. And monthly means 240 candlesticks, which is doable, but still a lot to look at and make sense of by eye.
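As a rough sketch of ideas 2 and 3, the snippet below slices a dataframe down to its last six months and counts how many candles each frequency would produce. The data is synthetic (made-up dates and prices standing in for about 20 years of daily bars), and the variable names are only illustrative.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for about 20 years of daily data (business days only).
# Dates and prices are made up purely for illustration.
index = pd.bdate_range("2001-01-01", "2020-12-31")
close = 100 + np.cumsum(np.random.default_rng(0).normal(0, 1, len(index)))
df = pd.DataFrame(
    {"Open": close, "High": close + 1, "Low": close - 1, "Close": close},
    index=index,
)

# Idea 2: keep only a window of interest, here the last six months.
start = df.index[-1] - pd.DateOffset(months=6)
recent = df.loc[start:]

# Idea 3: count how many candles each frequency would produce.
candle_counts = {
    name: df.index.to_period(freq).nunique()
    for name, freq in [("daily", "D"), ("weekly", "W"),
                       ("monthly", "M"), ("quarterly", "Q")]
}

print(len(recent), candle_counts)
```

The sliced frame `recent` keeps its DatetimeIndex, so it can be passed to mpf.plot with type='candle' in the same way as the full dataframe.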

As an example, consider the 9.25 years of Intel prices in the examples folder of this repository, and the following code to plot it as is (daily) and resampled as weekly, monthly, and quarterly data:

import pandas as pd
import mplfinance as mpf

# Load the daily INTC prices, using the first column as a DatetimeIndex.
df = pd.read_csv('data/yahoofinance-INTC-19950101-20040412.csv',index_col=0,parse_dates=True)
print(df.shape)

# How to combine the daily rows that fall into each resampled period.
aggregation = {'Open'  :'first',
               'High'  :'max',
               'Low'   :'min',
               'Close' :'last',
               'Volume':'sum'}

# Note: pandas 2.2 and later deprecate the 'M' and 'Q' aliases in favor of 'ME' and 'QE'.
dfw = df.resample('1W').agg(aggregation)
dfm = df.resample('1M').agg(aggregation)
dfq = df.resample('1Q').agg(aggregation)

# Same plot settings for all four charts; only the data frequency differs.
kwargs=dict(volume=True,type='candle',tight_layout=True)
mpf.plot(df,**kwargs,title='\nINTC Daily   ')
mpf.plot(dfw,**kwargs,title='\nINTC Weekly     ')
mpf.plot(dfm,**kwargs,title='\nINTC Monthly        ')
mpf.plot(dfq,**kwargs,title='\nINTC Quarterly         ')

The results are as follows:

(This is all exactly the same data, only resampled at various frequencies. Personally I think for the almost 10 years of data, the monthly resample looks best. However I'm guessing that for 20 years of data, I would find a quarterly resample easiest to understand from a visual analysis perspective.)

[Four charts, not preserved: INTC Daily, INTC Weekly, INTC Monthly, INTC Quarterly]

ddtuan99 commented 3 years ago

Thank you so much for this and for your time maintaining this package. Your response is really helpful to me. I've been familiar with Pandas and data processing for a while but hadn't learned the concept of resampling. You gave me a lot of useful information, even code and screenshots. Thank you very much. I wish you a happy, healthy and productive year in 2021.

DanielGoldfarb commented 3 years ago

@ddtuan99 Thank you. Same to you and yours ... a happy, healthy and good new year.