matplotlib / mplfinance

Financial Markets Data Visualization using Matplotlib
https://pypi.org/project/mplfinance/

Marking / Highlighting After Hours #365

Open Jellayy opened 3 years ago

Jellayy commented 3 years ago

I'm working on styling for my charts and I'm wondering if there's a good way to highlight / grey out / mark after hours trading periods in some way like some graphing programs do.

Here's an example of one of my charts at the moment: [image: no markers]

Here's a quick photoshop markup of what I want to do: [image: with markers]

I'm currently pulling from an API that gives data on a multitude of periods and intervals. I saw there was functionality for highlighting between timestamps with the fill_between feature. However, I'm a bit stumped on how to make sure I cover all after-hours periods in any given period. Any pointers in the right direction on doing this properly would be greatly appreciated!

manuelwithbmw commented 3 years ago

I think you could leverage the axvspan (vertical span -rectangle- across the axes) function of Matplotlib: https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.axvspan.html with something like the below:

fig, axlist = mpf.plot(data, ..., returnfig=True)
axlist[0].axvspan(...)
mpf.show()
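
To make the axvspan idea a little more concrete, here is a minimal sketch, not a tested recipe. The helper names `after_hours_spans` and `plot_with_shading` are hypothetical, and it assumes a 09:30 to 16:00 session. It also assumes mplfinance's default `show_nontrading=False`, where (as far as I understand) the x-axis is the row number of each bar rather than a date, so the spans are expressed as row positions:

```python
from datetime import datetime, time, timedelta

def after_hours_spans(timestamps, open_t=time(9, 30), close_t=time(16, 0)):
    """Return (first_row, last_row) for each consecutive run of bars
    whose time of day falls outside the half-open session [open_t, close_t)."""
    spans, run_start = [], None
    for pos, ts in enumerate(timestamps):
        outside = not (open_t <= ts.time() < close_t)
        if outside and run_start is None:
            run_start = pos                      # a new after-hours run begins
        elif not outside and run_start is not None:
            spans.append((run_start, pos - 1))   # the run ended on the previous bar
            run_start = None
    if run_start is not None:                    # data ends during after hours
        spans.append((run_start, len(timestamps) - 1))
    return spans

def plot_with_shading(df, **kwargs):
    """Hypothetical wrapper: plot df, then grey out the after-hours bars."""
    import mplfinance as mpf  # imported here so the helper above works without it
    fig, axlist = mpf.plot(df, returnfig=True, **kwargs)
    for first, last in after_hours_spans(list(df.index)):
        # Assumption: x coordinates are row numbers (show_nontrading=False)
        axlist[0].axvspan(first - 0.5, last + 0.5, color='grey', alpha=0.2)
    return fig, axlist

# Hourly bars from 08:00 to 18:00 on one day (rows 0..10)
demo_bars = [datetime(2021, 4, 1, 8) + timedelta(hours=h) for h in range(11)]
print(after_hours_spans(demo_bars))  # [(0, 1), (8, 10)]
```

Because the spans come from the data's own timestamps, weekends and holidays need no special handling: bars that are not in the data simply never start a span.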

DanielGoldfarb commented 3 years ago

@Jellayy Austin, Unfortunately the fill_between in mplfinance is somewhat lame. It's just something I coded up very quickly without too much thought, so it's limited to a single fill_between specification on a single panel. I was planning this week to enhance it to accept a list of fill_between's, however other projects need work right now so it may be another 2 to 4 weeks for the better fill_between.

In the meantime Manuel's suggestion using axvspan() may work. You may also try mplfinance's vlines kwarg (see cell In [8]: of this "using lines" notebook).

By the way, that's a sharp-looking style you are developing there. Perhaps when you are satisfied with it, you can give it a name and submit a PR to add it to mplfinance styles. The latest version of mplfinance has a method, write_style_file(), that allows you to take your custom style and write it to a file that can be used for the Pull Request:

s = mpf.make_mpf_style(...)
mpf.write_style_file(s,'mystylename.py')

Jellayy commented 3 years ago

Thanks for the suggestions! I know this is coming a little late, but I'd like to drop the currently very hacked together solution I'm using here for anyone who stumbles upon this.

A solution built around the vlines kwarg that Daniel suggested ended up working well enough. I have not yet tried anything based on axvspan, as suggested by Manuel.

There is probably a better way to do this, but this block here gives a list of all market opening and closing points within a timeframe. (Currently only accounts for weekends, no market holidays, etc.)

from datetime import timedelta

# start and end are the datetime (or date) objects bounding the charted period
delta = end - start
weekend = {5, 6}  # weekday() values for Saturday and Sunday
splits = []

for i in range(delta.days + 1):
    day = start + timedelta(days=i)
    if day.weekday() not in weekend:
        splits.append(day.strftime("%Y-%m-%d 09:30"))  # market open
        splits.append(day.strftime("%Y-%m-%d 16:00"))  # market close

Then I simply add this list as the parameter for vlines when I plot my graph along with any other arguments I would have for styling, etc.

mpf.plot(df, vlines=dict(vlines=splits))
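
For anyone who wants the lines to look less prominent, the same dict can carry styling. This is only a sketch: the helper `after_hours_vlines` is a hypothetical name, and the keys used (`colors`, `linestyle`, `linewidths`, `alpha`) are the ones I understand the vlines kwarg to accept, so check them against the "using lines" notebook for your mplfinance version.

```python
def after_hours_vlines(splits, color='grey', linestyle='--',
                       linewidths=0.8, alpha=0.6):
    """Bundle the open/close timestamps with styling into a single dict
    intended for the `vlines` kwarg of mpf.plot()."""
    return dict(
        vlines=list(splits),    # the timestamps where lines are drawn
        colors=color,           # one color for every line
        linestyle=linestyle,    # dashed, so the lines stay in the background
        linewidths=linewidths,
        alpha=alpha,
    )

# Usage (assumes df and splits from the snippets above):
#   mpf.plot(df, vlines=after_hours_vlines(splits))
example = after_hours_vlines(["2021-04-01 09:30", "2021-04-01 16:00"])
print(sorted(example))
```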

And, with my current styling, here is the result: [image: graph]

Still probably needs some work, but this works fine for my purposes for now and I hope it helps anyone else looking for something similar.

And to reply to @DanielGoldfarb's second point, I keep finding myself tweaking my style bit by bit as I work on it. Once I'm not tweaking it so much and it's in a good spot I'll definitely send a pull request with it, thanks!

DanielGoldfarb commented 3 years ago

@Jellayy

Austin,

Interestingly, when you posted this, I was in the middle of researching something very similar: the ability to generate time series datetimes of various frequencies, with and without weekends and/or holidays. (I am working on an enhancement to allow mplfinance users to better control the location and labeling of the time-axis tick marks, as well as the ability to extrapolate trend lines beyond the given data.)

The code you have written is fine, and I don't know if other techniques are any better, but I will post here two alternatives and people can decide for themselves. I am going to document here more than just what is relevant to your use-case, simply as a way of solidifying in my mind what I've learned this past week.

Pandas Time Series Functionality

Pandas provides the ability to generate a pandas.DatetimeIndex index object with a specified frequency. The frequency may be intraday (such as once per minute, every 15 minutes, per hour, etc.) or the frequency may be daily (every day), business days only, weekly, monthly, etc. There is even a method to generate "business hours", that is, hourly data for a series of dates, but only between specified hours (such as 9am to 5pm). This latter functionality is limited, however, in that it does not combine with other frequencies. For example, as far as I can tell it skips weekends, but it cannot skip holidays and it only ever generates hourly datetimes. That said, there is a simple workaround for this limitation (see below).

The basic pandas API for generating a DatetimeIndex of a specified frequency is either date_range() or bdate_range(). We will use bdate_range(). It appears to me that the only difference between these APIs is that bdate_range() will accept business day calendars and custom holiday calendars; otherwise they are essentially the same. Here are some examples:


# Generate a time stamp every 15 minutes for the first 13 days of January:
# (notice that the generation stops as soon as Jan 13 is reached)
pd.bdate_range('1/1/2021','1/13/2021',freq='15min')

DatetimeIndex(['2021-01-01 00:00:00', '2021-01-01 00:15:00',
               '2021-01-01 00:30:00', '2021-01-01 00:45:00',
               '2021-01-01 01:00:00', '2021-01-01 01:15:00',
               '2021-01-01 01:30:00', '2021-01-01 01:45:00',
               '2021-01-01 02:00:00', '2021-01-01 02:15:00',
               ...
               '2021-01-12 21:45:00', '2021-01-12 22:00:00',
               '2021-01-12 22:15:00', '2021-01-12 22:30:00',
               '2021-01-12 22:45:00', '2021-01-12 23:00:00',
               '2021-01-12 23:15:00', '2021-01-12 23:30:00',
               '2021-01-12 23:45:00', '2021-01-13 00:00:00'],
              dtype='datetime64[ns]', length=1153, freq='15T')

# Generate business days (exclude weekends) for the first 13 days of January:
pd.bdate_range('1/1/2021','1/13/2021',freq='B')

DatetimeIndex(['2021-01-01', '2021-01-04', '2021-01-05', '2021-01-06',
               '2021-01-07', '2021-01-08', '2021-01-11', '2021-01-12',
               '2021-01-13'],
              dtype='datetime64[ns]', freq='B')

# Generate business days (exclude weekends *and* holidays) for the first 13 days of January:
# (notice Jan 1 is now excluded)
from pandas.tseries.holiday import USFederalHolidayCalendar
bday_us = pd.offsets.CustomBusinessDay(calendar=USFederalHolidayCalendar())
pd.bdate_range('1/1/2021','1/13/2021',freq=bday_us)

DatetimeIndex(['2021-01-04', '2021-01-05', '2021-01-06', '2021-01-07',
               '2021-01-08', '2021-01-11', '2021-01-12', '2021-01-13'],
              dtype='datetime64[ns]', freq='C')

# In much of the Middle East, business days are Sun-Thu (not Mon-Fri) ...
bday_me = pd.offsets.CustomBusinessDay(weekmask='Sun Mon Tue Wed Thu')
pd.bdate_range('1/1/2021','1/13/2021',freq=bday_me)

DatetimeIndex(['2021-01-03', '2021-01-04', '2021-01-05', '2021-01-06',
               '2021-01-07', '2021-01-10', '2021-01-11', '2021-01-12',
               '2021-01-13'],
              dtype='datetime64[ns]', freq='C')

# Generate Business HOURS for the first 13 days of January
bhours = pd.offsets.BusinessHour(start='09:30',end='16:00')
pd.bdate_range('1/1/2021','1/13/2021',freq=bhours)

DatetimeIndex(['2021-01-01 09:30:00', '2021-01-01 10:30:00',
               '2021-01-01 11:30:00', '2021-01-01 12:30:00',
               '2021-01-01 13:30:00', '2021-01-01 14:30:00',
               '2021-01-01 15:30:00', '2021-01-04 10:00:00',
               '2021-01-04 11:00:00', '2021-01-04 12:00:00',
               '2021-01-04 13:00:00', '2021-01-04 14:00:00',
               '2021-01-04 15:00:00', '2021-01-05 09:30:00',
               ...
               '2021-01-08 15:00:00', '2021-01-11 09:30:00',
               '2021-01-11 10:30:00', '2021-01-11 11:30:00',
               '2021-01-11 12:30:00', '2021-01-11 13:30:00',
               '2021-01-11 14:30:00', '2021-01-11 15:30:00',
               '2021-01-12 10:00:00', '2021-01-12 11:00:00',
               '2021-01-12 12:00:00', '2021-01-12 13:00:00',
               '2021-01-12 14:00:00', '2021-01-12 15:00:00'],
              dtype='datetime64[ns]', freq='BH')
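
One addition that may be worth a look: pandas also has a `pd.offsets.CustomBusinessHour` offset that takes a holiday calendar, so the business-hours generator can skip holidays as well as weekends. The sketch below assumes the same US federal calendar used earlier; it is still hourly only, so it does not remove the need for the workaround that follows.

```python
import pandas as pd
from pandas.tseries.holiday import USFederalHolidayCalendar

# Business hours 09:30-16:00 that skip weekends *and* US federal holidays
cbhours = pd.offsets.CustomBusinessHour(
    start='09:30', end='16:00', calendar=USFederalHolidayCalendar())

hourly = pd.date_range('2021-01-01', '2021-01-13', freq=cbhours)

# Jan 1 (a holiday) should no longer appear among the generated dates
print(hourly.normalize().unique())
```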

Unfortunately, the pandas date range generators do not allow combining frequencies. So, for example, I cannot directly generate a DatetimeIndex with a timestamp every 15 minutes, but only from 9:30am till 4pm, and excluding weekends and holidays. The simple workaround for this is:

  1. First generate a "daily" DatetimeIndex (which, if desired, excludes weekends and/or holidays)
  2. Then loop through the generated dates generating a 9:30-16:00 DatetimeIndex for each date (at a desired intraday frequency)
  3. Finally union (join together) all of these DatetimeIndexes into a single DatetimeIndex

bday_us = pd.offsets.CustomBusinessDay(calendar=USFederalHolidayCalendar())
dtindex = pd.bdate_range('1/1/2021','1/13/2021',freq=bday_us)
ixlist  = []
for dt in dtindex:
    d1 = dt.replace(hour=9,minute=30)
    d2 = dt.replace(hour=16,minute=1) # make sure to include 16:00
    # Here we must use `date_range()` instead of `bdate_range()`
    ixlist.append(pd.date_range(d1,d2,freq='30min'))
# Join the per-day indexes one at a time with union()
# (union_many() is deprecated in newer versions of pandas)
trading_index = ixlist[0]
for ix in ixlist[1:]:
    trading_index = trading_index.union(ix)
print(trading_index[:16])
print('...')
print(trading_index[63:77])
print('...')
print(trading_index[-16:])

DatetimeIndex(['2021-01-04 09:30:00', '2021-01-04 10:00:00',
               '2021-01-04 10:30:00', '2021-01-04 11:00:00',
               '2021-01-04 11:30:00', '2021-01-04 12:00:00',
               '2021-01-04 12:30:00', '2021-01-04 13:00:00',
               '2021-01-04 13:30:00', '2021-01-04 14:00:00',
               '2021-01-04 14:30:00', '2021-01-04 15:00:00',
               '2021-01-04 15:30:00', '2021-01-04 16:00:00',
               '2021-01-05 09:30:00', '2021-01-05 10:00:00'],
              dtype='datetime64[ns]', freq=None)
...
DatetimeIndex(['2021-01-08 13:00:00', '2021-01-08 13:30:00',
               '2021-01-08 14:00:00', '2021-01-08 14:30:00',
               '2021-01-08 15:00:00', '2021-01-08 15:30:00',
               '2021-01-08 16:00:00', '2021-01-11 09:30:00',
               '2021-01-11 10:00:00', '2021-01-11 10:30:00',
               '2021-01-11 11:00:00', '2021-01-11 11:30:00',
               '2021-01-11 12:00:00', '2021-01-11 12:30:00'],
              dtype='datetime64[ns]', freq=None)
...
DatetimeIndex(['2021-01-12 15:30:00', '2021-01-12 16:00:00',
               '2021-01-13 09:30:00', '2021-01-13 10:00:00',
               '2021-01-13 10:30:00', '2021-01-13 11:00:00',
               '2021-01-13 11:30:00', '2021-01-13 12:00:00',
               '2021-01-13 12:30:00', '2021-01-13 13:00:00',
               '2021-01-13 13:30:00', '2021-01-13 14:00:00',
               '2021-01-13 14:30:00', '2021-01-13 15:00:00',
               '2021-01-13 15:30:00', '2021-01-13 16:00:00'],
              dtype='datetime64[ns]', freq=None)
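
A different route to the same index, offered only as a sketch: generate every intraday timestamp for the whole period and then filter, instead of looping per day. The function name `intraday_trading_index` is hypothetical. It relies on `DatetimeIndex.indexer_between_time()` for the time-of-day filter and on `isin()` for the business-day filter, and assumes the US federal holiday calendar as above.

```python
import pandas as pd
from pandas.tseries.holiday import USFederalHolidayCalendar

def intraday_trading_index(start, end, freq='30min',
                           open_t='09:30', close_t='16:00'):
    """Timestamps at `freq` between open_t and close_t (both inclusive),
    on business days from start to end, skipping US federal holidays."""
    bday_us = pd.offsets.CustomBusinessDay(calendar=USFederalHolidayCalendar())
    bdays = pd.bdate_range(start, end, freq=bday_us)

    # Every timestamp at the requested frequency, through the last day
    full = pd.date_range(pd.Timestamp(start),
                         pd.Timestamp(end) + pd.Timedelta(days=1), freq=freq)

    # Keep only timestamps within trading hours ...
    in_hours = full[full.indexer_between_time(open_t, close_t)]
    # ... and only on dates that are business days
    return in_hours[in_hours.normalize().isin(bdays)]

idx = intraday_trading_index('2021-01-01', '2021-01-13')
print(len(idx))          # 8 business days x 14 timestamps per day = 112
print(idx[0], idx[-1])
```

Whether this is preferable to the loop is a matter of taste: it avoids the per-day union, at the cost of first building timestamps that are then thrown away.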

Now, in your particular use case, you only want two timestamps per day (9:30 and 16:00), so instead of calling pd.date_range() we simply create those two dates:

bday_us = pd.offsets.CustomBusinessDay(calendar=USFederalHolidayCalendar())
dtindex = pd.bdate_range('1/1/2021','1/13/2021',freq=bday_us)
octimes = [] # open and close times
for dt in dtindex:
    octimes.append(dt.replace(hour=9,minute=30))
    octimes.append(dt.replace(hour=16,minute=0))
for ts in octimes:
    print(ts)

2021-01-04 09:30:00
2021-01-04 16:00:00
2021-01-05 09:30:00
2021-01-05 16:00:00
2021-01-06 09:30:00
2021-01-06 16:00:00
2021-01-07 09:30:00
2021-01-07 16:00:00
2021-01-08 09:30:00
2021-01-08 16:00:00
2021-01-11 09:30:00
2021-01-11 16:00:00
2021-01-12 09:30:00
2021-01-12 16:00:00
2021-01-13 09:30:00
2021-01-13 16:00:00

As you can see, this technique is very similar to your code above, but it also allows for holidays.
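
If you would rather avoid the Python loop, the same open and close times can be produced with index arithmetic. This is a sketch under the same assumptions as above (US federal holidays, 09:30 open, 16:00 close); the names `opens` and `closes` are just for illustration.

```python
import pandas as pd
from pandas.tseries.holiday import USFederalHolidayCalendar

bday_us = pd.offsets.CustomBusinessDay(calendar=USFederalHolidayCalendar())
dtindex = pd.bdate_range('1/1/2021', '1/13/2021', freq=bday_us)

# Each business day is a midnight timestamp, so adding a Timedelta
# shifts the whole index to the open or to the close in one step.
opens = dtindex + pd.Timedelta(hours=9, minutes=30)
closes = dtindex + pd.Timedelta(hours=16)

# Interleave opens and closes chronologically
octimes = opens.append(closes).sort_values()
print(octimes[:4])
```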


Extracting Unique Dates from your Data

To generate open and close times for existing data, it may be simpler to get the business dates from the data itself, instead of generating the dates. The code below does this.

#!/usr/bin/env python

import pandas as pd
import mplfinance as mpf

df = pd.read_csv('../data/gbpusd_yf20210401-0407.csv',index_col=0,parse_dates=True)
print('df.shape=',df.shape)
print(df.head())
print(df.tail())

# Extract unique dates from the intraday data:
dates = pd.Series([d.date() for d in df.index]).unique()

# generate a list of open and close times, from the unique dates.
# also color the opens and the closes differently.
octimes = []
colors  = []
for d in dates:
    dt = pd.Timestamp(d)
    octimes.append(dt.replace(hour=9,minute=30))
    octimes.append(dt.replace(hour=16,minute=0))
    colors.append('b')
    colors.append('r')

# Plot with vertical lines at the market open and close times:
mpf.plot(df,type='candle',vlines=dict(vlines=octimes,colors=colors),title='\nGBP/USD')

The result: [image: Figure_2]
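
As a side note, the unique-date extraction in the script above can also be written with index methods instead of a list comprehension. A minimal sketch, where `open_close_times` is a hypothetical helper and the 09:30 and 16:00 offsets are the same assumptions as before:

```python
import pandas as pd

def open_close_times(index,
                     open_offset=pd.Timedelta(hours=9, minutes=30),
                     close_offset=pd.Timedelta(hours=16)):
    """Return (octimes, colors) for every distinct date in a DatetimeIndex."""
    # normalize() sets each timestamp to midnight; unique() keeps one per day
    days = index.normalize().unique()
    octimes, colors = [], []
    for day in days:
        octimes += [day + open_offset, day + close_offset]
        colors += ['b', 'r']      # blue for opens, red for closes
    return octimes, colors

# Two days of made-up 30-minute timestamps standing in for df.index
demo_index = pd.date_range('2021-04-01 00:00', '2021-04-02 23:30', freq='30min')
octimes, colors = open_close_times(demo_index)
print(octimes)
```

With real data this would be called as `open_close_times(df.index)` and the two lists passed to the vlines dict exactly as in the script above.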


The data file for the above example was generated as follows:

import pandas as pd
import yfinance as yf
from datetime import datetime
sd = datetime(2021, 4, 1)
ed = datetime(2021, 4, 7)
df = yf.download(tickers='GBPUSD=X', start=sd, end=ed, interval="30m")
df.index = df.index.tz_localize(None)
df.to_csv('gbpusd_yf20210401-0407.csv')
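
One caveat about `tz_localize(None)`: it simply drops the timezone, so the 09:30 and 16:00 markers only line up with a market's session if the index was already in that market's local time. The sketch below shows one way to convert first. The helper `to_exchange_naive` is hypothetical, the choice of 'America/New_York' is an assumption for US equities (a currency pair such as GBP/USD has no single exchange session), and treating naive timestamps as UTC is also an assumption.

```python
import pandas as pd

def to_exchange_naive(index, exchange_tz='America/New_York'):
    """Convert a DatetimeIndex to exchange-local wall-clock time, then drop the tz."""
    if index.tz is None:
        # Assumption: timestamps without a timezone are in UTC
        index = index.tz_localize('UTC')
    return index.tz_convert(exchange_tz).tz_localize(None)

utc_index = pd.date_range('2021-04-01 13:30', periods=3, freq='30min', tz='UTC')
print(to_exchange_naive(utc_index))  # first entry is 2021-04-01 09:30:00
```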