matplotlib / mplfinance

Financial Markets Data Visualization using Matplotlib
https://pypi.org/project/mplfinance/

Drawing on chart #136

Closed Jacks349 closed 4 years ago

Jacks349 commented 4 years ago

Hello! I recently shifted from the old mpl_finance to this library and so far it's awesome!

I don't know if this is a question or a feature request, so i'll try not to be vague. I created a simple candlestick chart, using a dataframe called df: mpf.plot(df, type='candle', figratio=(20,9), style="nightclouds", volume=True)

So far so good. Now i have another source of data: Data2 = [[9350, '2020-05-16 10:30'], [9400, '2020-05-16 12:00']]

And what i need to do is: for each element in that array, i should get a square point, or a small line that goes for the length of the candle. So in this case, for example, there should be a small line/square dot at x=2020-05-16 10:30 and y=9350.

Here is what i tried to do this: 1) Make a dataframe out of that array and use addplot.

df2 = pd.DataFrame(Data2, columns=['Price', 'Time'])

apd = mpf.make_addplot(df2, scatter=True, markersize=1, marker='s')
mpf.plot(df, type='candle', figratio=(20,9), style="nightclouds", volume=True, addplot=apd)

But i got the following error:

raise ValueError("x and y must be the same size")
ValueError: x and y must be the same size

I was able to do this with the old mpl_finance, but i'm struggling to do it with the new library, since i just got started to it. Is there any way to do it? Or do i have to wait new releases?

For reference, here is a question from SO that shows exactly what i want to do: https://stackoverflow.com/questions/61776352/how-to-plot-an-horizontal-line-between-two-datapoints-on-matplotlib

I hope i wasn't vague! As i said, i managed to do this with the deprecated library and it was ok, but when i saw how awesome is this new one, i absolutely wanted to move to mplfinance. Keep up the good work!

DanielGoldfarb commented 4 years ago

@Jacks349

There are a couple of possible ways to do this.


If you use make_addplot() you must pass in a sequence that is the same length as your dataframe. It's OK if most of the values are NaN (except for those that you want to plot).

Please understand that when passing in additional data via the addplot kwarg, you are only passing in the y-values. The x-values are taken from the datetime index from your dataframe.

Therefore it's up to you to make sure the values 9350 and 9400 are lined up to the appropriate datetime points in the sequence that you pass into mpf.make_addplot().

You can see an example of this in cell [7] of the addplot tutorial where all but a few values in the "signal" sequence are set to NaN. You can see the plot made in cell [10] of that same notebook.
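The alignment described above can be sketched with plain Python lists. This is only an illustration: `aligned_values`, `candle_times` and `plot_with_points` are hypothetical names, and in real use the timestamps would come from `df.index`. The plotting helper assumes an mplfinance version where `make_addplot` accepts `type='scatter'` (the version discussed in this thread used `scatter=True`).

```python
import math

def aligned_values(candle_times, points):
    """Return one y-value per candle: the price where a point exists, NaN elsewhere.

    candle_times: list of timestamp strings, one per row of the OHLC dataframe.
    points: list of [price, timestamp] pairs, as in Data2 above.
    """
    price_at = {time: price for price, time in points}
    return [price_at.get(t, math.nan) for t in candle_times]

# Hypothetical 30-minute candles covering the two points from Data2.
candle_times = ['2020-05-16 10:00', '2020-05-16 10:30', '2020-05-16 11:00',
                '2020-05-16 11:30', '2020-05-16 12:00']
Data2 = [[9350, '2020-05-16 10:30'], [9400, '2020-05-16 12:00']]
signal = aligned_values(candle_times, Data2)

def plot_with_points(df, signal):
    # Assumes len(signal) == len(df); not called here because it needs real data.
    import mplfinance as mpf
    apd = mpf.make_addplot(signal, type='scatter', markersize=50, marker='s')
    mpf.plot(df, type='candle', addplot=apd)
```

Note that a point whose timestamp does not exactly match a candle timestamp is silently dropped by this sketch, so the times need to be normalized first.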


The second way to accomplish what you suggest is to use one or more of the various "lines" kwargs (hlines, vlines, alines, tlines). Note also, for your purposes, if you prefer a square to a line, you can approximate a square by making the line the correct length and thickness. See details in the lines tutorial.

Please let me know how these suggestions work out. And if you have any more questions. All the best. --Daniel

P.S. Please note that from time to time GitHub has problems displaying jupyter notebooks. (I'm seeing this right now). Usually when that happens it gets fixed within a few hours, but if it's still happening when you see this, you can always git clone this repository and look at the jupyter notebooks on your local machine.

Jacks349 commented 4 years ago

Hello Daniel, thank you a lot!

About the first approach with scatter lines, i can't use it because although the data of the second dataframe is in the same range of the first dataframe (so they have the same points) i can't use the datetime column of the first dataframe.

So, i tried with the lines approach, and here is what i found: i can't use hlines because the line doesn't need to be wide for the whole chart, i can't use tlines either because the line needs to be plotted with 2 values (x for time and y for price) and not one.

Then i tried with alines and i managed to do it! In case anyone else should need to do the same, here is what i did:

So, what i needed was for the line to be as wide as the candle, so i tried this:

seq_of_points=[['2020-05-17 00:10', 9300],['2020-05-17 00:10',9300]]
mpf.plot(df, type='candle', figratio=(20,9), style=s, alines=dict(alines=seq_of_points,colors=['w']))

But this is going to draw nothing on the chart.

Then i had this idea: how about setting it one second BEFORE the candle for the start of the line, and one second AFTER the end of the candle, for the end of the line?

seq_of_points=[['2020-05-17 00:09:59',9300],['2020-05-17 00:10:01',9300]]
mpf.plot(df, type='candle', figratio=(20,9), style=s, alines=dict(alines=seq_of_points,colors=['w']))

The result was quite interesting, it is not as wide as the candle, it's a little bit more, but the important thing is that it DOESN'T invade other candles, and it doesn't!
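The one-second-before / one-second-after trick can be wrapped in a small helper so the endpoints do not have to be typed by hand. A minimal sketch, where `candle_segment` and `pad_seconds` are hypothetical names and the timestamp format is assumed to include seconds:

```python
from datetime import datetime, timedelta

FMT = '%Y-%m-%d %H:%M:%S'

def candle_segment(candle_time, price, pad_seconds=1):
    """Return two [time, price] points straddling candle_time by pad_seconds."""
    t = datetime.strptime(candle_time, FMT)
    pad = timedelta(seconds=pad_seconds)
    return [[(t - pad).strftime(FMT), price],
            [(t + pad).strftime(FMT), price]]

# Reproduces the hand-written points used above.
seq_of_points = candle_segment('2020-05-17 00:10:00', 9300)
```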

Here is a screenshot of the output: [screenshot]

The only problem with this solution, is that if i add another point, it will try to connect the two lines, but i don't want that:

seq_of_points=[['2020-05-17 00:09:59',9300],['2020-05-17 00:10:01',9300],
                        ['2020-05-16 23:59:59',9320],['2020-05-17 00:00:01',9320]]
mpf.plot(df, type='candle', figratio=(20,9), style=s, alines=dict(alines=seq_of_points,colors=['w']))

[screenshot]

It should be like this instead: [screenshot]

This happens because i made a single array, if i created one variable for every point, i would not have the problem, but that's not ideal since i will have a lot of datapoints to plot!

This result is ok for me, it's what i needed. I'm keeping this open for a while just in case you knew an even better way to do this. I hope this will be helpful for other people!

A side note: i already said this library is awesome and i will say it again! I've never had this much fun playing with a Python library until i found mplfinance.

DanielGoldfarb commented 4 years ago

@Jacks349 Thank you. I'm really glad you are enjoying the library. I especially enjoy hearing when people say that they "shifted from the old mpl_finance to this library and so far it's awesome!" That's how I originally got involved with this. I was trying to use mpl_finance for my own work, and I was frustrated by how much code I had to write to do relatively simple things. We still have a way to go before mplfinance can fully replace mpl_finance, but I think we are on track.

Regarding your lines, you may try varying the price instead of the time, and then adjust the linewidths of the line so that you can see it. Here is an example:

line = dict(alines=[['2019-11-05 13:40',3077.385],['2019-11-05 13:40:00',3077.415]],
                 linewidths=18)
mpf.plot(tidf,type='candle',alines=line,figscale=0.75)

[image: issue136_1]

I did notice however that the linewidths don't scale, so if you change the size of the plot (through figscale or some other means) then you have to adjust linewidths accordingly.

I hope that helps. If you can explain to me what you are trying to accomplish from a finance or market perspective, perhaps this is a feature we can add.

Also, I very much like the custom style you are using. I would be grateful if you would please share your code for that. All the best. --Daniel

Jacks349 commented 4 years ago

Thanks for your answer again @DanielGoldfarb .

Let me go point by point:

1) From a more specific perspective: i'm getting orderbook data from this market, and from this orderbook data i take open orders that are larger than an X amount, this is why i called them 'Walls'. I want to plot those walls on my chart with a single line/dot. So for example, if a 'Wall' is spotted at price 9350, time 00:05, below the candle 00:05 (5 minutes timeframes) there must be a line below/above (depends on if the wall is a buy or sell wall) the candle, the line/dot must be as large as the candle. So in this case, the line goes at x=00:05, y=9350.

Here is how i have the data: data = [[9350, '00:05'], [9400, '00:34'], [9450, '01:32'] ...] So for each element in this list of lists, a line must be plotted below/above the right candle. I almost did this, but the problem is that when i use more elements in a single list, there will be more lines and, since i'm using alines, the code will try to connect the lines, something i don't want. Here is an example of the output i would like to reach (sorry for the bad quality but i had to do it quickly with Paint): [screenshot]

Whereas, when i try to add more elements to the array, i get this: [screenshot]

One approach i did not consider is using pyplot, but, having used it only a bunch of times, i don't know if it's usable here or if it could make me achieve this.

I hope i was clear enough, but i'm willing to explain even further if it wasn't! It's not a problem for me to stick to "legacy" functions from mpl_finance and wait until you guys completely replace the old API, if this is not doable yet.

2) Of course! It's very simple, i just used Nightclouds and edited the color of my candles. I'm going to share my whole (ugly) code script, it's very easy to run.

import datetime
import json

import cfscrape
import mplfinance as mpf
import pandas as pd

BU = cfscrape.create_scraper()

URL = "https://api.binance.com/api/v1/klines?&symbol=BTCUSDT&interval=5m&limit=140"

ResultRaw = BU.get(URL, timeout=(10, 15)).content
Result = json.loads(ResultRaw)

for x in Result:
    TimeUnix = float(x[0]) / float(1000)
    K = datetime.datetime.fromtimestamp(TimeUnix)
    x[0] = K

df = pd.DataFrame([x[:6] for x in Result], 
                  columns=['Date', 'Open', 'High', 'Low', 'Close', 'Volume'])

format = '%Y-%m-%d %H:%M:%S'

df['Date'] = pd.to_datetime(df['Date'], format=format)
df = df.set_index(pd.DatetimeIndex(df['Date']))
df["Open"] = pd.to_numeric(df["Open"],errors='coerce')
df["High"] = pd.to_numeric(df["High"],errors='coerce')
df["Low"] = pd.to_numeric(df["Low"],errors='coerce')
df["Close"] = pd.to_numeric(df["Close"],errors='coerce')
df["Volume"] = pd.to_numeric(df["Volume"],errors='coerce')

# Create my own `marketcolors` to use with the `nightclouds` style:
mc = mpf.make_marketcolors(up='#00ff00',down='#ff0019',inherit=True)

# Create a new style based on `nightclouds` but with my own `marketcolors`:
s  = mpf.make_mpf_style(base_mpf_style='nightclouds',marketcolors=mc)

# This is just an example, an original array of this data would be much bigger
Walls=[['2020-05-17 07:49:59',9460],['2020-05-17 07:50:01',9460],
               ['2020-05-17 07:59:59',9480],['2020-05-17 08:00:01',9480]]

mpf.plot(df, type='candle', figratio=(20,9), style=s, alines=dict(alines=Walls,colors=['w']))

3) Shifting from the old library to this new one was natural when i saw the examples and the features. Here is my experience: most Python data visualization libraries (except Matplotlib) don't have a feature to save your chart straight from your code or, in order to do so, require you to install other dependencies. With mplfinance it's incredibly easy, even easier than with the old mpl_finance.

Another point of strength is, in my opinion, how easy it is to generate the chart: not only it's easier than the old library, but it's easier than every other library i've come to use until now. Three lines to create a chart is awesome. Other than that, i think i never found any library that allowed me to create Renko charts.

DanielGoldfarb commented 4 years ago

@Jacks349

Thanks! As you noticed previously, the way to prevent connecting the lines is to keep each x,y pair in a separate sequence. Below I've made each x,y pair its own tuple (but lists work just as well).

Walls=[ ( ['2020-05-17 07:49:59',9460],['2020-05-17 07:50:01',9460] ),
        ( ['2020-05-17 07:59:59',9480],['2020-05-17 08:00:01',9480] )
      ]

If you already have your orderbook data in a single sequence, it's easy enough to write a line or two of code to convert it to pairs of points as described above.
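The "line or two of code" could look like the following sketch, which groups a flat list of points into consecutive pairs. The name `pair_up` is hypothetical, and the sketch assumes the flat list always holds a start point followed by its end point.

```python
def pair_up(flat_points):
    """Group a flat list [p0, p1, p2, p3, ...] into [(p0, p1), (p2, p3), ...]."""
    if len(flat_points) % 2:
        raise ValueError("expected an even number of points")
    return [(flat_points[i], flat_points[i + 1])
            for i in range(0, len(flat_points), 2)]

flat = [['2020-05-17 07:49:59', 9460], ['2020-05-17 07:50:01', 9460],
        ['2020-05-17 07:59:59', 9480], ['2020-05-17 08:00:01', 9480]]
Walls = pair_up(flat)  # each tuple is drawn as its own, unconnected segment
```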

Also, if the timestamps on your orderbook data match timestamps in your dataframe -- for example, your dataframe has data every 5 minutes, maybe 80 data points for a trading day, and the orderbook data contains maybe only 10 or 20 points but all of those orderbook points also fall on some 5 minute point or another -- then it is easy enough to use pandas to merge the orderbook data with the trading data (let me know if you need help with that). Once that is done, you can use make_addplot() to plot your "Walls" in which case you can use any symbol that you want to represent the Walls (for examples, see here: https://matplotlib.org/3.1.1/gallery/lines_bars_and_markers/marker_reference.html , and of course the symbols can be any color that you want as well). HTH.

Jacks349 commented 4 years ago

Making one tuple for every wall works! Awesome! Just one more thing: is it possible to do something like mpf.plot(df, type='candle', figratio=(20,9), style=s, alines=[dict(alines=Walls,colors=['w']), dict(alines=Walls2,colors=['g'])] ), so that i can give two different colors to two different sets of Walls? For example, on Walls i'll store only walls bigger than X amount and i will plot them with a white color, on Walls2 i will plot other Walls bigger than X+1 amount in red.

About your second paragraph: they will have the same timestamps: if a 'wall' is spotted at 14:33 (for example) i will change the timestamp to 14:35 because it goes below/above the 14:35 5MIN candle, and the same applies if i use a 30m timeframe. Merging the data sounds like a very interesting option, if you have some spare time (since i've already wasted a lot of yours) i would like to see it! My only concern is: how is that doable if, for a single candle (which means a single row in a dataframe) there can be more walls? For example, what if i get 6 walls for the candle 17/05/2020 16:30 at price 9350?
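The timestamp adjustment described here (14:33 becomes 14:35 on a 5-minute chart) amounts to rounding up to the next candle boundary. A minimal sketch, assuming candles start on multiples of the interval counted from midnight; `snap_to_candle` is a hypothetical name:

```python
from datetime import datetime, timedelta

def snap_to_candle(ts, minutes=5):
    """Round ts up to the next `minutes` boundary (unchanged if already on one)."""
    step = minutes * 60
    midnight = ts.replace(hour=0, minute=0, second=0, microsecond=0)
    elapsed = (ts - midnight).total_seconds()
    snapped = -(-elapsed // step) * step  # ceiling division
    return midnight + timedelta(seconds=snapped)

wall_time = snap_to_candle(datetime(2020, 5, 17, 14, 33))               # 5m chart
wall_time_30m = snap_to_candle(datetime(2020, 5, 17, 14, 33), minutes=30)
```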

DanielGoldfarb commented 4 years ago

alines does not presently support a list of dict (which would be easy enough to do) however that is not necessary for what you want to do since colors can be a list and will apply a different color to each line segment. For example, suppose you have Walls1 and Walls2, and each is a list of tuples, where each tuple contains two "points", say something like this, where t is time and p is price:

Walls1 = [ ( [ t11, p11], [t12, p12] ), ( [ t3, p3], [t4, p4] ), ( [ t9, p9], [t10, p10] ) ]
Walls2 = [ ( [ t7, p7], [t8, p8] ), ( [ t1, p1], [t2, p2] ) ]

The numbers in my example are arbitrary to emphasize that it makes absolutely no difference what order the points are in ... each tuple represents two points between which a line will be drawn. If you want to color them differently just do this:

w1colors = ['w']*len(Walls1)
w2colors = ['g']*len(Walls2)
walls = Walls1 + Walls2
colors = w1colors + w2colors

The end result of the above will be:

walls = [ ( [ t11, p11], [t12, p12] ), ( [ t3, p3], [t4, p4] ), ( [ t9, p9], [t10, p10] ), ( [ t7, p7], [t8, p8] ), ( [ t1, p1], [t2, p2] ) ]
colors = ['w','w','w','g','g']

So you can then simply do:

mpf.plot(data, ..., alines=dict(alines=walls,colors=colors))
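For more than two groups, the same idea generalizes to a small helper that builds the combined list and the matching color list together. This is a sketch only: `walls_with_colors` and `plot_walls` are hypothetical names, and the wall values are made-up sample data.

```python
def walls_with_colors(groups):
    """groups: list of (color, walls) pairs -> (all_walls, all_colors)."""
    walls, colors = [], []
    for color, group in groups:
        walls.extend(group)
        colors.extend([color] * len(group))  # one color entry per segment
    return walls, colors

Walls1 = [(['2020-05-17 07:49:59', 9460], ['2020-05-17 07:50:01', 9460]),
          (['2020-05-17 07:59:59', 9480], ['2020-05-17 08:00:01', 9480])]
Walls2 = [(['2020-05-17 08:09:59', 9500], ['2020-05-17 08:10:01', 9500])]
walls, colors = walls_with_colors([('w', Walls1), ('g', Walls2)])

def plot_walls(df, walls, colors):
    # Not called here because it needs a real OHLC dataframe.
    import mplfinance as mpf
    mpf.plot(df, type='candle', alines=dict(alines=walls, colors=colors))
```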

Now regarding merging walls data with the original dataframe, there are several ways to handle, in the merge and dataframe, having multiple "walls" at the same time and price. The way you choose will depend in part on what you plan to do with that data.

How do you plan to handle multiple walls all at the same price for a single candle? Would you plot anything differently to indicate that there is more than one wall there? If so, how would you want the plot to look? Would you do anything else differently based on the information that there is more than one? Would you limit the number of walls at the same price and time that you handle, or could it be 1 or 6 or 100 or 10,000?

Jacks349 commented 4 years ago

Ok! I did everything you suggested and now i have what i need! I'm amazed at how powerful and versatile everything is here! Thank you really a lot @DanielGoldfarb !

The approach that i'm taking here should make sure that there will be something between 10-20 walls per candle. If there are two walls at the same price there are two things i can do: add a simple loop to my code that will make sure that when there are more walls at the same price, i will "merge" them in one, that should reduce the number of data. Another approach would be to simply plot them, they should just override each other on the plot, which is not an issue for me, but i'll probably go with the first approach.

DanielGoldfarb commented 4 years ago

> If there are two walls at the same price there are two things i can do: add a simple loop to my code that will make sure that when there are more walls at the same price, i will "merge" them in one, that should reduce the number of data. Another approach would be to simply plot them, they should just override each other on the plot, which is not an issue for me, but i'll probably go with the first approach.

It sounds to me like you want the plot to look exactly the same whether you have 1 wall or 10 walls or 20 walls at a specific time/price. So someone looking at the plot will only know where there are walls, but they will not know whether there are multiple walls at a given location (time/price). Is that correct?

Jacks349 commented 4 years ago

Yes, i know it's not the best solution but i'm afraid i will make it too complicated in other ways

DanielGoldfarb commented 4 years ago

That's fine. That also greatly simplifies how to do the merge. I'll come back and explain it a little later when I have some more time.

In the meantime, it sounds like you have plenty to play with. One thing I will mention, which you probably know already: since you want to essentially treat duplicate walls as one, you can use a Python set to quickly remove duplicates from your list of walls:

walls_without_dups = list(set(walls))
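Note that `set()` needs hashable elements, so the one-liner works when each wall is a tuple of tuples but raises `TypeError` when the points are lists, as in the `Walls` examples earlier in this thread. A minimal sketch that handles both cases and keeps the original order (`dedupe_walls` is a hypothetical name):

```python
def dedupe_walls(walls):
    """Drop duplicate segments while keeping first-seen order."""
    seen = set()
    unique = []
    for wall in walls:
        key = tuple(tuple(point) for point in wall)  # hashable form of the segment
        if key not in seen:
            seen.add(key)
            unique.append(wall)
    return unique

walls = [(['2020-05-17 07:49:59', 9460], ['2020-05-17 07:50:01', 9460]),
         (['2020-05-17 07:59:59', 9480], ['2020-05-17 08:00:01', 9480]),
         (['2020-05-17 07:49:59', 9460], ['2020-05-17 07:50:01', 9460])]
walls_without_dups = dedupe_walls(walls)
```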

All the best.

Jacks349 commented 4 years ago

Your help has been immense, thank you really a lot for taking your time to help me! Absolutely hyped for the future of this library.

Jacks349 commented 4 years ago

A little update on this: i managed to do it and the result is this, i think it's beautiful :)

[screenshot]

manuelwithbmw commented 4 years ago

:) really a beautiful chart, pretty cryptic though, will you explain the walls colouring?

Jacks349 commented 4 years ago

@manuelwithbmw Of course! This is a Bitcoin chart, in crypto trading (i don't know about other markets) there are a lot of visual representations of orderbook on the chart, so each line corresponds to large orderbook levels (big orders), i think the name for these kind of charts is heatmaps.

manuelwithbmw commented 4 years ago

Wow, I checked what that is (link below), I was not familiar with it, so it refers to market depth at its finest. It reminds me a little bit of Market Profile, which is easier to get by the way, more intuitive. I do not trade crypto (I am old style ha) but thanks for explaining Jack @Jacks349

https://medium.com/@bookmap/heatmap-in-trading-how-to-learn-what-market-depth-is-hiding-7095cf191d03

Jacks349 commented 4 years ago

@manuelwithbmw You're welcome! Yes, that is a clear example of heatmap! My heatmap is much more simple, in cryptos there is less liquidity and levels are clearer usually!

fxhuhn commented 4 years ago

@Jacks349, @DanielGoldfarb

> Thanks! As you noticed previously, the way to prevent connecting the lines is to keep each x,y pair in a separate sequence.

The support is really great here. I have a similar problem with connected lines and hadn't had time to describe it in detail yet, and here is already a solution.

[image: DAX_bei]

Jacks349 commented 4 years ago

@fxhuhn this is a really nice chart! Looks way better than mine, i really like the scatter dots!

DanielGoldfarb commented 4 years ago

@fxhuhn, @Jacks349

Both charts are very nice! It's great to see people being creative with this package.

Markus, very nice chart! The solution to connected lines is to make each line a len==2 tuple of pairs, and then create a list of those for many lines, as described above.

All the best.

fxhuhn commented 4 years ago

Hi @DanielGoldfarb

> Markus, very nice chart! The solution to connected lines is to make each line a len==2 tuple of pairs, and then create a list of those for many lines

i tried this for a zigzag line: lines_dict = stock['lines'].dropna().to_dict() and add it to mpf-plot via alines=lines_dict

But i get this error: TypeError: kwarg "alines" validator returned False for value: "{Timestamp('2019-10-28 00:00:00'): 48.18, [...]'Validator' : lambda value: _alines_validator(value) }, Is Timestamp not allowed?

DanielGoldfarb commented 4 years ago

@fxhuhn Markus,

Timestamp should work fine. As you can see here the validator is ok with it.

I think the problem is that you are passing in the points (line vertices) as a dict. The points themselves must be a sequence. The dict is used only if you want to pass in other kwargs in addition to the points (for example colors, linestyle, etc).

fxhuhn commented 4 years ago

I made it but not very elegant :-(

df_lines = pd.DataFrame(stock['lines'].dropna())
lines = list(df_lines.itertuples(index=True, name=None))

[image: ICE_DAX_ADS]

So now it's time to fine-tune it: style, color, etc.

DanielGoldfarb commented 4 years ago

@fxhuhn

> but not very elegant :-(

I think your idea to use itertuples() is very elegant.

fxhuhn commented 4 years ago

> I think your idea to use itertuples() is very elegant.

Yes, but not parsing through an additional dataframe. I found no direct way without losing the index.
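One possibly more direct route, offered as a sketch rather than as the approach used in this thread: both a plain dict and a pandas Series provide `.items()`, which yields (index, value) pairs, so the points can be built without an intermediate dataframe. `points_from_mapping` is a hypothetical name, and the sample data is made up apart from the first value quoted in the error message above.

```python
def points_from_mapping(lines):
    """Turn {timestamp: price} into [(timestamp, price), ...], skipping NaN prices."""
    return [(t, p) for t, p in lines.items() if p == p]  # NaN is never equal to itself

# A dict like the one produced by stock['lines'].to_dict(), with one NaN left in.
lines = {'2019-10-28': 48.18, '2019-10-29': float('nan'), '2019-11-04': 51.30}
zigzag = points_from_mapping(lines)
# zigzag is a single sequence of points, so alines=zigzag draws one connected line.
```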

Jacks349 commented 4 years ago

@DanielGoldfarb i just saw about the new updates, a lot of new awesome features but what caught my attention is fill_between(). Do you think it can be used to get the same output i'm getting with alines? I'm 100% satisfied with what i'm using now, but i want to keep an eye on new features i can use to enrich the chart!

DanielGoldfarb commented 4 years ago

@Jacks349

> Do you think it can be used to get the same output i'm getting with alines?

I don't understand the question. alines allows you to draw arbitrary lines. fill_between allows you to fill color/shading between values (i.e. "lines").

Jacks349 commented 4 years ago

More of a general question! I was just wondering if it was possible to fill a small area below/above a candle using x and y values, just like i did with alines

DanielGoldfarb commented 4 years ago

It is possible, but it is a little tricky. See the very bottom of this notebook where it is demonstrated that fill_between fills between y-values, but (by default) for all x-values.

In order to fill_between at only certain x-values, you have to pass a boolean sequence to say where (for which x-values) the plot should be filled between the y-values.
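The boolean sequence can be built from the candle timestamps. The sketch below assumes that mplfinance forwards a `fill_between` dict (including `where`) to matplotlib's `Axes.fill_between`; `where_mask` and `plot_patch` are hypothetical names. Per matplotlib's documented behavior, an isolated True between two False values fills nothing, so the mask has to cover at least two adjacent candles.

```python
def where_mask(candle_times, start, end):
    """True for every candle whose timestamp lies in [start, end].

    Timestamps are compared as strings, which works when they all share
    the same 'YYYY-MM-DD HH:MM' format.
    """
    return [start <= t <= end for t in candle_times]

candle_times = ['2020-05-17 00:00', '2020-05-17 00:05',
                '2020-05-17 00:10', '2020-05-17 00:15']
mask = where_mask(candle_times, '2020-05-17 00:05', '2020-05-17 00:10')

def plot_patch(df, low, high, mask):
    # Not called here because it needs a real OHLC dataframe with len(df) == len(mask).
    import mplfinance as mpf
    mpf.plot(df, type='candle',
             fill_between=dict(y1=low, y2=high, where=mask, alpha=0.5))
```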

For such a small patch, I think it's easier to use a short line with a large width (so as to appear as a rectangle) as was explained above.

DanielGoldfarb commented 4 years ago

Is there any reason to keep this issue open? It seems to me that the original request has been satisfied (or am I missing something?). Thanks.

Jacks349 commented 4 years ago

At first it was supposed to be open for anyone who wanted help to draw on their charts, but i think it can be closed now! Original request is satisfied!