pandas-dev / pandas

Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
https://pandas.pydata.org
BSD 3-Clause "New" or "Revised" License

ENH: Add ohlc / candlestick plotting option for DataFrame #783

Closed: wesm closed this issue 7 years ago

wesm commented 12 years ago

Not sure if this is possible to do in a generic way, but it may make sense once we have an OHLC aggregator implemented.
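For reference, current pandas ships an OHLC aggregator as `Resampler.ohlc()`. A minimal sketch of turning tick-level prices into daily bars; the intraday prices below are hypothetical, purely for illustration:

```python
import pandas as pd

# Hypothetical intraday trade prices over two trading days
ticks = pd.Series(
    [7.92, 7.98, 7.95, 7.97, 7.96, 7.84, 7.90],
    index=pd.to_datetime([
        "2016-09-29 09:30", "2016-09-29 11:00", "2016-09-29 14:00", "2016-09-29 16:00",
        "2016-09-30 09:30", "2016-09-30 12:00", "2016-09-30 16:00",
    ]),
)

# Aggregate each calendar day into open/high/low/close columns
daily = ticks.resample("D").ohlc()
print(daily)
```

The result is exactly the kind of frame a candlestick plot would consume: one row per bar, with `open`, `high`, `low` and `close` columns.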

esthers commented 11 years ago

Yes, please implement candlestick plotting for DataFrame. It is so nice to make instant plots of financial data from Yahoo with pandas, but for many people, like me, candlestick charts are a must for analyzing the data.

wesm commented 11 years ago

I'd be happy to accept a pull request for this

dalejung commented 11 years ago

I have this here: http://nbviewer.ipython.org/4982660/

In order to support financial data, I pass mpl an ordinal index, essentially np.arange(len(df.index)), and draw the ticks with a Locator/Formatter. Doing this with a DatetimeIndex actually speeds up plotting (https://github.com/pydata/pandas/issues/1579) and gets rid of weekend gaps. However, a side effect is that you have to keep the previous data around for future plots on the same axes. For that I have a Figure class that handles translating any future plots into an ordinal index. I chose to keep all plotted data around for plot-specific fillnas, but that's probably unnecessary.
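A minimal sketch of the ordinal-index idea, built only from standard matplotlib pieces (`MaxNLocator`, `FuncFormatter`). The function name `plot_ordinal` is hypothetical and this is not the trtools code:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from matplotlib.ticker import FuncFormatter, MaxNLocator


def plot_ordinal(series, ax=None, fmt="%Y-%m-%d"):
    """Plot against integer positions 0..n-1 and label ticks with the real dates."""
    if ax is None:
        _, ax = plt.subplots()
    index = series.index
    # Ordinal x values: consecutive rows are one unit apart, so weekends leave no gap
    ax.plot(np.arange(len(index)), series.to_numpy())

    def label(x, pos):
        i = int(round(x))
        # Ticks outside the data range get no label
        return index[i].strftime(fmt) if 0 <= i < len(index) else ""

    ax.xaxis.set_major_locator(MaxNLocator(integer=True))  # ticks only on whole positions
    ax.xaxis.set_major_formatter(FuncFormatter(label))
    return ax


# Five business days: Thu, Fri, then Mon-Wed with no weekend gap on the axis
prices = pd.Series(np.linspace(7.8, 8.0, 5), index=pd.bdate_range("2016-09-29", periods=5))
ax = plot_ordinal(prices)
```

The side effect described above is visible here: anything drawn on `ax` afterwards must also use positions 0..n-1 rather than dates.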

I'd like to merge this stuff in, but I'm not sure how compatible it is. trtools.charting is fairly heavy-handed in assuming you're not touching matplotlib directly. Mixing ordinal and datetime index plots on the same axes wouldn't work.

The module itself is self-contained apart from a small column_grep utility, so it can be made cleanly importable as an add-on.

esthers commented 11 years ago

@dalejung, if you could find a way to merge your code it would be greatly appreciated.

woodb commented 10 years ago

@dalejung, have you had any luck implementing this?

I agree that importing a big library for something like a candlestick plot seems heavy-handed; maybe there's a lighter-weight way to get this going.

woodb commented 10 years ago

@wesm, how do you think this should work syntactically?

Rolling this functionality into plot might be neat.

For example, if the DataFrame has a datetime index and columns "open", "high", "low", "close", and a DataFrame.plot() is called, would it make sense for the result to be a candlestick plot by default?

Would this be better as a kwarg for DataFrame.plot() or as a separate function (e.g. ohlc_plot)?
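One way to weigh the default-vs-kwarg question is to look at what automatic detection would have to check. A minimal sketch, assuming a hypothetical helper `looks_like_ohlc` (nothing like it exists in pandas):

```python
import pandas as pd

OHLC_COLUMNS = ("open", "high", "low", "close")


def looks_like_ohlc(df):
    """Hypothetical check: DatetimeIndex plus open/high/low/close columns (case-insensitive)."""
    if not isinstance(df.index, pd.DatetimeIndex):
        return False
    lowered = {str(c).lower() for c in df.columns}
    return all(name in lowered for name in OHLC_COLUMNS)


dates = pd.to_datetime(["2016-09-29", "2016-09-30"])
bars = pd.DataFrame(
    {"Open": [7.92, 7.96], "High": [7.98, 7.97], "Low": [7.92, 7.84],
     "Close": [7.97, 7.90], "Volume": [149400, 29900]},
    index=dates,
)
print(looks_like_ohlc(bars))                   # DatetimeIndex and all four columns
print(looks_like_ohlc(bars.drop(columns="Low")))  # a required column is missing
```

Arguably, an explicit kwarg or a separate ohlc_plot function avoids surprising users whose frames merely happen to pass such a check.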

dalejung commented 10 years ago

@woodb Sort of. I split out the charting work into its own project: https://github.com/dalejung/ts-charting. The heaviness was due to me keeping the plotted data around in a DataFrame. I was doing this to use DataFrame.__setitem__'s auto-reindexing to keep the plots aligned. I'm a bit smarter now and keep only the index around.
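The auto-reindexing mentioned here is standard pandas behaviour: assigning a Series to a DataFrame column aligns it on index labels, not by position. A small illustration with arbitrary values:

```python
import pandas as pd

idx = pd.date_range("2013-01-01", periods=4, freq="D")
frame = pd.DataFrame({"close": [1.0, 2.0, 3.0, 4.0]}, index=idx)

# A Series covering only part of the index, in a different order
overlay = pd.Series([30.0, 20.0], index=[idx[2], idx[1]])

# Assignment aligns on labels: idx[1] gets 20.0, idx[2] gets 30.0, the rest NaN
frame["overlay"] = overlay
print(frame)
```

Keeping only the index, as described, is enough to get the same alignment via `series.reindex(index)` without storing the plotted values.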

I think ts-charting should be straightforward to merge into pandas, at least a portion of it. When I get the time I'll open a PR.

woodb commented 10 years ago

@dalejung Great, let me know if you need any help; I've got some time here and there.

dalejung commented 10 years ago

@woodb Actually, a smaller first pass could be done with http://nbviewer.ipython.org/5864433. That uses the matplotlib datetime handling.

ts-charting converts the DatetimeIndex to ints and translates the labels back with a custom Formatter. I'm pretty sure this could be made pandas-compatible by always checking the current ax for our TimestampFormatter and reindexing to the current x-axis. I suppose the Figure and Grapher classes would need their methods refactored into flat functions or hidden away on the Formatter.
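A minimal sketch of "check the current ax for our TimestampFormatter and reindex to the current x-axis". The `TimestampFormatter` class and `ordinal_plot` function below are hypothetical stand-ins, not ts-charting's actual code:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from matplotlib.ticker import Formatter, MaxNLocator


class TimestampFormatter(Formatter):
    """Hypothetical formatter that remembers the DatetimeIndex behind an ordinal axis."""

    def __init__(self, index, fmt="%Y-%m-%d"):
        self.index = index
        self.fmt = fmt

    def __call__(self, x, pos=None):
        i = int(round(x))
        return self.index[i].strftime(self.fmt) if 0 <= i < len(self.index) else ""


def ordinal_plot(series, ax):
    """Plot on an ordinal x-axis, reusing the axis' index if it already has one."""
    formatter = ax.xaxis.get_major_formatter()
    if isinstance(formatter, TimestampFormatter):
        # Axes already own an index: align new data to it.
        # Missing dates become NaN; dates not on the axis are dropped.
        series = series.reindex(formatter.index)
    else:
        # First plot on these axes: its index defines the x positions
        ax.xaxis.set_major_locator(MaxNLocator(integer=True))
        ax.xaxis.set_major_formatter(TimestampFormatter(series.index))
    ax.plot(np.arange(len(series)), series.to_numpy())
    return series


fig, ax = plt.subplots()
base = pd.Series(np.arange(5.0), index=pd.bdate_range("2012-01-02", periods=5))
ordinal_plot(base, ax)
# A later overlay covering only two of those dates
overlay = pd.Series([1.5, 3.5], index=base.index[[1, 3]])
aligned = ordinal_plot(overlay, ax)
```

Storing the index on the formatter keeps the state attached to the axes themselves, which is one way to avoid a separate Figure wrapper.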

Converting to an int index is better, but it means that every plot to that ax will need the translation. It's convenient to call fig.hl_span("2012-01-02", "2012-01-05"), but the base matplotlib method, ax.axvspan("2012-01-02", "2012-01-05"), would not work unless I monkey-patched it.
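To make the translation problem concrete, here is a minimal sketch that converts dates to ordinal positions before calling the plain `ax.axvspan`. Both `date_to_pos` and `ordinal_axvspan` are hypothetical helpers, not ts-charting's `hl_span`:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def date_to_pos(index, date):
    """Hypothetical helper: ordinal x position of a date on an ordinal axis.

    Dates that fall between rows (e.g. a Saturday) map to the next row.
    """
    return int(index.searchsorted(pd.Timestamp(date)))


def ordinal_axvspan(ax, index, start, end, **kwargs):
    """Hypothetical wrapper: axvspan that accepts dates on an ordinal axis."""
    return ax.axvspan(date_to_pos(index, start), date_to_pos(index, end), **kwargs)


index = pd.bdate_range("2012-01-02", periods=10)  # business days only
fig, ax = plt.subplots()
ax.plot(np.arange(len(index)), np.linspace(1.0, 2.0, len(index)))

# Dates must be translated to positions 0..n-1 before reaching matplotlib
span = ordinal_axvspan(ax, index, "2012-01-02", "2012-01-05", alpha=0.3)
```

Every date-taking axes method would need a wrapper like this, which is the namespace argument made in the next paragraph.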

Actually, thinking about it more, I don't know if that's acceptable for inclusion in pandas. Part of the reason for the Figure abstraction is to create a namespace where all methods are aware of the Datetime -> int translation. Hm.

Go Gators??!

phobson commented 10 years ago

Thought y'all might be interested in this: https://github.com/matplotlib/matplotlib/pull/2643

If merged, you'll be able to feed a matplotlib Axes object a list of dictionaries ("stats") describing the boxplots via a bxp method.
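`Axes.bxp` is available in current matplotlib, together with `matplotlib.cbook.boxplot_stats` for computing the dicts. A minimal sketch of the split between computing statistics and drawing them (random data, illustrative only):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np
from matplotlib import cbook

rng = np.random.default_rng(0)
samples = rng.normal(size=(200, 3))  # three columns -> three boxes

# Precompute the summary statistics: one dict per box with keys such as
# "med", "q1", "q3", "whislo", "whishi" and "fliers"
stats = cbook.boxplot_stats(samples, labels=["a", "b", "c"])

fig, ax = plt.subplots()
# bxp draws the boxes straight from the dicts, never touching the raw data
artists = ax.bxp(stats)
```

Because `bxp` only reads the dicts, any code that can produce them can reuse the box-drawing machinery.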

ghost commented 10 years ago

Thanks @phobson. Worth noting that mpl already has candlestick support pretty much baked in, but isn't it planned for removal in a future release?

phobson commented 10 years ago

@y-p -- yeah. What I've heard is that matplotlib.finance will be deprecated in mpl 1.4 and removed by 1.5.

ghost commented 10 years ago

@phobson, is it getting a new home, or will the code just be thrown away? The fragmented state of Python viz has been coming up in issues here lately. The situation is eerily similar to Wes's old post on the fragmented state of data libraries in Python; hoping for a similar turn for the better.

jreback commented 10 years ago

@phobson new champion for a combined viz library? (built on pandas of course) :)

jtratner commented 10 years ago

I was discussing this out of band with @olgabot. @mwaskom's seaborn could be a good candidate (and it already appears to use pandas).

phobson commented 10 years ago

@y-p I can't say for certain what the final fate of it is. My impression, however, is that it'll be thrown out.

Personally, I think we pandas/PyData people should be equally supporting, endorsing, and contributing to seaborn and python-ggplot.

I got in early on matplotlib, coming straight from MATLAB during my stint in academia, which is to say I'm comfortable using it directly and probably won't ever get my head around ggplot-esque APIs. However, I think having both styles of API is crucial to growing the Python science/data/viz community.

ghost commented 10 years ago

A poignant comment was made a few days ago regarding two valid approaches: mpl recipes vs. ggplot. I don't know which library will come out on top (hopefully both), but IMO we should encourage viz PRs to join forces with existing efforts rather than have pandas sap work away from them.

Note: @phobson just beat me to it :)

jtratner commented 10 years ago

@phobson - yeah, definitely think both are good efforts. I find ggplot really confusing personally (but that's probably because I started with mpl).

Not totally clear where that leaves us in terms of the visualization PRs we currently have. If we were to change the plot functions to look up kind in a dictionary, it'd be relatively easy for others to hook into the pandas plot method (not sure if that's actually desirable, though).
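A minimal sketch of what "look up kind in a dictionary" could mean: a registry that third-party code adds entries to. All names here (`register_plot_kind`, `plot`, the `"ohlc"` stub) are hypothetical and not how pandas implements `DataFrame.plot`:

```python
import pandas as pd

# Hypothetical registry mapping kind names to plotting callables
_PLOT_KINDS = {}


def register_plot_kind(name):
    """Decorator that makes a function available as plot(..., kind=name)."""
    def decorator(func):
        _PLOT_KINDS[name] = func
        return func
    return decorator


def plot(data, kind="line", **kwargs):
    """Dispatch to whichever function registered the requested kind."""
    try:
        func = _PLOT_KINDS[kind]
    except KeyError:
        raise ValueError(
            f"unknown plot kind {kind!r}; available: {sorted(_PLOT_KINDS)}"
        ) from None
    return func(data, **kwargs)


# A third-party package could register its own kind without touching pandas
@register_plot_kind("ohlc")
def _ohlc_stub(data, width=0.6):
    # Stand-in for real drawing code: just report what it would draw
    return {"kind": "ohlc", "rows": len(data), "width": width}


frame = pd.DataFrame({"Open": [1.0, 2.0], "High": [2.0, 3.0],
                      "Low": [0.5, 1.5], "Close": [1.5, 2.5]})
result = plot(frame, kind="ohlc")
```

The trade-off raised above shows up directly: whatever is in the registry becomes part of the surface users see through `plot`.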

olgabot commented 10 years ago

So should I move heatmap to seaborn (or at least suggest it with a PR) to keep with this theme?



ghost commented 10 years ago

You might. I've been thinking of doing the same thing, except targeting ggplot.

wesm commented 7 years ago

Won't fix

femtotrader commented 7 years ago

Some basic code to display daily OHLCV data:

```python
import pandas as pd
from io import StringIO  # pandas.compat.StringIO no longer exists in modern pandas

import matplotlib.pyplot as plt
import matplotlib.dates as mdates

try:
    # Original location; removed from later matplotlib releases
    from matplotlib.finance import candlestick_ohlc
except ImportError:
    # Assumption: the separately installed mplfinance package keeps the old helper here
    from mplfinance.original_flavor import candlestick_ohlc


def plot_candlestick(df, ax=None, fmt="%Y-%m-%d"):
    """Draw candles from a DataFrame with a DatetimeIndex and Open/High/Low/Close columns."""
    if ax is None:
        fig, ax = plt.subplots()
    idx_name = df.index.name or "index"  # reset_index() names an unnamed index "index"
    dat = df.reset_index()[[idx_name, "Open", "High", "Low", "Close"]].copy()
    # candlestick_ohlc expects x values as matplotlib date numbers
    dat[idx_name] = dat[idx_name].map(mdates.date2num)
    ax.xaxis_date()
    ax.xaxis.set_major_formatter(mdates.DateFormatter(fmt))
    ax.tick_params(axis="x", labelrotation=45)  # rotate labels on this ax, not plt's current axes
    candlestick_ohlc(ax, dat.values, width=0.6, colorup="g", alpha=1)
    ax.set_xlabel(idx_name)
    ax.set_ylabel("OHLC")
    return ax


data = """Date,Stock,Open,High,Low,Close,Volume
2016-09-29,KESM,7.92,7.98,7.92,7.97,149400
2016-09-30,KESM,7.96,7.97,7.84,7.9,29900
2016-10-04,KESM,7.8,7.94,7.8,7.93,99900
2016-10-05,KESM,7.93,7.95,7.89,7.93,77500
2016-10-06,KESM,7.93,7.93,7.89,7.92,130600
2016-10-07,KESM,7.91,7.94,7.91,7.92,103000"""
df = pd.read_csv(StringIO(data), index_col="Date", parse_dates=True)

ax = plot_candlestick(df)

plt.tight_layout()
# plt.savefig("candle.png")
plt.show()
```

[Image: candlestick chart of the KESM sample data]

Related SO questions:

jreback commented 7 years ago

@femtotrader you are just plotting a box-plot.

http://pandas.pydata.org/pandas-docs/stable/visualization.html#box-plots

femtotrader commented 7 years ago

An example of a box plot with such a DataFrame would be nice.

I tried

```python
df[["Open", "High", "Low", "Close"]].transpose().plot.box()
```

and got

[Image: resulting box plot]

It doesn't look very good.
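The transposed box plot presumably looks off because each box is computed from the quartiles of the four prices {Open, High, Low, Close}, not from the open/close range. One way to get something closer to candles from the box-plot machinery is to skip the statistics and hand matplotlib's `Axes.bxp` one precomputed dict per row. The mapping below (wicks = Low/High, body = Open/Close range, median line at Close) is a hypothetical hack, not a pandas feature; the rows are the first three KESM rows from the example above:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import pandas as pd


def ohlc_to_bxp_stats(df):
    """Hypothetical mapping of each OHLC row to one bxp stats dict."""
    stats = []
    for ts, row in df.iterrows():
        stats.append({
            "whislo": row["Low"],                      # lower wick
            "q1": min(row["Open"], row["Close"]),      # bottom of the candle body
            "med": row["Close"],                       # median line drawn at the close
            "q3": max(row["Open"], row["Close"]),      # top of the candle body
            "whishi": row["High"],                     # upper wick
            "label": ts.strftime("%Y-%m-%d"),          # used as the tick label
        })
    return stats


df = pd.DataFrame(
    {"Open": [7.92, 7.96, 7.80], "High": [7.98, 7.97, 7.94],
     "Low": [7.92, 7.84, 7.80], "Close": [7.97, 7.90, 7.93]},
    index=pd.to_datetime(["2016-09-29", "2016-09-30", "2016-10-04"]),
)
stats = ohlc_to_bxp_stats(df)

fig, ax = plt.subplots()
# patch_artist=True makes the bodies fillable patches; no fliers in this mapping
artists = ax.bxp(stats, showfliers=False, patch_artist=True)
for box, row in zip(artists["boxes"], df.itertuples()):
    box.set_facecolor("g" if row.Close >= row.Open else "r")  # up vs. down day
```

Unlike `candlestick_ohlc`, this places candles at positions 1..n rather than on a date axis, so weekend gaps disappear as a side effect.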