matplotlib / mplfinance

Financial Markets Data Visualization using Matplotlib
https://pypi.org/project/mplfinance/

running out of memory on loop image generation #483

Open Mikeardy opened 2 years ago

Mikeardy commented 2 years ago

Hi, I use the following code in a loop to generate plots:

import mplfinance as mpf
from datetime import datetime

# re, buy2, sell2, b, symbol, timeframe, Algorithm and filename come from the surrounding script (not shown).
style = mpf.make_mpf_style(base_mpf_style="yahoo")
adp = [mpf.make_addplot(re["EMA10"], width=1.5, color="#4c8fb9"),
       mpf.make_addplot(re["EMA30"], width=1.5, color="#679e64"),
       mpf.make_addplot(buy2["HA-low"], type="scatter", marker="^", markersize=100, color="g"),
       mpf.make_addplot(sell2["HA-high"], type="scatter", marker="v", markersize=100, color="r"),
       ]
mpf.plot(re, type="candle", style=style,
         title=f"BOT{b} {symbol} {timeframe} {datetime.now().strftime('%Y-%m-%d %H:%M:%S')} {Algorithm}",
         addplot=adp,
         volume=False,
         # ylabel=symbol + " - " + timeframe,
         savefig=dict(fname=filename, dpi=150, pad_inches=0.50),
         )

It runs every 15 minutes and iterates n times, depending on how many source vectors are available (say, from 50 to 300), but the actual number of plots generated is 5 to 10. It worked fine for hours, but then suddenly I got the following error:

Job "scannow (trigger: interval[0:15:00], next run at: 2021-12-16 02:45:05 CET)" raised an exception
Traceback (most recent call last):
  File "/home/pi/.local/lib/python3.9/site-packages/apscheduler/executors/base.py", line 125, in run_job
    retval = job.func(*job.args, **job.kwargs)
  File "/home/pi/deve/deve-Multi.py", line 331, in scannow
    mpf.plot(re, type="candle", style=style,
  File "/home/pi/.local/lib/python3.9/site-packages/mplfinance/plotting.py", line 765, in plot
    plt.savefig(**save)
  File "/home/pi/.local/lib/python3.9/site-packages/matplotlib/pyplot.py", line 958, in savefig
    res = fig.savefig(*args, **kwargs)
  File "/home/pi/.local/lib/python3.9/site-packages/matplotlib/figure.py", line 3012, in savefig
    self.canvas.print_figure(fname, **kwargs)
  File "/home/pi/.local/lib/python3.9/site-packages/matplotlib/backend_bases.py", line 2314, in print_figure
    result = print_method(
  File "/home/pi/.local/lib/python3.9/site-packages/matplotlib/backend_bases.py", line 1643, in wrapper
    return func(*args, **kwargs)
  File "/home/pi/.local/lib/python3.9/site-packages/matplotlib/_api/deprecation.py", line 412, in wrapper
    return func(*inner_args, **inner_kwargs)
  File "/home/pi/.local/lib/python3.9/site-packages/matplotlib/backends/backend_agg.py", line 540, in print_png
    FigureCanvasAgg.draw(self)
  File "/home/pi/.local/lib/python3.9/site-packages/matplotlib/backends/backend_agg.py", line 431, in draw
    self.renderer = self.get_renderer(cleared=True)
  File "/home/pi/.local/lib/python3.9/site-packages/matplotlib/backends/backend_agg.py", line 447, in get_renderer
    self.renderer = RendererAgg(w, h, self.figure.dpi)
  File "/home/pi/.local/lib/python3.9/site-packages/matplotlib/backends/backend_agg.py", line 93, in __init__
    self._renderer = _RendererAgg(int(width), int(height), dpi)
MemoryError: In RendererAgg: Out of memory

Should I release the memory by closing the plot somehow? If so, how? Thanks.

DanielGoldfarb commented 2 years ago

This sounds similar to https://github.com/matplotlib/mplfinance/issues/386 but not quite the same. You may want to read through that to get some ideas.

The first thing is to make sure you are using the matplotlib backend "Agg". If all you are doing is saving the plots to files, then the correct backend to use is "Agg". Based on your traceback it does seem to be using Agg; however, you may want to confirm this in your code for at least one run.
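
One way to confirm the backend from inside the script is to print matplotlib.get_backend() once at startup. This is a minimal sketch, assuming Agg is selected with matplotlib.use("Agg") before pyplot or mplfinance is imported; the assertion is only an illustrative guard, not something mplfinance requires.

```python
import matplotlib

# Select the non-interactive Agg backend before pyplot (or mplfinance) is imported.
matplotlib.use("Agg")

import matplotlib.pyplot as plt  # noqa: E402  (imported after choosing the backend)

backend = matplotlib.get_backend()
print("matplotlib backend:", backend)

# Fail loudly if an rc file, environment variable or IDE switched to a GUI backend.
assert backend.lower() == "agg", f"expected Agg, got {backend!r}"
```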

It also appears that you are calling mpf.plot() from something called deve-Multi.py ... is it possible you are running multiple instances at the same time? If so, that would be important information to have.

Can you please clarify:

iterates "n" times ... let say from 50 to 300, but the real number of plot generate are 5 to 10.

Please show all of the surrounding code, and/or clarify what it means to iterate 50 to 300 times in order to generate only 5 or 10 plot files.

Three other possibilities that you may try are:

  1. Set returnfig=True. This will give you access to the Figure object which you can delete after saving the figure to a file.
    Note: when setting returnfig=True, do not pass your savefig=dict(fname=filename, dpi=150, pad_inches=0.50) into mpf.plot() (it will be ignored anyway); instead, after calling mpf.plot(), call Figure.savefig(), something like this:

    fig, axlist = mpf.plot(re, type="candle", style=style,
                           title=f"BOT{b} {symbol} {timeframe} {datetime.now().strftime('%Y-%m-%d %H:%M:%S')} {Algorithm}",
                           addplot=adp, volume=False, returnfig=True)

    fig.savefig(filename, dpi=150, pad_inches=0.50)
    del fig
  2. You may want to try invoking the garbage collector at the end of each run. It should be working fine automatically, but I suppose this can't hurt.
    import gc
    gc.collect()
  3. Consider that something else you are doing in your code may be using up or leaking memory. Where are you getting your data for each run? Are you deleting wherever that data is stored at the end of each 15-minute run? What's going on with apscheduler both before and after it calls run_job? Also, what's going on with deve-Multi.py and the code around it?
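
To follow up on point 3, the standard-library tracemalloc module can report which source lines keep accumulating memory between runs. The sketch below is illustrative only: top_growth is a hypothetical helper, run_once stands for one pass of your scheduled job, and leaky_job merely simulates a leak. tracemalloc only tracks memory allocated through Python's allocators, so it will under-report memory held by native code such as the Agg renderer.

```python
import tracemalloc


def top_growth(run_once, iterations=3, limit=5):
    """Run `run_once` repeatedly and return the source lines whose memory grew most.

    `run_once` is a hypothetical callable standing in for one pass of your job.
    """
    tracemalloc.start()
    try:
        run_once()  # warm-up run, so one-time caches are not reported as growth
        before = tracemalloc.take_snapshot()
        for _ in range(iterations):
            run_once()
        after = tracemalloc.take_snapshot()
    finally:
        tracemalloc.stop()
    # compare_to() sorts by the absolute size difference, largest first.
    return after.compare_to(before, "lineno")[:limit]


# Simulated leak: every run keeps another ~1 MB buffer alive.
_kept = []


def leaky_job():
    _kept.append(bytearray(1_000_000))


for stat in top_growth(leaky_job):
    print(stat)
```

In a real job, the lines at the top of this list are the first places to look; lines inside matplotlib or mplfinance would point back at figure handling.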

Good luck. All the best. --Daniel

fxhuhn commented 2 years ago

I see the same issue. It appears only with returnfig=True. Your suggestion of del fig does not fix it: after around 180 charts the process stops. A bit weird is that without returnfig (using savefig with closefig=True instead), the script generates more than 700 charts in one loop.

DanielGoldfarb commented 2 years ago

@fxhuhn Markus, I'm not sure what you described is the same.

Are you indeed seeing "MemoryError: In RendererAgg: Out of memory" or something else?

Also, as @Mikeardy described it in the original post, it does not use returnfig=True.

Please confirm whether you are seeing the exact same memory error message (with RendererAgg). Thank you.

fxhuhn commented 2 years ago

@DanielGoldfarb I'm seeing "something else". I don't get any message at all; it just stops.

This snippet works as expected:

mpf.plot(df[['Open', 'High','Low','Close','Volume']],
     type='candle',
     #returnfig=True,
     savefig=save,
     closefig=True
     )

And this one stops without forcing any exception error:

fig, _ = mpf.plot(df[['Open', 'High','Low','Close','Volume']],
     type='candle',
     returnfig=True,
     #savefig=save,
     #closefig=True
     )
fig.savefig(save['fname'])
del fig
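
A likely explanation for the difference, offered as an assumption rather than a confirmed diagnosis: mpf.plot() appears to create its Figure through pyplot, and pyplot keeps its own reference to every open figure until plt.close() is called. del fig only removes the local name, so each iteration leaves one more figure alive, whereas closefig=True closes it. The sketch below uses plain matplotlib, with plt.figure() standing in for mpf.plot(..., returnfig=True); save_charts and the file names are hypothetical.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # file output only, no GUI
import matplotlib.pyplot as plt


def save_charts(out_dir, n_charts):
    for i in range(n_charts):
        # Stand-in for: fig, axlist = mpf.plot(df, type="candle", returnfig=True)
        fig = plt.figure()
        fig.add_subplot().plot([1, 3, 2, 4])
        fig.savefig(os.path.join(out_dir, f"chart_{i}.png"), dpi=72)
        # `del fig` alone leaves the figure registered with pyplot;
        # plt.close(fig) drops pyplot's reference so the memory can be reclaimed.
        plt.close(fig)


out_dir = tempfile.mkdtemp()
save_charts(out_dir, 5)
print(plt.get_fignums())  # → []
```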

DanielGoldfarb commented 2 years ago

@fxhuhn

I'm on "something else". I don't get a message at all, it just stops

I will try to reproduce on my end. What OS are you running? Python version? mplfinance version? matplotlib version?

DanielGoldfarb commented 2 years ago

As a separate question, I am trying to understand why anyone would save hundreds of plots all at once. Please, anyone who can, explain to me what you are trying to accomplish, so that I can better understand how mplfinance is being used.

In my own approach, if I am developing and testing a trading strategy, I do something like this: I make several plots, maybe 10 or 20 at most, to refine my ideas. When I get to the point where I want to try out hundreds or thousands of scenarios, I loop through all the scenarios but don't plot them. Instead I calculate the profit or loss and then sort them. After that I may make another 10, 20 or 30 plots to examine the best and the worst scenarios, particularly the worst, so that I can visualize and perhaps understand why the strategy did not work as well in those scenarios. Then I use that information to refine my strategy. Ultimately, however, I take my signals from running the strategy without plotting. Only when I get a signal do I plot that particular case, so that I can visually examine it and decide whether I agree that it's a good signal.

That said, I am at a loss to understand why someone would make hundreds of plots in one loop. Can someone please explain what you are doing with them. Much appreciated! All the best. --Daniel
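
The workflow above (score every scenario without plotting, then chart only the extremes) can be sketched in a few lines. This is a hypothetical illustration: pnl_by_scenario is assumed to be a dict of scenario id to profit/loss that a backtest has already produced, and the plotting step itself is left out.

```python
import heapq


def pick_scenarios_to_plot(pnl_by_scenario, k=10):
    """Return (worst k, best k) scenario ids by profit/loss.

    `pnl_by_scenario` is a hypothetical mapping of scenario id -> P&L,
    computed in a loop that makes no plots at all.
    """
    items = pnl_by_scenario.items()
    worst = heapq.nsmallest(k, items, key=lambda kv: kv[1])
    best = heapq.nlargest(k, items, key=lambda kv: kv[1])
    return [sid for sid, _ in worst], [sid for sid, _ in best]


pnl = {"A": -5.0, "B": 12.0, "C": 3.0, "D": -1.0}
worst, best = pick_scenarios_to_plot(pnl, k=2)
print(worst, best)  # → ['A', 'D'] ['B', 'C']
```

Only the ids returned here would then be passed to mpf.plot(), which keeps the number of figures per run small.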

fxhuhn commented 2 years ago

Hi @DanielGoldfarb, absolutely right! At first it was simply quicker to implement that way. In the meantime I filter the data beforehand, so now there are only a few charts.

E.g., this table shows some stocks with a lower RSI and high volume; charts were built for them if needed:

    Long_status  ticker   name                       Long_active  Long_TP1    Long_Range  rsi_7
                 enph_us  Enphase Energy                                                  1
                 ebay_us  eBay                                                            1
                 tlry_us  Tilray                                                          2
                 sant_de  S&T                                                             2
                 shop_us  Shopify A                                                       2
                 plug_us  Plug Power                                                      2
                 pvh_us   PVH                                                             2
                 snap_us  SNAP                                                            3
                 tsla_us  Tesla                                                           3
                 twtr_us  Twitter                                                         2
                 tcom_us  Trip.com Group Ltd Sp ADR                                       2
                 luv_us   Southwest Airlines Co                                           2
                 lulu_us  Lululemon Athletica                                             2
                 mdt_us   Medtronic                                                       1
                 oxy_us   Occidental Petroleum                                            3
                 alb_us   Albemarle                                                       2
-1  R            pvh_us   PVH                        2021-12-07               2021-12-13  3
1   R            pton_us  Peloton Interactive        2021-12-07   2021-12-08  2021-12-09  3
2   R            sfq_de   SAF Holland                2021-12-07   2021-12-08  2021-12-10  3
2   R            pdd_us   Pinduoduo                  2021-12-15   2021-12-07              2

george2seven commented 2 years ago

Hello,

I can also attest that I do see the same issue in my implementation.

  1. When called in a loop, both .plot(...save...) and .savefig() allocate large amounts of memory without releasing it back to the OS after each iteration.
  2. gc.collect() has no effect
  3. "Agg" backend utilization has no effect
  4. At least in my implementation, I have confirmed that this issue is not caused by other parts of my code.
  5. I am working in a Jupyter Notebook environment, so splitting the operation into smaller chunks does not work: the allocated memory is only released when the kernel is restarted.
  6. @DanielGoldfarb Daniel, I agree with you that generating and saving many plots is an edge case. However, I cannot reveal the nature of my implementation at this moment. I can only assure you that, at least in my case, there is a valid reason for this use case.

My current workaround is to run the operation in batches, with kernel restarts in between. Although not ideal, it will serve for now. I posted this comment only to provide more information toward a possible resolution.
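
If restarting the kernel by hand becomes tedious, one alternative (a different technique, not something tested in this thread) is to render each batch in a fresh Python child process: when the child exits, the operating system reclaims all of its memory, including allocations made by C/C++ code that gc.collect() cannot reach. This sketch assumes matplotlib is installed for the same interpreter; CHILD_SCRIPT, render_in_batches and the simple line plot standing in for mpf.plot() are all hypothetical.

```python
import subprocess
import sys
import textwrap

# Hypothetical child script: renders one batch of charts with Agg, then exits.
# A simple line plot stands in for the real mpf.plot(...) call.
CHILD_SCRIPT = textwrap.dedent("""
    import os
    import sys

    import matplotlib
    matplotlib.use("Agg")
    import matplotlib.pyplot as plt

    out_dir, names = sys.argv[1], sys.argv[2:]
    for name in names:
        fig, ax = plt.subplots()
        ax.plot([1, 3, 2, 4])
        ax.set_title(name)
        fig.savefig(os.path.join(out_dir, name + ".png"), dpi=72)
        plt.close(fig)
""")


def render_in_batches(names, out_dir, batch_size=50):
    """Render charts in a fresh Python process per batch of `batch_size` names."""
    for start in range(0, len(names), batch_size):
        batch = names[start:start + batch_size]
        # With "-c", sys.argv in the child is ["-c", out_dir, *batch].
        # The child's memory is returned to the OS when it exits.
        subprocess.run(
            [sys.executable, "-c", CHILD_SCRIPT, str(out_dir), *batch],
            check=True,
        )
```

For example, render_in_batches(tickers, "charts", batch_size=50), with a hypothetical list tickers and an existing directory "charts", would start one short-lived Python process per 50 charts. Because the child code is passed with -c, nothing has to be pickled, so this can also be called from a notebook.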

Thanks

DanielGoldfarb commented 2 years ago

@george2seven George, thanks for contributing to the discussion. I am still unable to reproduce this issue. However, based on your comment

.plot(....save....) and .savefig() methods while in a loop allocate large amounts of memory without releasing it back to the OS after each iteration

I am inclined to think this is a matplotlib issue or, given that you are working in Jupyter Notebook, perhaps a Jupyter issue. (Admittedly, I now realize I never tried to reproduce it inside Jupyter.)

Please, if you can, check to see if you can reproduce the same issue

Even if you don't get it to crash, a simple matplotlib demo using the "Agg" backend that shows memory is not freed would be very useful, especially if it is still not freed after calling del fig and gc.collect().

Thanks. Much appreciated.
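
A minimal, matplotlib-only demo of the kind requested might look like the sketch below (hypothetical; it is not taken from this thread). It renders the same small figure repeatedly with Agg and reports, for two cleanup strategies, how many figures pyplot still holds and how much Python-level memory tracemalloc sees as retained. tracemalloc does not see memory allocated directly by C/C++ code, so watching the process in top or Task Manager is still needed to confirm a leak inside the renderer itself.

```python
import gc
import io
import tracemalloc

import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt


def render_once(close):
    """Create, save and discard one figure; optionally close it via pyplot."""
    fig, ax = plt.subplots()
    ax.plot(range(1000))
    buf = io.BytesIO()
    fig.savefig(buf, format="png", dpi=72)
    if close:
        plt.close(fig)
    del fig, ax


def retained_growth(close, iterations=10):
    """Return (bytes retained after `iterations` runs, figures still open)."""
    render_once(close=True)  # warm-up: fonts, caches, lazy imports
    gc.collect()
    tracemalloc.start()
    baseline, _ = tracemalloc.get_traced_memory()
    for _ in range(iterations):
        render_once(close)
        gc.collect()
    current, _ = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return current - baseline, len(plt.get_fignums())


for close in (True, False):
    grown, open_figs = retained_growth(close)
    print(f"close={close}: ~{grown / 1024:.0f} KiB retained, {open_figs} figures open")
plt.close("all")
```

If the close=True line still shows steady growth over many more iterations, that would be the kind of matplotlib-level reproduction asked for above.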

george2seven commented 2 years ago

Hello Daniel,

I've run my script outside of Jupyter notebook:

  1. mpf.plot(...save...) works significantly better (speed-wise) than inside the notebook, and without any memory issues (no gc.collect(), with Agg).
  2. fig.savefig() has the same issue as in the notebook, even with gc.collect() (with Agg and del fig).

With regard to your second point (a simple matplotlib demo), the situation is the same as in the script/outside-notebook scenario above.

I am therefore inclined to believe that the Jupyter Notebook kernel DOES play a role in this issue (it seems to contribute both to the memory issue and to a slowdown), but it also seems that the .savefig() method has some issue when used in this intensive loop to generate many charts.

Best Regards

AureliusMarcusHu commented 1 year ago

Hi Daniel,

I have had the same issue for a long time, in my previous project and now in a new project with mplfinance. I have searched for a long time for a solution but have not found one. In my opinion it is a matplotlib issue: it does not clean up memory properly. I'm working on a Windows 10 Pro 64-bit, 8-core machine with 32-bit Python 3.9. I don't save any plots while looping. Before I start building a new plot, I run the following function every time:

import gc
import matplotlib.pyplot as plt

def refr_wind():
    global free, fig, root, cwin
    if fig is not None:
        fgl = plt.get_fignums()
        if len(fgl) > 1:
            plt.close(fgl[0])
            gc.collect()
        #del fig
        fig.clf()
        plt.clf()
        fig = None
        gc.collect()
        #fig = None
    for widget in cwin.winfo_children():
        widget.destroy()
    free = True
    root.title('Empty Chart')
    gc.collect()
    return

As you can see, I try to clean up everything after each plot. I have tried different ways, but the result is always the same: no effect at all. After each plot, RAM usage rises by 10 to 20 MB, depending on how many periods (records) are plotted. None of the above commands has any effect, and, as mentioned in other comments, there are no errors or warnings. I have searched the internet a lot for a solution but never found a clear one, not even from the matplotlib developers. This problem has existed for years, always with looping and matplotlib. They run C code in the background that the garbage collector cannot reach, so it has no effect at all.

Best regards
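
One thing worth checking in a GUI like this, offered as a suggestion rather than a confirmed cause: plt.clf() acts on pyplot's current figure (creating a new one if none exists), and any figure created through pyplot stays registered until plt.close() is called on it; fig.clf() and fig = None do not unregister it. Matplotlib's documentation on embedding in a web server recommends avoiding pyplot for this reason, and its Tk embedding example creates the figure directly with matplotlib.figure.Figure, which pyplot never tracks. The sketch below shows that approach with an Agg canvas instead of FigureCanvasTkAgg so it runs without a display; make_chart is hypothetical. Figures returned by mpf.plot() appear to go through pyplot, so those would still need plt.close(fig).

```python
import gc
import io
import weakref

import matplotlib.pyplot as plt
from matplotlib.backends.backend_agg import FigureCanvasAgg
from matplotlib.figure import Figure


def make_chart(values):
    """Build a figure without pyplot, so pyplot holds no reference to it."""
    fig = Figure(figsize=(6, 4), dpi=100)
    FigureCanvasAgg(fig)  # in a Tk app this would be FigureCanvasTkAgg(fig, master=...)
    ax = fig.add_subplot()
    ax.plot(values)
    return fig


fig = make_chart([1, 3, 2, 4])
buf = io.BytesIO()
fig.savefig(buf, format="png")

ref = weakref.ref(fig)
print(plt.get_fignums())  # → []  (pyplot never saw this figure)
del fig
gc.collect()
print("figure freed:", ref() is None)  # the weak reference dies once the figure is collected
```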

zzxjl1 commented 6 months ago

same issue

DanielGoldfarb commented 6 months ago

This is indeed a matplotlib issue, which I doubt will ever be changed. The issue is related to the "matplotlib backend" being used. When saving plots into files only, then the appropriate backend to use is "Agg".

Although there seem to be some comments to the contrary above, the following lines of code should fix the problem:

import matplotlib
matplotlib.use("Agg") 

Make sure the above lines of code are run before any other code that uses matplotlib or mplfinance.

This is the proper solution: if you do not need or want GUI windows, then do not create them at all!

You may also set and export the environment variable MPLBACKEND=agg before running any matplotlib code.
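
For a job started by a scheduler or launcher, the variable can be set by whatever starts the Python process (in a shell, export MPLBACKEND=agg before running the script). The sketch below does the same from Python for a child process and prints the backend the child reports; the inline child command is only an illustration.

```python
import os
import subprocess
import sys

# Copy the current environment and force the Agg backend for the child process.
env = dict(os.environ, MPLBACKEND="agg")

result = subprocess.run(
    [sys.executable, "-c", "import matplotlib; print(matplotlib.get_backend())"],
    env=env,
    capture_output=True,
    text=True,
    check=True,
)
print(result.stdout.strip())
```

Because the variable is read when matplotlib is imported, this has the same effect as calling matplotlib.use("Agg") at the very top of the child script, without editing it.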