twopirllc / pandas-ta

Technical Analysis Indicators - Pandas TA is an easy to use Python 3 Pandas Extension with 150+ Indicators
https://twopirllc.github.io/pandas-ta/
MIT License

Understanding ta.vp indicator (Volume Profile). bug? #615

Closed argcast closed 1 year ago

argcast commented 2 years ago

For testing and learning purposes, I created a DataFrame and ran vp on the first 10 rows, which is the minimum width required by the indicator.

As I understand it, vp.py evaluates the close series: the signed_series function calls series.diff(1), which compares each close to the previous close and assigns a positive or negative sign. That sign then determines whether the row's volume counts as pos_volume or neg_volume in vp itself.
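To make the sign logic concrete, here is a minimal sketch using plain pandas and NumPy. It is not the library's code: the sample prices and volumes are made up, and seeding the first row as positive is an assumption for illustration only.

```python
import numpy as np
import pandas as pd

# Made-up sample data; rows 2 and 4 repeat the previous close.
close = pd.Series([10.0, 10.5, 10.5, 10.2, 10.2, 10.8])
volume = pd.Series([100, 200, 300, 400, 500, 600])

# diff(1) is close[i] - close[i-1]; the first row has no predecessor (NaN).
change = close.diff(1)

# Reduce each change to +1 (up), -1 (down) or 0 (unchanged).
# Assumption: the first row is treated as positive.
sign = np.sign(change).fillna(1)

# Keep volume only where the sign matches, otherwise 0.
pos_volume = volume.where(sign > 0, 0)
neg_volume = volume.where(sign < 0, 0)

print(sign.tolist())
print(pos_volume.sum() + neg_volume.sum(), "of", volume.sum())
```

Rows whose close equals the previous close get a sign of 0, so they fall into neither bucket in this sketch.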

As I understand it, the total volume in a given price range should match the total volume that vp returns for it. Is this correct? I checked Issues #74 and #185 for a similar question that had already been answered.
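The expectation above can be written as a small conservation check. This sketch bins made-up closes into equal-width price ranges with pd.cut and sums the volume per range. The bin count of 3 and the column names are assumptions for illustration, and this is not necessarily how vp builds its ranges.

```python
import pandas as pd

# Made-up sample data.
df = pd.DataFrame({
    "Close": [10.0, 10.5, 10.5, 10.2, 10.2, 10.8],
    "Volume": [100, 200, 300, 400, 500, 600],
})

# Assumption: three equal-width price ranges, chosen only for illustration.
price_range = pd.cut(df["Close"], bins=3)

# Sum the volume traded inside each price range.
per_range = df.groupby(price_range, observed=False)["Volume"].sum()

# Every row lands in exactly one range, so no volume should go missing.
assert per_range.sum() == df["Volume"].sum()
print(per_range)
```

Any volume profile that partitions the rows this way should pass the same check against its total volume column.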

Here is the df.head(10) :

image

And here is df.ta.vp(), where you can see Volume data loss on the 2nd and 3rd rows:

image

As you can see, when close prices are equal on consecutive rows, the volume of those rows goes missing, so df.ta.vp()['total_Volume'].sum() == df.Volume.sum() evaluates to False.

What am I missing, or maybe not understanding, about vp?

twopirllc commented 1 year ago

Hello @argcast,

Yeah that does look like an issue. Something I would have to find some time to dig into more when I get a chance.

I found these links for VP which may help as its actual source is hard to find.

If you have any ideas on how to fix it, that would be helpful. 😎

Apologies for the late reply.

Kind Regards, KJ

argcast commented 1 year ago

Hey @twopirllc ,

Sorry for the late reply. I have been working on a simple fix. Here is my approach:

Since the issue happens only when open and closing prices are equal, the current vp.py does not know whether that volume is pos_Volume or neg_Volume. Neither do we, which means we cannot simply add it to either of them. That's why I modified the code slightly, adding a new neut_volume (neutral volume) that accounts for those cases.
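The idea of a third bucket can be sketched as follows. This is not the code from the PR: split_volume is a hypothetical helper, the data is made up, and the direction is assumed to come from the sign of close.diff(1) as described earlier in the thread, with the first row treated as positive.

```python
import numpy as np
import pandas as pd

def split_volume(close: pd.Series, volume: pd.Series) -> pd.DataFrame:
    """Hypothetical helper: split volume into positive, negative and neutral parts."""
    # +1 for a rising close, -1 for a falling close, 0 for no change.
    # Assumption: the first row (no previous close) counts as positive.
    sign = np.sign(close.diff(1)).fillna(1)
    return pd.DataFrame({
        "pos_volume": volume.where(sign > 0, 0),
        "neg_volume": volume.where(sign < 0, 0),
        "neut_volume": volume.where(sign == 0, 0),  # previously dropped rows
    })

# Made-up sample data; rows 2 and 4 repeat the previous close.
close = pd.Series([10.0, 10.5, 10.5, 10.2, 10.2, 10.8])
volume = pd.Series([100, 200, 300, 400, 500, 600])

parts = split_volume(close, volume)
total = parts.sum(axis=1)

# With the neutral bucket included, the parts add back up to the input volume.
assert total.sum() == volume.sum()
print(parts.sum())
```

Keeping the neutral volume in its own column avoids guessing a direction for it while still letting the total reconcile with the input.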

Let me explain better:

import pandas as pd
import pandas_ta as ta
import yfinance as yf

df = yf.download('RVP', start="2023-03-01")
Screenshot 2023-03-24 at 11 13 23

As we can see, there is one case where open == close, and upon calling df.ta.vp() we can see that we have data loss:

Screenshot 2023-03-24 at 11 13 30

If we instead assign the volume in those cases to neut_vol (neutral volume), we can avoid the data loss:

Screenshot 2023-03-24 at 11 13 55

VP charting should take that "neutral" volume into account too:

Screenshot 2023-03-24 at 11 14 31

PS: I'm not sure whether "neutral" is the right name for this volume. I've created PR #670 with a modified vp.py so you can take a look at it and make any necessary adjustments.

I hope this helps 🙂 Albert.