Closed sunilshah closed 1 year ago
How would you re-phrase the first sentence? Do the examples not make it clear?
Hi TomAugspurger,
Are you asking about re-phrasing the title or the first sentence of the Problem Description in my post?
Which examples are you referring to?
The first sentence & examples in the docstring: http://pandas-docs.github.io/pandas-docs-travis/generated/pandas.Series.pct_change.html
On Thu, Apr 19, 2018 at 9:06 PM, sunilshah notifications@github.com wrote:
Hi TomAugspurger,
Are you asking about re-phrasing the title or the first sentence of the Problem Description in my post?
Which examples are you referring to?
— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/pandas-dev/pandas/issues/20752#issuecomment-382943291, or mute the thread https://github.com/notifications/unsubscribe-auth/ABQHIt3woWkn7RnlSOmmMPcZbkk89FOIks5tqUKjgaJpZM4Tcu9g .
I would change "Percent Change" to "Fractional Change" for every occurrence.
The examples do imply fractional changes. When I looked at https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.pct_change.html
there were no examples.
I haven't heard the term "Fractional Change" before, is it standard?
The docs I linked to are the developer docs, with a recently improved docstring. The ones you saw were (understandably) for the currently released version.
Are the improved docs sufficiently clear?
On Thu, Apr 19, 2018 at 9:15 PM, sunilshah notifications@github.com wrote:
I would change "Percent Change" to "Fractional Change" for every occurrence.
The examples do imply fractional changes. When I looked at https://pandas.pydata.org/pandas-docs/stable/generated/ pandas.DataFrame.pct_change.html
there were no examples.
— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/pandas-dev/pandas/issues/20752#issuecomment-382945266, or mute the thread https://github.com/notifications/unsubscribe-auth/ABQHIkHqnUs8TbQ4_uBDqpF9wwpi5cCHks5tqUS-gaJpZM4Tcu9g .
The dictionary meaning of fractional (adjective) is precisely that: pertaining to or being a fraction.
http://www.dictionary.com/browse/fractional https://www.merriam-webster.com/dictionary/fractional
It could also be called relative change. Percent change implies 100 times the fractional or relative change. Therefore the name and the description are misleading.
The improved docs are better than the original because of the examples. The user still has to go through the numbers in the examples to compute and figure out that the author really meant fractions and not percentage, even though it mentions percent eight times. IMHO it should be corrected.
this is the first time anyone has brought this up, even though this method has existed since the beginning of pandas. this is splitting hairs about the semantic meaning. This is a won't fix.
The only reason to consider a change (to the docs), is to help new users learn faster. That was my motivation to spend my time filing the request.
Thanks. I think we're OK with the current docs for now. We have this issue as a record in case someone else is confused, and we can change it then.
On Fri, Apr 20, 2018 at 10:27 AM, sunilshah notifications@github.com wrote:
The only reason to consider a change (to the docs), is to help new users a faster ramp in using pandas. That was my motivation to spend my time filing the request.
— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/pandas-dev/pandas/issues/20752#issuecomment-383132730, or mute the thread https://github.com/notifications/unsubscribe-auth/ABQHIjb3wxCRbOUbjX9YcjpyKkOYkI3Mks5tqf5pgaJpZM4Tcu9g .
For what it's worth, I was calculating percent differences in a class today and had a student use this function. They neglected to multiply by 100 because they didn't check the docs (and I was surprised myself at the behavior after reading them). I used it as a good teachable moment for why you should always check the docs, but I don't think that's a particularly good reason for keeping it that way. I volunteer to make the change if you're open to it, but I have to agree with sunilshah here that the standard definition of percent change is out of 100.
Yup, the description should specify that it gives fractional output instead of the expected percentage change output.
We just came across this too, when a student asked why the values weren't percents. I can't say why people have not complained, but it's very confusing to see that the method is called pct_change
and there is "percentage change" through the docstring, when in fact it is fractional change. In particular you have:
Percentage change between the current and a prior element.
Computes the percentage change from the immediately previous row by
default. This is useful in comparing the percentage of change in a time
series of elements.
This is just incorrect. It should read:
Fractional change between the current and a prior element.
*Be careful* - despite the name of this method, it calculates fractional
change and not percentage change.
Computes the fractional change from the immediately previous row by
default. This is useful in comparing the fraction of change in a time
series of elements.
If you need the percentage change, multiply these values by 100.
All the other instances of "percent" should be "fraction".
I agree that this is extremely confusing. At least the docs should be changed. The current Doc says thus, which is very much misleading. Percentage is scaled to 100.
`Percentage change between the current and a prior element.
Computes the percentage change from the immediately previous row by default. This is useful in comparing the percentage of change in a time series of elements.`
I also find this highly misleading as this function simply does not involve any percentage. I end up not using it as it makes my code less understandable. Another name which may be more common and intuitive than fractional change would be relative change.
@charron - of course I completely agree, but I'm afraid we didn't get anywhere when I tried bringing this up on the list - starting here. The argument against doing anything about this appears to be, that when some presumably large proportion of Pandas users see a 'pct_change' of 1.0, from the values [1, 2], they instinctively think of this as being a proportion (rather than a percent), and they regard the actual number - 1.0 - as representing 100%, just not displayed as such. I think. But you're probably best reading it yourself, I doubt I'm doing the argument justice.
@matthew-brett thank you for linking the extended discussion in the mailing-list, I was not aware of it. Maybe this is a matter of background as one poster mentioned. I am only surprised that this issue did not get more traction, which unfortunately lends support to the argument against doing anything.
It seems obvious to me that there's at least a significant proportion of users who find this behavior somewhere between surprising and clearly incorrect. As someone who uses and teaches Pandas a lot, has contributed some commits and the original wheel build machinery, and wants it to succeed, it's sad for me that this one doesn't have a better solution than "It's fine as it is".
Maybe this should go into a separate issue, but I think in addition to the name of pct_change
, the formula on my interpretation is not actually correct for negative values:
In [2]: pd.Series([-2, -1]).pct_change()
Out[2]:
0 NaN
1 -0.5
dtype: float64
I would expect a change from -2 to -1 to be a 50% increase, not a decrease. But perhaps there is a scenario where the negative value is what is desired?
This gives a per unit change. It is not a per cent change. (cent is latin for 100) Even if one doesn't want to fix the badly named function, the doc string should not say "Percentage change" for it does not actually produce a per cent change. The output is actually a relative change expressed as a per unit change. The relative change is not expressed as a percentage change. The docs and function name would both be correct if they called it relative change. They would likewise be correct if they called it more specifically a per unit change. They would not be correct if they call it a percentage change. https://en.wikipedia.org/wiki/Relative_change_and_difference#Percentage_change
Hi @thorr18 - I agree that the docstring could do with being clarified - are you interested in submitting a pull request? If so, here's the contributing guide
Hi @buhrmann -0.5 is correct, according to the definition ( (-1 - (-2)) / -2 = - 0.5) . If you disagree, please open a new issue
Yes, it should be indicated in the doc that it is a relative rate and not a percent. It is confusing.
sure - interested in submitting a pull request to clarify the docs?
opening as a doc change
This is clearly an error in the document or function naming. Why are you still trying to argue? Please fix this bug quickly and stop misleading beginners!
@beyondguo thanks for your interest and constructive criticism - are you interested in submitting a PR to clarify the docs? https://pandas.pydata.org/docs/dev/development/contributing_docstring.html
I think this comment has some good suggestions for how to clarify the docs
take
It's certainly a good idea to update the docstring, but it is a little difficult to do it. It seems to me the docstring would have to start with something like:
Although the method is named `pct_change`, in fact, it returns proportional change, and not percent change.
To get percent change, multiply the returned value by 100.
I do care about this, and I'm happy to write the docstring - but would such a change really be acceptable?
looks like @gustavovargas has already taken this
I think all that needs adding is a note, just like there's a note here https://pandas.pydata.org/docs/reference/api/pandas.Series.searchsorted.html
Well - that note - for those who did not follow the link - is "The Series must be monotonically sorted, otherwise wrong locations will likely be returned.". That's a perfectly reasonable caveat in usage. Here we have the problem of explaining why the method name strongly implies it is returning something other than its actual result. But - just to ask again - is something like my suggestion acceptable?
yeah looks fine
Code Sample, a copy-pastable example if possible
Problem description
This is fraction change and not percent change.
Dictionary definition of percent is "One part in every hundred."
Either change the documentation to state that it is fractional change and not percent change with the formula (for backward compatibility) or rename it (to get rid of the obvious error).
Expected Output
Output of
pd.show_versions()