has2k1 / plotnine

A Grammar of Graphics for Python
https://plotnine.org
MIT License

geom_smooth with more than 999 rows #522

Open benbogart opened 3 years ago

benbogart commented 3 years ago

It looks like geom_smooth only supports up to 999 rows.

You can see below that when I increase the row count from 999 to 1000 I get a regression line rather than a smooth plot.

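[image: Screen Shot 2021-07-28 at 5 47 50 PM] https://user-images.githubusercontent.com/29614010/127400619-c8b0c8dc-bc07-4f29-bb96-779a1a910fe7.png

[image: Screen Shot 2021-07-28 at 5 47 38 PM] https://user-images.githubusercontent.com/29614010/127400748-297edea0-16eb-4642-9c36-a267f330255e.png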

I tried to specify method = 'gam' but I get the following error which seems to indicate that gam is not available.

PlotnineError: "Method should be one of ['lm', 'ols', 'wls', 'rlm', 'glm', 'gls', 'lowess', 'loess', 'mavg', 'gpr']"

Is adding gam to geom_smooth for smoothing more than 1000 rows on the roadmap? Is there another method to accomplish this?

has2k1 commented 3 years ago

There is no "gam" method. By default, smoothing uses "loess" (assuming you have scikit-misc installed) only when there are fewer than 1000 points, because the loess algorithm does not scale well. If you really want loess on a larger dataset, you can request it explicitly with method="loess".


benbogart commented 3 years ago

@has2k1 What do you mean by there is no "gam" method? I understand that gam is not implemented in plotnine, but it is part of the tidyverse ggplot2.

From the docs (https://ggplot2.tidyverse.org/reference/geom_smooth.html):

Smoothing method (function) to use, accepts either NULL or a character vector, e.g. "lm", "glm", "gam", "loess" or a function, e.g. MASS::rlm or mgcv::gam, stats::lm, or stats::loess. "auto" is also accepted for backwards compatibility. It is equivalent to NULL.

Is there a reason why gam is not available in plotnine? There are several python implementations including from statsmodels.

I only have 40k rows and tidyverse ggplot2 handles this with ease. In plotnine setting the method to "loess" kills my kernel.

has2k1 commented 3 years ago

It is not implemented in Plotnine. If there is an implementation in statsmodels, then we can add it. If I recall correctly, gam in statsmodels was still only in a development branch when I added the smoothing methods.


benbogart commented 3 years ago

That makes sense.

Gam does appear to be implemented in statsmodels now: https://www.statsmodels.org/v0.12.2/gam.html

It would be great to have.