joshuaulrich / xts

Extensible time series class that provides uniform handling of many R time series classes by extending zoo.
http://joshuaulrich.github.io/xts/
GNU General Public License v2.0
220 stars 71 forks source link

Check for constant `ylim` susceptible to numerical precision issues #368

Closed bollard closed 2 years ago

bollard commented 2 years ago

Hi,

I'm attempting to use plot.xts to plot a series which contains constant values, however I encounter the error

Error in segments(xlim[1], y_grid_lines(get_ylim()[[2]]), xlim[2], y_grid_lines(get_ylim()[[2]]), : cannot mix zero-length and non-zero-length coordinates

After a little Googling, I was puzzled to see that this issue was fixed in https://github.com/joshuaulrich/xts/issues/156

With the help of debug I found that in my case, numerical precision issues were causing my data to side step the check that had been implemented. The difference can only be seen with sprintf. Debug console output:

Debug> yrange <- range(cs$Env$xdata[, 1][subset], na.rm = TRUE)
Debug> dput(yrange)
c(6e+05, 6e+05)
Debug> yrange[1L] == yrange[2L]
[1] FALSE
Debug> sprintf("%.6f", yrange[[1]])
[1] "600000.000000"
Debug> sprintf("%.9f", yrange[[1]])
[1] "600000.000000000"
Debug> sprintf("%.12f", yrange[[1]])
[1] "599999.999999999767"
Debug> sprintf("%.12f", yrange[[2]])
[1] "600000.000000000116"
Debug> isTRUE(all.equal(yrange[1], yrange[2]))
[1] TRUE

Proposal is to replace the strict == check, with all.equal, which includes a tolerance of sqrt(.Machine$double.eps)

Thanks

Please review the contributing guide before submitting your pull request. Please pay special attention to the pull request and commit message sections. Thanks for your contribution and interest in the project!

bollard commented 2 years ago

I've just noticed similar logic to deal with constant ylim is also found here and here so I've updated the PR to extract a helper function which deals with tweaking the values.

Hope that all seems sensible but I'm a little less clear on the logic in those latter two sections so please feel free to update as you see fit. Thanks

joshuaulrich commented 2 years ago

Thanks for the report and patch! Do you have an example that triggers this issue? I'd like to add a unit test to make sure it doesn't reoccur. I applied your patch and I still get an error when I run this:

plot(xts::xts(rep(10,10), Sys.Date()-10:1))
##  Error in segments(xlim[1], y_grid_lines(get_ylim()[[2]]), xlim[2], y_grid_lines(get_ylim()[[2]]),  : 
##    cannot mix zero-length and non-zero-length coordinates
bollard commented 2 years ago

My use case is a little more complicated as I'm using both multi.panel = 5 and yaxis.same = FALSE, however just having a look now I can run your example code without any issues with my patches applied (just copy and past-ing the code in locally with trace):

> trace(xts::plot.xts, edit = TRUE)
Loading required package: xts
Loading required package: zoo

Attaching package: ‘zoo’

The following objects are masked from ‘package:base’:

    as.Date, as.Date.numeric

Tracing function "plot.xts" in package "xts"
[1] "plot.xts"
> plot(xts::xts(rep(10,10), Sys.Date()-10:1))

Can I just check you've applied all the patches (my original plus the two follow ups)? I'll see if I can figure out how to squash them down into one to make it clearer

EDIT: Just squashed and force pushed. Hope that's OK

joshuaulrich commented 2 years ago

EDIT: Just squashed and force pushed. Hope that's OK

Yep, that's fine! In the future, you don't need to do this unless I ask. I'm good with Git and can usually get what I need without users having to do extra work.

Can I just check you've applied all the patches (my original plus the two follow ups)? I'll see if I can figure out how to squash them down into one to make it clearer

Here's the diff I have. Did you make change to any other file(s)?

diff --git a/R/plot.R b/R/plot.R
index 05bc23d..dc77c0e 100644
--- a/R/plot.R
+++ b/R/plot.R
@@ -290,6 +290,18 @@ plot.xts <- function(x,
   cs$Env$main <- main
   cs$Env$ylab <- if (hasArg("ylab")) eval.parent(plot.call$ylab) else ""

+  # guard against constant yrange by (if necessary) perturbing values
+  .perturbConstant <- function(yrange) {
+    if(isTRUE(all.equal(yrange[1L], yrange[2L]))) {
+      if(yrange[1L] == 0) {
+        yrange <- yrange + c(-1, 1)
+      } else {
+        yrange <- c(0.8, 1.2) * yrange[1L]
+      }
+    }
+    return(yrange)
+  }
+  
   # chart_Series uses fixed=FALSE and add_* uses fixed=TRUE, not sure why or
   # which is best.
   if(is.null(ylim)){
@@ -305,14 +317,8 @@ plot.xts <- function(x,
       # set the ylim based on all the data if this is not a multi.panel plot
       yrange <- range(cs$Env$xdata[subset], na.rm=TRUE)
     }
-    if(yrange[1L] == yrange[2L]) {
-      if(yrange[1L] == 0) {
-        yrange <- yrange + c(-1, 1)
-      } else {
-        yrange <- c(0.8, 1.2) * yrange[1L]
-      }
-    }
-    cs$set_ylim(list(structure(yrange, fixed=FALSE)))
+    
+    cs$set_ylim(list(structure(.perturbConstant(yrange), fixed=FALSE)))
     cs$Env$constant_ylim <- range(cs$Env$xdata[subset], na.rm=TRUE)
   } else {
     # use the ylim arg passed in
@@ -420,7 +426,7 @@ plot.xts <- function(x,
     if(yaxis.same){
       lenv$ylim <- cs$Env$constant_ylim
     } else {
-      lenv$ylim <- range(cs$Env$xdata[subset,1], na.rm=TRUE)
+      lenv$ylim <- .perturbConstant(range(cs$Env$xdata[subset,1], na.rm=TRUE))
     }

     exp <- quote(chart.lines(xdata,
@@ -452,9 +458,7 @@ plot.xts <- function(x,
         if(yaxis.same){
           lenv$ylim <- cs$Env$constant_ylim
         } else {
-          yrange <- range(cs$Env$xdata[subset,i], na.rm=TRUE)
-          if(all(yrange == 0)) yrange <- yrange + c(-1,1)
-          lenv$ylim <- yrange
+          lenv$ylim <- .perturbConstant(range(cs$Env$xdata[subset,i], na.rm=TRUE))
         }
         lenv$type <- cs$Env$type
bollard commented 2 years ago

So I think the source of the confusion is that I was working on the latest released version, (0.12.1), whereas I assume you were running on master. Once I checked out master, I once again encountered the error (seeing the same as you saw).

After a decent amount of head scratching and trying to pick through the code (dusting it with perturbConstant calls), I've come down tiny change (in replot_env$Env$y_grid_lines) which I think solves the issue for me. I'll push a change now and would be great if you could check whether this fixes things for you as well. Thanks

joshuaulrich commented 2 years ago

So I think the source of the confusion is that I was working on the latest released version, (0.12.1), whereas I assume you were running on master.

Yes, I am working from the HEAD of master. Excellent sleuthing!

After a decent amount of head scratching and trying to pick through the code...

Thank you very much for continuing to dig into this! Your latest commit does prevent the error for my simple example. But the grid lines and series aren't plotted for some reason. I don't expect you to dig into that.

Screenshot from 2022-05-30 09-26-49

Thanks again!

bollard commented 2 years ago

Ah sorry, didn't notice that (was just relieved to finally see it now throw an error!) If if helps, I did notice that in one of the calls to segments seemed to have sensible values for x0 and y0, but then x1 was a vector of 4 values and y1 was numeric(0). I couldn't see where those fishy x1 and y1 values were coming from but I did also see that this call to segments uses ylim()[2] twice, which seemed a little odd on first glance. Let me know if you make any more progress and if I get a chance I'll see if I can revisit the whole fresh but this time starting at master

joshuaulrich commented 2 years ago

I just pushed a commit that fixes the blank chart in my previous comment. Please give this a try and let me know if it's good or if there are issues.

bollard commented 2 years ago

Sure thing, so I've just pulled your patch and had a look locally. Good news is that this certainly does look a lot better on the test series you provided suggested above. However, as I alluded to above, the data I was using that originally triggered this issue is a little more complicated. So I also tested a case closer to what I'm actually doing

plot(xts::xts(cbind(rep(1, 10), rep(10, 10)), Sys.Date() - 10:1), yaxis.same = FALSE, multi.panel = 2)

(i.e., constant values, AND yaxis.same = FALSE AND multi.panel) and while it doesn't through an error, the yaxis-es do curiously seem to be of the same scale (granted, if not exactly the same value). I would have expected the top plot to use the same +/- 20% logic to set yaxis scale of 0.8 to 1.2:

test-yaxis-same

Not a dealbreaker as the patch you've got is definitely an improvement, but just thought I'd flag it.

Thanks

joshuaulrich commented 2 years ago

I appreciate you flagging that! I'll take another look.


EDIT:

The issue is that the main panel ylim are recalculated using the range of all columns. So that's why the max is 10. To fix it, we'd need a way to only use the range of all the series being plotted on a panel.

joshuaulrich commented 2 years ago

My most recent commit seems to fix the problem. Please let me know if you can do some more testing. I'll merge if all looks good!

set.seed(21) 
x <- xts::xts(cbind(rnorm(10), rep(10, 10)), Sys.Date() - 10:1)
plot(x, yaxis.same = FALSE, multi.panel = 2)

image

bollard commented 2 years ago

Yes, this looks much better. Thanks! test-yaxis-same-fix

joshuaulrich commented 2 years ago

Thanks for the PR and all the help testing! I really appreciate it!