Printing the first and last n observations for xts and/or zoo?

joshuaulrich / xts

Extensible time series class that provides uniform handling of many R time series classes by extending zoo.

http://joshuaulrich.github.io/xts/

GNU General Public License v2.0


Printing the first and last n observations for xts and/or zoo? #321

Closed markushhh closed 1 year ago

Eluvias commented 4 years ago

If it helps, here is one approach. It still needs testing, of course, but it has worked for me so far.

library(xts)

xts_print <- function(x, n = 5) {

    if (is.null(colnames(x))) {
      nm <- paste0("X.", seq_len(ncol(x)))
    } else {
      nm <- colnames(x)
    }

    df <- format(fortify.zoo(x), justify = "right")
    colnames(df) <- c("Index", nm)
    row.names(df) <- paste(format(rownames(df), justify = "right"),
                           ":", sep = "")

    nr <- nrow(df)

    if (nr <= n && nr <= 5) {

      print(df)

    } else {

      if (nr < n * 2) {
        n <- floor(nr / 2)
      }

      cat("\n")
      print(utils::head(df, n))

      ndigits <- nchar(nrow(df))

      if (ndigits >= 3) {
        cat(rep(" ", ndigits - 3), "---")
      } else {
        cat("---")
      }

      # Blank column names of the same widths, so the tail lines up under the head's header
      nm2 <- strrep(" ", nchar(nm))

      attr(df, "names") <- c("", nm2)
      print(utils::tail(df, n), right = TRUE, justify = "right")
    }
  }

data(sample_matrix)

samplexts <- as.xts(sample_matrix)

xts_print(samplexts)
#> 
#>           Index     Open     High      Low    Close
#>   1: 2007-01-02 50.03978 50.11778 49.95041 50.11778
#>   2: 2007-01-03 50.23050 50.42188 50.23050 50.39767
#>   3: 2007-01-04 50.42096 50.42096 50.26414 50.33236
#>   4: 2007-01-05 50.37347 50.37347 50.22103 50.33459
#>   5: 2007-01-06 50.24433 50.24433 50.11121 50.18112
#>  ---                                                   
#> 176: 2007-06-26 47.44300 47.61611 47.44300 47.61611
#> 177: 2007-06-27 47.62323 47.71673 47.60015 47.62769
#> 178: 2007-06-28 47.67604 47.70460 47.57241 47.60716
#> 179: 2007-06-29 47.63629 47.77563 47.61733 47.66471
#> 180: 2007-06-30 47.67468 47.94127 47.67468 47.76719

xts_print(samplexts, n = 1)
#> 
#>           Index     Open     High      Low    Close
#>   1: 2007-01-02 50.03978 50.11778 49.95041 50.11778
#>  ---                                                   
#> 180: 2007-06-30 47.67468 47.94127 47.67468 47.76719

xts_print(head(samplexts,10), n = 8)
#> 
#>          Index     Open     High      Low    Close
#>  1: 2007-01-02 50.03978 50.11778 49.95041 50.11778
#>  2: 2007-01-03 50.23050 50.42188 50.23050 50.39767
#>  3: 2007-01-04 50.42096 50.42096 50.26414 50.33236
#>  4: 2007-01-05 50.37347 50.37347 50.22103 50.33459
#>  5: 2007-01-06 50.24433 50.24433 50.11121 50.18112
#> ---                                                  
#>  6: 2007-01-07 50.13211 50.21561 49.99185 49.99185
#>  7: 2007-01-08 50.03555 50.10363 49.96971 49.98806
#>  8: 2007-01-09 49.99489 49.99489 49.80454 49.91333
#>  9: 2007-01-10 49.91228 50.13053 49.91228 49.97246
#> 10: 2007-01-11 49.88529 50.23910 49.88529 50.23910

# 2nd sample data
xm <- xts(cumsum(rnorm(100, 0, 0.2)), Sys.time() - 100:1)

xts_print(xm)
#> 
#>                    Index         X.1
#>   1: 2020-08-03 09:28:00  0.14533549
#>   2: 2020-08-03 09:28:01  0.26327216
#>   3: 2020-08-03 09:28:02  0.21394361
#>   4: 2020-08-03 09:28:03  0.20015489
#>   5: 2020-08-03 09:28:04  0.18350584
#>  ---                                    
#>  96: 2020-08-03 09:29:35 -1.74172313
#>  97: 2020-08-03 09:29:36 -1.66798390
#>  98: 2020-08-03 09:29:37 -1.47796503
#>  99: 2020-08-03 09:29:38 -1.16800551
#> 100: 2020-08-03 09:29:39 -1.18936443

markushhh commented 3 years ago

I really liked your approach. Just now, I was improving your solution for the third time, and IMO the best solution is the following:

library("xts")
library("data.table")

data(sample_matrix)
samplexts <- xts::as.xts(sample_matrix)

print.xts <- function(x, ...) {
    print(data.table::as.data.table(x))
}

print(samplexts)

I couldn't write better code than the authors of data.table, and data.table's printing function is incredibly fast and reliable. Hence, depending on data.table is "the best" one can do. It's unfortunate that your time and mine were spent like this... but I appreciate your work, @Eluvias! This doesn't really work with tibbles, since the index gets dropped, and tsibble has not (yet) implemented a converter method from xts, but that's another story...

joshuaulrich commented 3 years ago

The main issue I see with both of these solutions is that they make it appear like xts objects have an 'index' column, which is not true. That's likely to cause a lot of confusion.

This would also make xts inconsistent with zoo, and consistency with zoo is an objective because xts extends zoo. We need to consider differences in xts compared to zoo. I could discuss with the zoo team about adding an xts.max.print option that could be set to a one- or two-element vector. The two-element version would allow you to specify how many head/tail observations to print. And it would allow users to set options(xts.max.print = getOption("max.print")) to restore the prior behavior.

Also, with no disrespect to the data.table team, I'm not going to add a dependency on another package for a print method.

jangorecki commented 3 years ago

print(data.table::as.data.table(x))

wouldn't make much sense because it has to copy the whole object when converting the xts (matrix) to a data.table. It is much easier to simply concatenate the print output of the head and tail of the xts object.
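
The head/tail concatenation suggested here can be sketched in a few lines. This is only an illustration: `print_head_tail()` is a hypothetical helper, not part of xts or zoo, and the demo uses a base R matrix with date row names as a stand-in for an xts object so that it runs without extra packages.

```r
# Print the first and last n rows of any object that has head()/tail() methods.
# Only the 2 * n rows that are shown get subset; nothing is converted.
print_head_tail <- function(x, n = 5) {
  if (NROW(x) <= 2 * n) {
    print(x)                       # short objects are printed in full
  } else {
    print(utils::head(x, n))
    cat("...\n")
    print(utils::tail(x, n))
  }
  invisible(x)
}

# Demo on a base R matrix with date row names (stand-in for an xts object).
m <- matrix(round(rnorm(40), 2), ncol = 2,
            dimnames = list(format(as.Date("2007-01-02") + 0:19),
                            c("Open", "Close")))
print_head_tail(m, n = 3)
```

Because the two halves are printed separately, the column header is repeated and the column widths of head and tail may differ. That is the price of not formatting the whole object.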

ghost commented 3 years ago

On 16 Sep 2020, at 12:49, Jan Gorecki wrote:

print(data.table::as.data.table(x)) wouldn't make much sense because it has to copy during conversion of xts (matrix) to data.table. It is much easier to simply concatenate the print output of the head and tail of the xts object.

But without as.data.frame:

https://github.com/eddelbuettel/dang/blob/master/R/print.R


markushhh commented 3 years ago

The following code provides a solution for xts (print.xts) and zoo (print.zoo) objects. The methods do not change the general behaviour of the existing print methods. They just trim the output. The methods add the argument max with getOption("xts.max.print") and getOption("zoo.max.print"). What's your opinion on it?

library("xts")

check.TZ <- xts:::check.TZ
tformat <- xts:::tformat
coredata <- zoo::coredata

print.xts <- function(x,
                      fmt,
                      max = getOption("xts.max.print"),
                      ...) {
  check.TZ(x)
  if (missing(fmt)) {
    fmt <- tformat(x)
  }
  if (is.null(fmt)) {
    fmt <- TRUE
  }

  if (NROW(x) > max*2+1) {
    index <- as.character(index(x))
    index <- c(index[c(1:max)], "...", index[(NROW(x)-max+1):NROW(x)])
    y <- rbind(
      format(as.matrix(x[1:max, ])),
      format(matrix(rep("", NCOL(x)), nrow = 1)),
      format(as.matrix(x[(NROW(x)-max+1):NROW(x), ]))
    )
    rownames(y) <- format(index, justify = "right")
    colnames(y) <- colnames(x)
  } else {
    y <- coredata(x, fmt)
  }

  if (length(y) == 0) {
    if (!is.null(dim(x))) {
      p <- structure(vector(storage.mode(y)), dim = dim(x),
                     dimnames = list(format(index(x)), colnames(x)))
      print(p)
    } else {
      cat('Data:\n')
      print(vector(storage.mode(y)))
      cat('\n')
      cat('Index:\n')
      index <- index(x)
      if (length(index) == 0) {
        print(index)
      } else {
        print(str(index(x)))
      }
    }
  } else {
    print(y, quote = FALSE, right = TRUE, ...)
  }
}

print.zoo <- function (x,
                       style = ifelse(length(dim(x)) == 0, "horizontal", "vertical"), 
                       quote = FALSE,
                       max = getOption("zoo.max.print"),
                       ...) {

  style <- match.arg(style, c("horizontal", "vertical", "plain"))
  if (is.null(dim(x)) && length(x) == 0) {
    style <- "plain"
  }
  if (length(dim(x)) > 0 && style == "horizontal") {
    style <- "plain"
  }
  if (style == "vertical") {
    if (NROW(x) > max*2+1) {
      index <- index2char(index(x), frequency = attr(x, "frequency"))
      index <- c(index[c(1:max)], "...", index[(NROW(x)-max+1):NROW(x)])
      y <- rbind(
        format(as.matrix(x[1:max, ])),
        format(matrix(rep("", NCOL(x)), nrow = 1)),
        format(as.matrix(x[(NROW(x)-max+1):NROW(x), ]))
      )
      rownames(y) <- format(index, justify = "right")
      colnames(y) <- colnames(x)
    } else {
      y <- as.matrix(coredata(x))
      if (length(colnames(y)) < 1) {
        colnames(y) <- rep("", NCOL(y))
      }
      if (NROW(y) > 0) {
        rownames(y) <- index2char(index(x), frequency = attr(x, "frequency"))
      }
    }
    print(y, quote = quote, ...)
  } else if (style == "horizontal") {
    y <- as.vector(x)
    names(y) <- index2char(index(x), frequency = attr(x, "frequency"))
    print(y, quote = quote, ...)
  } else {
    cat("Data:\n")
    print(coredata(x), ...)
    cat("\nIndex:\n")
    print(index(x), ...)
  }
  invisible(x)
}

data("sample_matrix", package = "xts")
samplexts <- xts::as.xts(sample_matrix)
samplezoo <- zoo::as.zoo(sample_matrix)

options("xts.max.print" = 5)
options("zoo.max.print" = 5)

print.xts(samplexts)

#>                Open     High      Low    Close
#> 2007-01-02 50.03978 50.11778 49.95041 50.11778
#> 2007-01-03 50.23050 50.42188 50.23050 50.39767
#> 2007-01-04 50.42096 50.42096 50.26414 50.33236
#> 2007-01-05 50.37347 50.37347 50.22103 50.33459
#> 2007-01-06 50.24433 50.24433 50.11121 50.18112
#>        ...                                    
#> 2007-06-26 47.44300 47.61611 47.44300 47.61611
#> 2007-06-27 47.62323 47.71673 47.60015 47.62769
#> 2007-06-28 47.67604 47.70460 47.57241 47.60716
#> 2007-06-29 47.63629 47.77563 47.61733 47.66471
#> 2007-06-30 47.67468 47.94127 47.67468 47.76719

print.zoo(samplexts)

#>            Open     High     Low      Close   
#> 2007-01-02 50.03978 50.11778 49.95041 50.11778
#> 2007-01-03 50.23050 50.42188 50.23050 50.39767
#> 2007-01-04 50.42096 50.42096 50.26414 50.33236
#> 2007-01-05 50.37347 50.37347 50.22103 50.33459
#> 2007-01-06 50.24433 50.24433 50.11121 50.18112
#> ...                                    
#> 2007-06-26 47.44300 47.61611 47.44300 47.61611
#> 2007-06-27 47.62323 47.71673 47.60015 47.62769
#> 2007-06-28 47.67604 47.70460 47.57241 47.60716
#> 2007-06-29 47.63629 47.77563 47.61733 47.66471
#> 2007-06-30 47.67468 47.94127 47.67468 47.76719

print.zoo(samplezoo)

#>     Open     High     Low      Close   
#>   1 50.03978 50.11778 49.95041 50.11778
#>   2 50.23050 50.42188 50.23050 50.39767
#>   3 50.42096 50.42096 50.26414 50.33236
#>   4 50.37347 50.37347 50.22103 50.33459
#>   5 50.24433 50.24433 50.11121 50.18112
#> ...                                    
#> 176 47.44300 47.61611 47.44300 47.61611
#> 177 47.62323 47.71673 47.60015 47.62769
#> 178 47.67604 47.70460 47.57241 47.60716
#> 179 47.63629 47.77563 47.61733 47.66471
#> 180 47.67468 47.94127 47.67468 47.76719

library("microbenchmark")

x <- microbenchmark(
  zoo_old = invisible(capture.output(zoo:::print.zoo(samplexts))),
  xts_old = invisible(capture.output(xts:::print.xts(samplexts))),
  zoo_new = invisible(capture.output(print.zoo(samplexts))),
  xts_new = invisible(capture.output(print.xts(samplexts))),
  times = 1000
)
summary(x)

#>      expr    min      lq     mean  median      uq     max neval
#> 1 zoo_old 2.3590 2.46380 2.921920 2.59965 2.89375 12.7040  1000
#> 2 xts_old 2.3931 2.50755 2.972585 2.62770 2.92450  8.7730  1000
#> 3 zoo_new 1.7792 1.84510 2.236352 1.92520 2.16320  9.9530  1000
#> 4 xts_new 1.8103 1.88250 2.300003 1.96860 2.23665  9.1413  1000

jangorecki commented 3 years ago

Looks neat

do not break the existing code

Do you mean that you ran reverse dependency checks (ideally including packages that only list it in Suggests)? That is what CRAN will expect from the maintainers of zoo and xts. If it breaks any package, it is probably better to make this an opt-in feature for at least one release before making it the default.

markushhh commented 3 years ago

That was misleading; I did not.

Setting options("xts.max.print" = Inf) for a transition should be enough.
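
To see why an infinite limit switches truncation off, here is a minimal sketch of the threshold test used in the print methods above (`NROW(x) > max*2+1`). `should_truncate()` is a hypothetical helper, and falling back to `Inf` when the option is unset is an assumption of this sketch. The code above calls `getOption("xts.max.print")` without a default.

```r
# Truncate only when the object has more rows than head + separator + tail.
# Assumption: an unset option means "never truncate" (default Inf).
should_truncate <- function(nr, max = getOption("xts.max.print", Inf)) {
  nr > 2 * max + 1
}

should_truncate(180, max = 5)     # 180 > 11, so truncate
should_truncate(11, max = 5)      # exactly head + separator + tail, print in full
should_truncate(1e6, max = Inf)   # nothing is larger than Inf, never truncate
```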

joshuaulrich commented 3 years ago

@markushhh, this looks really good! Thanks for all the effort you put into it!

I've been talking with the zoo team about the potential for making this change in xts, and maybe in zoo too. No one is outright opposed, but we want to carefully consider the change. Here are a few things that came up:

1. The intent behind zoo is to be compatible with ts objects. And xts has the same aim for zoo objects.
2. What do we do for 1-dimensional zoo objects (i.e. vectors)?
3. What is the threshold for when the truncation kicks in? I wouldn't want a 15-row object truncated when printing.
4. There's a potential that this change could break tests that depend on the full output being printed. Reverse dependency checks would find these though, and we could send the authors a patch.
5. We would need an option to disable the truncation. This would also help people migrate, and we could advise people to set the option to disable the truncation now, before the change is exposed a few releases from now.

zeileis commented 3 years ago

Thanks for the proposed code @markushhh. Thanks for the summary @joshuaulrich.

To expand on 2: I think it would be useful to avoid long printed chunks in the 1-d case as well. However, it is not clear to me what is a good general layout for this. A simple idea would be to print the head, a separate line with the ..., and then the tail:

z <- zoo(sin(1:100), as.Date("2000-01-01") + 0:99)
print1d <- function(x, ...) {
  x <- structure(as.vector(x), .Names = index2char(index(x), frequency = attr(x, "frequency")))
  print(head(x, 5))
  cat("...\n")
  print(tail(x, 5))
}
print1d(z)
## 2000-01-01 2000-01-02 2000-01-03 2000-01-04 2000-01-05 
##  0.8414710  0.9092974  0.1411200 -0.7568025 -0.9589243 
## ...
## 2000-04-05 2000-04-06 2000-04-07 2000-04-08 2000-04-09 
##  0.9835877  0.3796077 -0.5733819 -0.9992068 -0.5063656

My feeling is, though, that this does not necessarily convey one vector of things and might be confused with the matrix layout.

Another idea would be to print it as one vector of c(head, empty, tail) where the empty element would have a ... index:

print1d <- function(x, ...) {
  x <- structure(format(as.vector(x)), .Names = index2char(index(x), frequency = attr(x, "frequency")))
  print(c(head(x, 5), structure("", .Names = "..."), tail(x, 5)), quote = FALSE)
}
print1d(z)
##   2000-01-01   2000-01-02   2000-01-03   2000-01-04   2000-01-05          ... 
##  0.841470985  0.909297427  0.141120008 -0.756802495 -0.958924275              
##   2000-04-05   2000-04-06   2000-04-07   2000-04-08   2000-04-09 
##  0.983587745  0.379607739 -0.573381872 -0.999206834 -0.506365641

There it's really easy to miss the "...". It's a bit better if it's not at the end of the line, but I'm not thrilled about that either.

options(digits = 4)
print1d(z)
## 2000-01-01 2000-01-02 2000-01-03 2000-01-04 2000-01-05        ... 2000-04-05 
##   0.841471   0.909297   0.141120  -0.756802  -0.958924              0.983588 
## 2000-04-06 2000-04-07 2000-04-08 2000-04-09 
##   0.379608  -0.573382  -0.999207  -0.506366

Better ideas?

ggrothendieck commented 3 years ago

print.zoo has a style= argument. This could be an additional style.

> args(zoo:::print.zoo)
function (x, style = ifelse(length(dim(x)) == 0, "horizontal", 
    "vertical"), quote = FALSE, ...)

markushhh commented 3 years ago

@joshuaulrich Thanks for talking to them!

@zeileis Thanks for joining in!

Is there any existing code that tests for the compatibility between the classes?

Truncation of vectors is a very good question.

Another possibility would be to print ... at the beginning of the last line so that it isn't overlooked. But this might introduce asymmetry between the head and tail.

1970-01-02    1970-01-03    1970-01-04    1970-01-05    1970-01-06 
0.0137348254  0.8844110406 -1.5889070092 -1.3828891715  1.2165048537 
1970-01-07    1970-01-08    1970-01-09    1970-01-10    1970-01-11 
-1.6170753365  0.4848673419 -0.1725599031  0.3682548469  0.3236398913 
1970-01-12    1970-01-13    1970-01-14    1970-01-15    1970-01-16 
-0.9045243951 -1.2520928653 -0.0966016999  0.2222901724 -0.5781466642 
...           1970-01-28    1970-01-29    1970-01-30    1970-01-31 
...           0.9102255425  2.3607751726  1.0997868566  0.8708621780

I think printing ... at the beginning and at the end is too much.

1970-01-02    1970-01-03    1970-01-04    1970-01-05    1970-01-06 
 0.0137348254  0.8844110406 -1.5889070092 -1.3828891715  1.2165048537 
 1970-01-07    1970-01-08    1970-01-09    1970-01-10    ... 
-1.6170753365  0.4848673419 -0.1725599031  0.3682548469  ...
 ...           1970-01-28    1970-01-29    1970-01-30    1970-01-31 
 ...           0.9102255425  2.3607751726  1.0997868566  0.8708621780

I'd probably go for @zeileis's first case, where the ... is printed between the head and tail. The danger of confusion with matrices only occurs if you don't respect the index.

Threshold

Truncation in other languages and classes:

| Language | Class | Truncation after n-th row |
|---|---|---|
| R | matrix | 1000, getOption("max.print") |
| R | data.frame | 1000, getOption("max.print") |
| R | vector | 1000, getOption("max.print") |
| R | data.table | > 100, getOption("datatable.print.nrows"); prints the column names below the columns if 20 < nrow < 101 |
| R | tibble / tsibble | > 20, getOption("tibble.print_max") |
| Julia | DataFrame | > 24 |
| Julia | Array | n x 1 Array: > 26; 1 x n Array: > 20 |
| Python | pandas.DataFrame | no truncation? |

Base R truncates the output of vectors based on the number of elements. When max.print is reached, it truncates the output and displays an additional note:

 [ reached getOption("max.print") -- omitted 99000 entries ]

Maybe the settings are arbitrary, or a matter of preference. I don't really mind when truncation kicks in (as long as the limit is reasonable, i.e. <= 100). To be consistent with base R, a vector should be printed horizontally, even though it's column-major. Limiting the output to one or two lines is neither useful nor appropriate for vectors. IMO the default threshold for matrices could be 50 (somewhat arbitrary!). For vectors, it depends on the final decision about how they are truncated, but in the end it must be determined dynamically, because the width is not static and depends on the user. Keyword: getOption("width").
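
As a rough illustration of that width dependence, the number of named elements that fit on one console line can be estimated from getOption("width"). This is a sketch under a stated assumption: print.default() gives every element the width of the widest name or formatted value, plus one separating space. `elements_per_line()` is a hypothetical helper, not part of zoo or xts.

```r
# Estimate how many elements of a named vector fit on one console line.
elements_per_line <- function(x, width = getOption("width")) {
  cell <- max(nchar(names(x)), nchar(format(unname(x))))  # widest name or value
  max(1L, width %/% (cell + 1L))                          # +1 for the gap
}

x <- setNames(1:30, format(as.Date("1970-01-02") + 0:29))
elements_per_line(x, width = 80)    # 80 %/% 11 = 7
elements_per_line(x, width = 120)   # 120 %/% 11 = 10
```

With 10-character date names this gives 7 elements per line at the default width of 80, which matches the wrapped output shown earlier in the thread.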

4. I'm going to run reverse dependency checks tonight with the package revdepcheck for xts (fewer dependencies than zoo) with the new printing method to get an overview of how many package tests depend on the output (and how). Is it enough to check for "Depends" and "Imports", or should I check for "Suggests" and "LinkingTo" as well? Bioconductor?
5. Truncation can be disabled by setting options("zoo.max.print" = Inf) or options("xts.max.print" = Inf), which should be the default for (at least) the initial release. I added an argument topn (inspired by data.table) for "head" and "tail".
6. What about limiting the columns as well? The output for e.g. 10000 columns seems to be completely useless (IMO), in both the old and the new truncated behavior.

6.1. There's a bug in the code which reduces topn if max.print gets too large, but I'll have a look at that.

I'm currently testing some possible behaviors, e.g.

                 [,1]       [,2]       [,3]           [,6]       [,7]       [,8]
1970-01-02  1.9587855  0.4649187 -1.5189918 ...  0.5964707 -0.8898568 -0.9436546
1970-01-03  0.6700347  1.2181748  1.4143326 ... -0.8143729  0.3040398  0.4106147
1970-01-04  1.9587855  0.4649187 -1.5189918 ...  0.5964707 -0.8898568 -0.9436546
1970-01-05  0.6700347  1.2181748  1.4143326 ... -0.8143729  0.3040398  0.4106147
1970-01-06  1.9587855  0.4649187 -1.5189918 ...  0.5964707 -0.8898568 -0.9436546
       ...        ...        ...        ... ...        ...        ...        ...
1970-02-16 -0.8143729  0.3040398  0.4106147 ... -0.8143729  0.3040398  0.4106147
1970-02-17  0.5964707 -0.8898568 -0.9436546 ...  0.5964707 -0.8898568 -0.9436546
1970-02-18 -0.8143729  0.3040398  0.4106147 ... -0.8143729  0.3040398  0.4106147
1970-02-19  0.5964707 -0.8898568 -0.9436546 ...  0.5964707 -0.8898568 -0.9436546
1970-02-20 -0.8143729  0.3040398  0.4106147 ... -0.8143729  0.3040398  0.4106147

                 [,1]       [,2]       [,3]           [,6]       [,7]       [,8]
1970-01-02  1.9587855  0.4649187 -1.5189918 ...  0.5964707 -0.8898568 -0.9436546
1970-01-03  0.6700347  1.2181748  1.4143326     -0.8143729  0.3040398  0.4106147
1970-01-04  1.9587855  0.4649187 -1.5189918      0.5964707 -0.8898568 -0.9436546
1970-01-05  0.6700347  1.2181748  1.4143326     -0.8143729  0.3040398  0.4106147
1970-01-06  1.9587855  0.4649187 -1.5189918      0.5964707 -0.8898568 -0.9436546
...                                                                          ...
1970-02-16 -0.8143729  0.3040398  0.4106147     -0.8143729  0.3040398  0.4106147
1970-02-17  0.5964707 -0.8898568 -0.9436546      0.5964707 -0.8898568 -0.9436546
1970-02-18 -0.8143729  0.3040398  0.4106147     -0.8143729  0.3040398  0.4106147
1970-02-19  0.5964707 -0.8898568 -0.9436546      0.5964707 -0.8898568 -0.9436546
1970-02-20 -0.8143729  0.3040398  0.4106147 ... -0.8143729  0.3040398  0.4106147

any idea/advice?
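
One way to produce the first of the two layouts above is to format the outer columns separately and bind a "..." column between them. This is a sketch only. `truncate_cols()` and its argument `k` are hypothetical names for illustration, and the marker is repeated in every row rather than only in the first and last.

```r
# Keep the first and last k columns of a matrix and mark the gap with "...".
truncate_cols <- function(m, k = 3) {
  stopifnot(is.matrix(m))
  nc <- ncol(m)
  if (is.null(colnames(m))) {
    colnames(m) <- paste0("[,", seq_len(nc), "]")   # mimic base R column labels
  }
  if (nc <= 2 * k) {
    return(format(m))                               # nothing to omit
  }
  left  <- format(m[, seq_len(k), drop = FALSE])
  right <- format(m[, (nc - k + 1):nc, drop = FALSE])
  gap   <- matrix("...", nrow = nrow(m), ncol = 1,
                  dimnames = list(NULL, "..."))
  cbind(left, gap, right)                           # row names come from `left`
}

m <- matrix(round(rnorm(40), 3), nrow = 5,
            dimnames = list(format(as.Date("1970-01-02") + 0:4), NULL))
print(truncate_cols(m, k = 3), quote = FALSE, right = TRUE)
```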

markushhh commented 3 years ago

@ggrothendieck for xts a vector display is useless since there are no vectors in xts. Plain display would be possible though, I don't need it. If it's desired I can implement it. What is the use case of plain? In case the index or coredata is malformed?

markushhh commented 3 years ago

I think the following style is a good example of where vectors could be mixed up with matrices:

## 2000-01-01 2000-01-02 2000-01-03 2000-01-04 2000-01-05 
##  0.8414710  0.9092974  0.1411200 -0.7568025 -0.9589243 
##     ...        ...        ...        ...        ...        
## 2000-04-05 2000-04-06 2000-04-07 2000-04-08 2000-04-09 
##  0.9835877  0.3796077 -0.5733819 -0.9992068 -0.5063656

ggrothendieck commented 3 years ago

print.zoo is pretty short so if you need clarification see its source. https://github.com/rforge/zoo/blob/master/pkg/zoo/R/zoo.R

markushhh commented 3 years ago

@ggrothendieck Thanks. When do you need the plain style?

markushhh commented 3 years ago

In Julia they don't care about ... being in the middle.

julia> [collect(1000000:10000000)]
1-element Array{Array{Int64,1},1}:
 [1000000, 1000001, 1000002, 1000003, 1000004, 1000005, 1000006, 1000007, 1000008, 1000009  …  9999991, 9999992, 9999993, 9999994, 9999995, 9999996, 9999997, 9999998, 9999999, 10000000]

zeileis commented 3 years ago

Thanks @markushhh for collecting all this information, very useful! Just a couple of comments:

The plain style is mostly used for zero-length series:

zoo()
## Data:
## numeric(0)
## 
## Index:
## integer(0)

Across the different systems, what is the general preference regarding showing head and tail vs. head only? Both base R and tibble show only the head (albeit the head is allowed to be rather long in base R).
Showing only the head would also facilitate the issue of where to print the ... for 1-d series.
What about adding the information how many elements are omitted and/or how many elements there are overall. Base R only shows the former, tibble shows both.
Limiting the columns as well is a good idea. I like the display with fewer ... better.

braverock commented 3 years ago

Across the different systems, what is the general preference regarding showing head and tail vs. head only? Both base R and tibble show only the head (albeit the head is allowed to be rather long in base R).

Showing only the head would also facilitate the issue of where to print the ... for 1-d series.

Many time series are "ragged", and several columns will start with NA's. So head and tail has the advantage of showing the most recent data where one will often have a more complete sample.
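
A small made-up example of such a ragged series, using a plain matrix with date row names so it runs without extra packages. The values and column names are invented for illustration.

```r
# Column B only starts on the 7th day, so its first six values are NA.
dates  <- as.Date("2020-01-01") + 0:9
prices <- cbind(A = 101:110, B = c(rep(NA, 6), 201:204))
rownames(prices) <- format(dates)

head(prices, 3)   # column B is all NA here
tail(prices, 3)   # both columns are complete here
```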

What about adding the information how many elements are omitted and/or how many elements there are overall. Base R only shows the former, tibble shows both.

I agree this is a good idea for a more informative print method.

Limiting the columns as well is a good idea. I like the display with fewer ... better.

Agreed.

markushhh commented 3 years ago

@zeileis for zero-length series, plain style is already implemented in xts, so there is no need for the extra argument. It's open to discussion whether there's a need for it in zoo. I guess that depends on zoo's dependencies, right?

What about adding the information how many elements are omitted and/or how many elements there are overall. Base R only shows the former, tibble shows both.

I'm down! (printing both)

zeileis commented 3 years ago

Printing dimension: I agree. I also like printing both the overall dimension and the number of elements omitted.

Plain style: zoo always had this argument, not sure who actually uses it (not me). It could be debated whether we should have introduced it or not. But given that we have, I think we ought to stick with it.

Head only vs. head and tail: Convincing argument by Brian that in time series the tail is typically the most recent information and should be included.

joshuaulrich commented 1 year ago

I've started working on this because I want it. :) I started with @markushhh's implementation (thanks again!). Here's what we still need:

Truncate the number of columns if the result would be wider than getOption("width"), and add an argument and option to set it.
Determine how many rows to print before we truncate. I prefer 50 because that works for my screen. But I wouldn't be opposed to 100, like data.table. I think we should use the max argument for this.
Handle the zoo 1-d case.
I'd also like to add a blank line between rows when columns would wrap (when columns > screen width). data.table uses trunc.cols (TRUE/FALSE) for this. I'd like to support specifying the number of columns too.
Printing dimensions. Not sure how I feel about this. That's something the str() function does.

Did I miss anything? Any other thoughts?

joshuaulrich commented 1 year ago

I also started working on something similar for str.xts(): https://github.com/joshuaulrich/xts/issues/378

I'd appreciate everyone's thoughts on that too!

ethanbsmith commented 1 year ago

+1 for leaving index and dim output in str()

joshuaulrich commented 1 year ago

I'm starting to come around to the idea of including them in the print() output too. Still on the fence though... but I just had an idea about how to include them: it could go with the ellipses in the middle. For example:

# zoo 1-d vector

## 2000-01-01 2000-01-02 2000-01-03 2000-01-04 2000-01-05 
##  0.8414710  0.9092974  0.1411200 -0.7568025 -0.9589243 
## ... (zoo vector with `n` elements omitted)
## 2000-04-05 2000-04-06 2000-04-07 2000-04-08 2000-04-09 
##  0.9835877  0.3796077 -0.5733819 -0.9992068 -0.5063656

# zoo matrix

##            Open     High     Low      Close   
## 2007-01-02 50.03978 50.11778 49.95041 50.11778
## 2007-01-03 50.23050 50.42188 50.23050 50.39767
## 2007-01-04 50.42096 50.42096 50.26414 50.33236
## 2007-01-05 50.37347 50.37347 50.22103 50.33459
## 2007-01-06 50.24433 50.24433 50.11121 50.18112
## ... (zoo matrix with `n` rows omitted)
## 2007-06-26 47.44300 47.61611 47.44300 47.61611
## 2007-06-27 47.62323 47.71673 47.60015 47.62769
## 2007-06-28 47.67604 47.70460 47.57241 47.60716
## 2007-06-29 47.63629 47.77563 47.61733 47.66471
## 2007-06-30 47.67468 47.94127 47.67468 47.76719
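
A minimal sketch of how the matrix layout above could be produced: print head and tail together so the columns line up, then splice the message between the captured lines. `print_truncated()` is a hypothetical helper for plain matrices, not the code that went into xts or zoo, and it assumes the columns fit within the console width (one output line per row).

```r
# Print the first and last n rows with an "n rows omitted" line in between.
print_truncated <- function(m, n = 5) {
  nr <- nrow(m)
  if (nr <= 2 * n) {
    print(m)
    return(invisible(m))
  }
  shown <- rbind(utils::head(m, n), utils::tail(m, n))
  lines <- utils::capture.output(print(shown))   # header + 2 * n rows
  msg   <- sprintf("... (matrix with %d rows omitted)", nr - 2 * n)
  writeLines(c(lines[seq_len(n + 1)], msg, lines[(n + 2):(2 * n + 1)]))
  invisible(m)
}

m <- matrix(1:40, ncol = 2,
            dimnames = list(format(as.Date("2007-01-01") + 0:19),
                            c("Open", "Close")))
print_truncated(m, n = 3)
```

Formatting both halves in one print() call keeps a single header and common column widths, at the cost of one small rbind() of the rows that are shown.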

joshuaulrich commented 1 year ago

Here's a first draft of printing zoo vectors.

diff --git a/pkg/zoo/R/zoo.R b/pkg/zoo/R/zoo.R
index 39c554b..2ae8224 100644
--- a/pkg/zoo/R/zoo.R
+++ b/pkg/zoo/R/zoo.R
@@ -71,7 +71,39 @@ print.zoo <- function (x, style = ifelse(length(dim(x)) == 0,
     else if (style == "horizontal") {
         y <- as.vector(x)
         names(y) <- index2char(index(x), frequency = attr(x, "frequency"))
-        print(y, quote = quote, ...)
+
+        beg <- NULL
+        end <- NULL
+        n_beg <- 1
+        n_end <- 1
+        while (length(beg) < 3 || length(end) < 3) {
+          if (length(beg) < 3) {
+            beg <- utils::capture.output(print.default(head(y, n_beg)))
+            n_beg <- n_beg + 1
+          }
+          if (length(end) < 3) {
+            end <- utils::capture.output(print.default(tail(y, n_end)))
+            n_end <- n_end + 1
+          }
+        }
+        beg <- utils::capture.output(print.default(head(y, n_beg-2)))
+        end <- utils::capture.output(print.default(tail(y, n_end-2)))
+
+        n_obs <- 1
+        for (i in seq_along(y)) {
+          o <- utils::capture.output(print.default(y[seq_len(i)]))
+          if (length(o) > 2) {
+            # output has wrapped to a new line
+            n_obs <- i - 1
+            break
+          }
+        }
+        o <- utils::capture.output(print.default(head(y, n_obs), quote = quote, ...))
+        p <- utils::capture.output(print.default(tail(y, n_obs), quote = quote, ...))
+        more_rows <- paste0("... zoo vector with ", length(y) - 2*n_obs,
+                            " more observations")
+        z <- matrix(c(o, more_rows, p), ncol = 1)
+        writeLines(z)
     }
     else {
         cat("Data:\n")

And the output is:

R$ z <- zoo(1:100, .Date(1:100))
R$ print(z)
1970-01-02 1970-01-03 1970-01-04 1970-01-05 1970-01-06 1970-01-07 1970-01-08 1970-01-09 1970-01-10 1970-01-11 
         1          2          3          4          5          6          7          8          9         10 
... zoo vector with 80 more observations
1970-04-02 1970-04-03 1970-04-04 1970-04-05 1970-04-06 1970-04-07 1970-04-08 1970-04-09 1970-04-10 1970-04-11 
        91         92         93         94         95         96         97         98         99        100

zeileis commented 1 year ago

Thanks for having a go at this Josh @joshuaulrich ! Comments:

index2char() names:
Unfortunately, for this application, index2char() internally relies on as.character() rather than format(). My guess is that I didn't know better at the time of writing. But possibly it was also a design decision because index2char() is not only used for printing but also in merge(). In any case, we cannot rely on the names of y having the same number of characters. For Date this is the case, presumably also POSIXt, but not plain numeric. Consider printing: zoo(rep_len(0:9, 1000), 1:1000). The head() uses just 2 lines but the tail 4. I see 3 ways to go: (a) Determine n_obs based on the head rather than the tail. (b) Determine the lengths of head and tail separately. (c) Assure that the names(y) all have the same number of characters, e.g., via
```
names(y) <- format(index2char(index(x), frequency = attr(x, "frequency")), justify = "right")
```
Difference between n_beg/n_end and n_obs:
If option (c) is used above, then it is probably enough to use only n_obs and omit the code determining separate n_beg and n_end. In any case, only one of the two approaches seems to be necessary. Question: Is there a particular reason why you use head() and tail() in most places but [seq_len(...)] when determining n_obs?
Inserted line for more rows:
My personal impression would be that "with ... observations omitted" would be clearer than "with ... more observations". In the latter case I found myself wondering whether the "more observations" include those shown at the end, because I was reading top-down. I would also add "..." at the end of the line as well.
Condition for omitting observations:
There should probably be a check whether we need to omit any observations at all. This should be consistent with the matrix printing, e.g., allowing up to a certain number of lines of output. You mention above that you would be ok with up to 50 or even 100 lines. Personally, I would probably prefer less, maybe 20 or 30. But I'm open for discussion here.

Together with a few further tweaks (naming objects, breaking from the loop, always using quote = quote, ..., etc.), my implementation would be:

        y <- as.vector(x)
        names(y) <- format(index2char(index(x), frequency = attr(x, "frequency")), justify = "right")
        n_tot <- length(y)
        n_obs <- 1L
        if(n_tot > 10L) { ## only consider omitting observations if n_tot > 10 (see below)
          y_head <- utils::capture.output(print.default(y[1L], quote = quote, ...))
          for (i in 2L:n_tot) {
            y_next <- utils::capture.output(print.default(y[1L:i], quote = quote, ...))
            if (length(y_next) > 2L) { ## output has wrapped to a new line
              break
            } else {
              y_head <- y_next
              n_obs <- n_obs + 1L
            }
          }
        }
        if(n_tot > 10L * n_obs) { ## more than 20 lines when fully printed
          y_tail <- utils::capture.output(print.default(y[n_tot - n_obs:1L + 1L], quote = quote, ...))
          y_more <- sprintf("... zoo vector with %s observations omitted ...", n_tot - 2L * n_obs)
          writeLines(c(y_head, y_more, y_tail))
        } else {
          print(y, quote = quote, ...)
        }
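
The effect of option (c) can be seen in isolation with plain numeric index values. This is a base R illustration only and does not call index2char().

```r
idx <- c(1, 10, 100)

raw    <- as.character(idx)            # "1" "10" "100"
padded <- format(raw, justify = "right")

nchar(raw)      # 1 2 3: widths differ, so head and tail wrap differently
nchar(padded)   # 3 3 3: every name is padded to the same width
```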

joshuaulrich commented 1 year ago

Thanks for having a go at this Josh!

Happy to! I thought it was most efficient to use my knowledge of doing this with print.xts() to give you something to tweak using your knowledge of what zoo needed to do.

If option (c) is used above, then it is probably enough to use only n_obs and omit the code determining separate n_beg and n_end ... Question: Is there a particular reason why you use head() and tail() in most places but [seq_len(...)] when determining n_obs?

Agree about only using n_obs. y[seq_len(i)] most likely came from my copy/paste of the print.xts() code. I doubt there's a good reason to use it other than head/tail. I use head/tail elsewhere because I prefer tail() to y[n:length(y)].

Inserted line for more rows:

Agree with all your comments here.

Condition for omitting observations:

Agreed with allowing a number of observations before truncating. I like 50 lines because that's roughly what fits vertically on my laptop screen. That would be 25 1-d zoo vector observations because there are 2 lines/observation.

I don't have strong feelings about this because changing it later shouldn't be an issue, especially if we provide a global option for users to set their personal preference.

joshuaulrich commented 1 year ago

This is going into the 0.13.0 xts release.

ethanbsmith commented 1 year ago

overall i like this feature and think its a good idea. just one thing i have found a bit frustrating is that head() and tail() no longer work as they used to. i sometimes want to look at a specific set of data, eg: tail(x, 45). however, if the n is less than print's default, the output still gets compressed. there is probably a way to work around this, but im not sure this change in behavior in this scenario is desirable.

joshuaulrich commented 1 year ago

I encountered this too and it needs to be fixed before release. Can you create another issue with a reproducible example for this bug?