joshuaulrich / xts

Extensible time series class that provides uniform handling of many R time series classes by extending zoo.
http://joshuaulrich.github.io/xts/
GNU General Public License v2.0
219 stars 70 forks source link

converting tibble to xts #418

Open linc0380 opened 4 months ago

linc0380 commented 4 months ago

Description

I am trying to convert a tibble to an xts object. The conversion works fine with as.xts(). The behaviour using xts() is unexpected, however.

A dataset in xlsx format is downloaded from the St. Louis Fed data archive. Go to https://fred.stlouisfed.org/graph/?g=1icUr, click on Download, and select Excel. I modified the resulting xlsx sheet by deleting all auxiliary information in rows 1 to 10, renaming the date column date. That file is saved as SP500.xlsx. I am not sure how I can make this file available to save you from replicating these steps.

The xlsx file is imported by means of

data <- read_excel("SP500.xlsx", na = "#N/A")

The resulting structure is

!> str(data)
tibble [2,608 × 2] (S3: tbl_df/tbl/data.frame)
  $ date : POSIXct[1:2608], format: "2014-03-12" "2014-03-13" ...
  $ SP500: num [1:2608] 1868 1846 1841 1859 1872 ...

Expected behavior

The following conversion fine:

data <- read_excel('"SP500.xlsx", na = "#N/A")
str(data)
data
data <- as.xts(data)
str(data)
data

resulting in

> str(data)                                                                                                                                                                                                                                                                  
 An xts object on 2014-03-12 / 2024-03-08 containing:
   Data:    double [2608, 1]
   Columns: SP500
   Index:   POSIXct,POSIXt [2608] (TZ: "UTC")
 > data2                                                                                                                                                                                                                                                                      
              SP500
 2014-03-12 1868.20
 2014-03-13 1846.34
 2014-03-14 1841.13
 2014-03-17 1858.83
 2014-03-18 1872.25
 2014-03-19 1860.77
 2014-03-20 1872.01
 2014-03-21 1866.52
 2014-03-24 1857.44
 2014-03-25 1865.62
        ...
 2024-02-26 5069.53
 2024-02-27 5078.18
 2024-02-28 5069.76
 2024-02-29 5096.27
 2024-03-01 5137.08
 2024-03-04 5130.95
 2024-03-05 5078.65
 2024-03-06 5104.76
 2024-03-07 5157.36
 2024-03-08 5123.69

Minimal, reproducible example

Yet using

data <- read_excel('"SP500.xlsx", na = "#N/A")
str(data)
data
data <- xts(x = data$SP500, order.by = data$date)
str(data)
data

results in an xts object with the same structure yet with output

 > str(data)                                                                                                                                                                                                                                                                  
 An xts object on 2014-03-12 / 2024-03-08 containing:
   Data:    double [2608, 1]
   Index:   POSIXct,POSIXt [2608] (TZ: "UTC")
 > data                                                                                                                                                                                                                                                                      
            m.c.seq.row..seq.n...seq.col..drop...FALSE.
 2014-03-12                                     1868.20
 2014-03-13                                     1846.34
 2014-03-14                                     1841.13
 2014-03-17                                     1858.83
 2014-03-18                                     1872.25
 2014-03-19                                     1860.77
 2014-03-20                                     1872.01
 2014-03-21                                     1866.52
 2014-03-24                                     1857.44
 2014-03-25                                     1865.62
        ...
 2024-02-26                                     5069.53
 2024-02-27                                     5078.18
 2024-02-28                                     5069.76
 2024-02-29                                     5096.27
 2024-03-01                                     5137.08
 2024-03-04                                     5130.95
 2024-03-05                                     5078.65
 2024-03-06                                     5104.76
 2024-03-07                                     5157.36
 2024-03-08                                     5123.69

I am not sure what the first output line means.

Session Info

!> sessionInfo()
 R version 4.3.2 (2023-10-31)
 Platform: x86_64-pc-linux-gnu (64-bit)
 Running under: Debian GNU/Linux 11 (bullseye)

 Matrix products: default
 BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
 LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.13.so;  LAPACK version 3.9.0

 locale:
  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
  [3] LC_TIME=en_GB.UTF-8        LC_COLLATE=en_US.UTF-8
  [5] LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_US.UTF-8
  [7] LC_PAPER=en_GB.UTF-8       LC_NAME=C
  [9] LC_ADDRESS=C               LC_TELEPHONE=C
 [11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C

 time zone: Europe/Berlin
 tzcode source: system (glibc)

 attached base packages:
 [1] stats     graphics  grDevices utils     datasets  methods   base

 other attached packages:
 [1] readxl_1.4.2 xts_0.13.1   zoo_1.8-12

 loaded via a namespace (and not attached):
  [1] vctrs_0.6.5      httr_1.4.5       cli_3.6.1        rlang_1.1.0
  [5] stringi_1.7.6    swirl_2.4.5      glue_1.6.2       RCurl_1.98-1.6
  [9] brio_1.1.3       fansi_1.0.4      grid_4.3.2       cellranger_1.1.0
 [13] tibble_3.2.1     bitops_1.0-7     yaml_2.3.5       lifecycle_1.0.3
 [17] stringr_1.5.0    compiler_4.3.2   pkgconfig_2.0.3  testthat_3.1.4
 [21] lattice_0.21-8   digest_0.6.29    R6_2.5.1         utf8_1.2.3
 [25] pillar_1.9.0     magrittr_2.0.3   tools_4.3.2
joshuaulrich commented 4 months ago

For what it's worth, you can get data from FRED in an xts object using quantmod.

data <- quantmod::getSymbols("SP500", src = "FRED", auto.assign = FALSE)

The first line in the output from xts(x = data$SP500, order.by = data$date) is the column name, which is a monstrosity because data$SP500 is a vector and doesn't have a column name. You can use xts(data["SP500"], data$date) if you want the column name to be "SP500".

I can see how that weird column name is confusing, so I'll change the code so the result of xts() doesn't add a column name for a vector input.

linc0380 commented 4 months ago

Thanks for this! Your answer is very illuminating indeed. I didn't realise that data$SP500 returns a simple numeric vector while data["SP500"] preserves the type (and its name!). That a terrific lesson to have learnt today. I wasn't aware of quantmod's capabilities either. Thanks a million for the pointer!