tidyverts / tsibble

Tidy Temporal Data Frames and Tools
https://tsibble.tidyverts.org
GNU General Public License v3.0

Error in yearmonth #290

Closed pgg1309 closed 8 months ago

pgg1309 commented 1 year ago

Hi, there seems to be a bug in date conversions using tsibble::yearmonth(). See the example below.

The error happens even if I use the format option, e.g. yearmonth(d, format = "%YM%m"). This used to work. The issue may be related to time zones, but I'm not sure why that would matter, since I'm working with monthly data.

library(tidyverse)
library(tsibble)
#> 
#> Attaching package: 'tsibble'
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, union
library(reprex)

d <- "1997M01"
x <- yearmonth(d)

print(x)
#> <yearmonth[1]>
#> [1] "1996 Dec"

Created on 2022-11-11 with reprex v2.0.2

pgg1309 commented 1 year ago

This behavior also affects other functions, such as filter_index(). In the example below, the filtered tsibble starts one month before the month specified in filter_index(), which is clearly wrong.

library(tidyverse)
library(tsibble)
#> 
#> Attaching package: 'tsibble'
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, union
library(reprex)

x <- rnorm(40)
x <- cumsum(x)
t <- make_yearmonth(2001, 1)

y <- tsibble(m = t + 0:39, x = x, index = m)
filter_index(y, "2001 Jul" ~ .)
#> # A tsibble: 35 x 2 [1M]
#>           m       x
#>       <mth>   <dbl>
#>  1 2001 Jun  1.23  
#>  2 2001 Jul  0.0794
#>  3 2001 Aug  0.948 
#>  4 2001 Sep  2.00  
#>  5 2001 Oct  0.860 
#>  6 2001 Nov  1.47  
#>  7 2001 Dec -0.187 
#>  8 2002 Jan -1.17  
#>  9 2002 Feb -0.682 
#> 10 2002 Mar -0.910 
#> # … with 25 more rows

Created on 2022-11-15 with reprex v2.0.2

AdamSpannbauer commented 1 year ago

The issue seems specifically related to yearmonth.character() and might be stemming from something with time zones + lubridate::floor_date(). I am able to fix the issue by downgrading lubridate from 1.9.0 to 1.8.0.
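
To check which versions are in play before downgrading, something like the following may help. It is only a sketch: `installed_version()` is a hypothetical helper written for this comment, not part of tsibble or lubridate, and timechange is included because lubridate 1.9.0 depends on it.

```r
# Hypothetical helper: return the installed version of a package as a string,
# or NA if the package is not installed.
installed_version <- function(pkg) {
  if (requireNamespace(pkg, quietly = TRUE)) {
    as.character(utils::packageVersion(pkg))
  } else {
    NA_character_
  }
}

# Versions of the packages discussed in this thread
vapply(c("tsibble", "lubridate", "timechange"), installed_version, character(1))
```

A lubridate version of 1.9.0 matches the setup where the wrong month was observed here; whether other versions are affected is not established in this thread.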

Band-aid solutions

1) Downgrade lubridate to 1.8.0 using remotes::install_version("lubridate", version = "1.8.0")

2) Manually convert your yearmonth character vector to a Date before applying yearmonth(). You'll have to use the format argument in a slightly hacky way, which of course defeats some of the convenience of yearmonth().

Example for one format

date_str <- "2018 Jan"

# Bad output
tsibble::yearmonth(date_str)
# <yearmonth[1]>
# [1] "2017 Dec"

# Manual conversion and expected output
date <- as.Date(paste(date_str, "01"), format="%Y %b %d")
tsibble::yearmonth(date)
# <yearmonth[1]>
# [1] "2018 Jan"
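
The same manual conversion should also work for the "1997M01" style from the original report. This is a sketch under the assumption that the Date path of yearmonth() behaves as in the example above; the tsibble call is guarded so the snippet still runs where tsibble is not installed.

```r
# Character input in the "YYYYMmm" style from the original report
d <- "1997M01"

# Append a day-of-month so base R can parse a full date;
# the literal "M" in the format string matches the "M" in the input.
date <- as.Date(paste(d, "01"), format = "%YM%m %d")
date
#> [1] "1997-01-01"

# Convert the Date (which carries no tzone attribute) to yearmonth
if (requireNamespace("tsibble", quietly = TRUE)) {
  print(tsibble::yearmonth(date))
}
```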

Example of working around the issue by removing time zone info from the date before applying yearmonth.Date()

x <- c("2018 Jan", "2018-01", "2018 January")
tsibble::yearmonth(x)
# <yearmonth[3]>
# [1] "2017 Dec" "2017 Dec" "2017 Dec"

# Modified guts of tsibble:::yearmonth.character()
# to NULL out tzone attribute
fmts <- c("%B %Y", "%b %Y", "%Y M%m", "%Y m%m")
anytime::addFormats(fmts)

date_x <- anytime::anydate(x)
attributes(date_x)$tzone <- NULL
tsibble::yearmonth(date_x)
# <yearmonth[3]>
# [1] "2018 Jan" "2018 Jan" "2018 Jan"

anytime::removeFormats(fmts)

Version info

# I've seen this same issue with 1.1.3 as well and then downgraded
packageVersion('tsibble')
# [1] '1.1.1'

packageVersion('anytime')
# [1] '0.3.9'

AdamSpannbauer commented 1 year ago

I believe this is caused by a bug in the timechange package that lubridate relies on, and that it will be solved if this PR (https://github.com/vspinu/timechange/pull/24) is merged. The bug is related to time zones being inconsistently assumed to be UTC.
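
As a general illustration of how a time zone mismatch can shift a month (not a reproduction of the timechange bug itself), here is a base R sketch. The zone "Australia/Melbourne" is an arbitrary choice of a zone ahead of UTC: midnight on the first of the month there is still the previous day, and so the previous month, when read in UTC.

```r
# Midnight on 1 January 2018 in a zone ahead of UTC.
# "Australia/Melbourne" is an arbitrary choice for illustration.
local_midnight <- as.POSIXct("2018-01-01 00:00:00", tz = "Australia/Melbourne")

# The same instant, formatted as year-month in each zone
month_local <- format(local_midnight, "%Y-%m", tz = "Australia/Melbourne")
month_utc <- format(local_midnight, "%Y-%m", tz = "UTC")

month_local
#> [1] "2018-01"
month_utc
#> [1] "2017-12"
```

If one step treats the value as local time and a later step floors it as UTC, this kind of one-month shift is the result.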

earowang commented 8 months ago

Looks like this has been fixed in the upstream packages: https://github.com/vspinu/timechange/pull/24. Thanks for the help!