ocallaghanm closed this 1 year ago
Thanks for the report and reproducible example!
I can't replicate this behavior. My hunch is that it's a timezone issue. Can you provide the output of:
names(foo)
## [1] "Jan 2007" "Feb 2007" "Mar 2007" "Apr 2007" "May 2007" "Jun 2007"
names(bar)
## [1] "Jan 2007" "Feb 2007" "Mar 2007" "Apr 2007" "May 2007" "Jun 2007"
names(baz)
## [1] "Sep 2021" "Oct 2021" "Nov 2021"
names(qux)
## [1] "Sep 2021" "Sep 2021" "Nov 2021"
Sys.getenv("TZ") # "" for me
Sys.timezone() # "America/Chicago" for me
Thanks for your prompt reply! That might be it indeed, though I don't understand why it would cause the names to repeat themselves... Here is the desired info:
names(foo)
[1] "janv. 2007" "janv. 2007" "févr. 2007" "mars 2007" "avr. 2007" "mai 2007"
names(bar)
[1] "janv. 2007" "janv. 2007" "févr. 2007" "mars 2007" "avr. 2007" "mai 2007"
names(baz)
[1] "sept. 2021" "oct. 2021" "nov. 2021"
names(qux)
[1] "sept. 2021" "sept. 2021" "nov. 2021"
Sys.getenv("TZ")
[1] ""
Sys.timezone()
[1] "Europe/Berlin"
Edit: I just tested setting tz = "UTC" in the declaration of dat3 and re-running from there. In that case, names(qux) yields "sept. 2021" "oct. 2021" "nov. 2021", i.e. as expected. So you must be right. But passing everything off as UTC seems like a bit of a dirty trick. Do you have an idea how to work around this?
Thanks for the info. I can replicate if I set Sys.setenv(TZ = "Europe/Berlin"). I'll investigate further.
Sys.setenv(TZ = "Europe/Berlin")
library("xts")
data("sample_matrix")
dat <- as.xts(sample_matrix)
foo <- split(dat, f = "months")
names(foo)
## [1] "Jan 2007" "Jan 2007" "Feb 2007" "Mar 2007" "Apr 2007" "May 2007"
This happens with sample_matrix because as.xts() creates an xts object with a POSIXct index, and uses Sys.getenv("TZ") as the timezone by default. You can force a Date index by setting dateFormat = "Date" in the call to as.xts(). This is related to #192.
Sys.setenv(TZ = "Europe/Berlin")
library("xts")
data("sample_matrix")
dat <- as.xts(sample_matrix, dateFormat = "Date")
foo <- split(dat, f = "months")
names(foo)
## [1] "Jan 2007" "Feb 2007" "Mar 2007" "Apr 2007" "May 2007" "Jun 2007"
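To see why a POSIXct index in Europe/Berlin produces a repeated month name, here is a minimal base-R sketch. It assumes the index holds local-midnight timestamps (which is what parsing date-only row names would give) and uses a locale-independent "%Y-%m" label rather than the "Jan 2007" style names above.

```r
# Midnight on 1 Feb 2007 in Europe/Berlin (CET, one hour ahead of GMT)
x <- as.POSIXct("2007-02-01", tz = "Europe/Berlin")

# Month label in the index's own timezone
label_local <- format(x, "%Y-%m", tz = "Europe/Berlin")
label_local
## [1] "2007-02"

# Month label after converting to GMT: 2007-01-31 23:00, so still January
label_gmt <- format(x, "%Y-%m", tz = "GMT")
label_gmt
## [1] "2007-01"
```

Any step that derives the name from the GMT view of the first timestamp in a split will therefore label the February chunk as January.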
Something else is going on with your use case. I'm looking into that.
EDIT: the issue in your actual case is that as.yearmon.POSIXt() always sets tz = "GMT" in its call to as.POSIXlt(). That converts your Europe/Berlin times to GMT before converting them to a yearmon object. Because Europe/Berlin is ahead of GMT, a time shortly after midnight on the first of a month falls on the previous day, and therefore in the previous month, in GMT.
For example:
x <- structure(c(1632481200, 1633042800, 1635724800),
class = c("POSIXct", "POSIXt"),
tzone = "Europe/Berlin")
x
## [1] "2021-09-24 13:00:00 CEST" "2021-10-01 01:00:00 CEST" "2021-11-01 01:00:00 CET"
as.yearmon(x)
## [1] "Sep 2021" "Sep 2021" "Nov 2021"
# as.yearmon.POSIXt() calls as.yearmon(with(as.POSIXlt(x, tz = "GMT"), 1900 + year + mon/12))
as.yearmon(with(as.POSIXlt(x, tz = "GMT"), 1900 + year + mon/12))
## [1] "Sep 2021" "Sep 2021" "Nov 2021"
# setting tz in as.POSIXlt() gives the correct answer
as.yearmon(with(as.POSIXlt(x, tz = tzone(x)), 1900 + year + mon/12))
## [1] "Sep 2021" "Oct 2021" "Nov 2021"
# as.Date() has the same behavior and solution
as.Date(x)
## [1] "2021-09-24" "2021-09-30" "2021-11-01"
as.Date(x, tz = tzone(x))
## [1] "2021-09-24" "2021-10-01" "2021-11-01"
@zeileis and @ggrothendieck cc'ing you to make sure you're aware of this behavior in as.yearmon() so you can decide whether or not it's desired.
@ocallaghanm I'll make split.xts() more careful about converting the index times into names for the result. Thanks again for the report!
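Until such a change lands, one possible workaround is to compute the labels yourself in the timezone stored on the index. The sketch below is base R only; month_label() is a hypothetical helper written for illustration, not part of xts or zoo, and it is not the fix that went into split.xts().

```r
# Hypothetical helper (not from xts or zoo): label timestamps by month using
# the timezone stored on the object, or the session timezone if none is set.
month_label <- function(x, fmt = "%Y-%m") {
  tz <- attr(x, "tzone")
  if (is.null(tz)) tz <- ""  # "" means the current session timezone
  format(x, fmt, tz = tz[1L])
}

# Same timestamps as in the example above
x <- structure(c(1632481200, 1633042800, 1635724800),
               class = c("POSIXct", "POSIXt"),
               tzone = "Europe/Berlin")

labels <- month_label(x)
labels
## [1] "2021-09" "2021-10" "2021-11"
```

With a list returned by split(), the same helper could presumably be applied to the first index value of each element, e.g. names(qux) <- sapply(qux, function(p) month_label(index(p)[1])), though that line is untested here.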
Thanks a bunch!
Description
I have been trying to split.xts() my dataset by month. The splits are correct in terms of endpoints, but the nomenclature of the elements in the returned list isn't. In some cases, one month appears twice in a row (but with the data for the correct month). I updated my version of xts, as well as R itself, but nothing seems to be changing. The issue is not systematic, however.
Expected behavior
In the code below, I use a full-scale (testdat_large.csv) and a reduced (testdat.csv) version of my original data. baz, from the reduced data, has 3 elements Sep 2021, Oct 2021 and Nov 2021. So far, so good. But qux has Sep 2021, Sep 2021 and Nov 2021, even though the splits themselves are correct.
Thinking it might be a problem with my data, I used sample_matrix. The resulting foo, however, has Jan 2007, Jan 2007, Feb 2007, Mar 2007, Apr 2007, May 2007 instead of the expected Jan through Jun. But again, the endpoints are correct (e.g. the second instance of Jan 2007 has 28 days so is indeed February).
I thought it might be due to the fact that these datasets all have multiple columns, so I tried subsetting one in dat.sing. But the resulting bar presents the exact same problem. So the issue seems to be that for some reason, some list elements receive the wrong name, and subsequent elements are offset.
Minimal, reproducible example
Session Info