Closed mbannert closed 6 years ago
I absolutely agree.
I don't think we have immediate cause for concern: the data don't get touched after being written to the DB, so in practice the order is preserved. That does not mean this will remain the case forever, though.
I don't have a working copy right now but maybe we can do something with these d_chars
# R internals :)
# Only convert the first element to Date, because converting the
# entire vector is costly. The character vector (d_chars) is sorted,
# too, which is all we need for zoo.
d <- as.Date(d_chars[1])
y <- as.numeric(format(d,"%Y"))
p <- as.numeric(format(d,"%m"))
Not sure if we can properly sort them without converting to Date, though.
A heavier approach would be to change how we represent the data in the database.
Since we work with ts exclusively (although there is also code concerning irregular time series in readTimeSeries?), we could store them something like this:
{
"start": 1988.75,
"frequency": 12,
"data": [1, 2, 3, null, 4, ...]
}
either as a JSON string or postgres JSON. That would preserve the order.
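To make the idea concrete, here is a minimal sketch of how such a record could map to and from a ts object. It uses a plain R list that mirrors the JSON layout above; the names rec, x and rec_back are illustrative only, and actually parsing or writing the JSON string would need an extra package, which is left out here.

```r
# Plain list mirroring the proposed JSON layout (null becomes NA in R).
rec <- list(
  start = 1988.75,        # October 1988 expressed in ts time units
  frequency = 12,
  data = c(1, 2, 3, NA, 4)
)

# Rebuild the regular time series; the order is implied by the position
# in the data vector, so no date column has to be sorted.
x <- ts(rec$data, start = rec$start, frequency = rec$frequency)

# Going the other way: derive the same three fields from a ts object.
rec_back <- list(
  start = tsp(x)[1],
  frequency = tsp(x)[3],
  data = as.numeric(x)
)
```

The trade-off is that this only covers regular series; irregular ones would still need explicit dates.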
Actually, looks like order works just fine on date strings (as long as they are properly zero-padded, I assume):
dates <- c("2017-01-01", "2017-02-01", "2017-03-01", "2017-04-01")
order(dates)
dates[order(dates)]
dates2 <- c(dates[2], dates[1], dates[3:4])
dates2[order(dates2)]
> dates <- c("2017-01-01", "2017-02-01", "2017-03-01", "2017-04-01")
> order(dates)
[1] 1 2 3 4
> dates[order(dates)]
[1] "2017-01-01" "2017-02-01" "2017-03-01" "2017-04-01"
> dates2 <- c(dates[2], dates[1], dates[3:4])
> dates2[order(dates2)]
[1] "2017-01-01" "2017-02-01" "2017-03-01" "2017-04-01"
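The zero-padding assumption can be checked with a small sketch that compares the string order against the order of the converted Dates. The vectors below are made up for illustration, and method = "radix" is used so the string comparison does not depend on the locale's collation rules.

```r
# Zero-padded ISO date strings: string order should match Date order.
padded <- c("2017-10-01", "2017-02-01", "2017-01-01")
padded_ok <- identical(
  order(padded, method = "radix"),
  order(as.Date(padded))
)

# Without zero-padding, "2017-10-01" sorts before "2017-2-01" as a
# string, so the string order no longer matches the Date order.
unpadded <- c("2017-10-01", "2017-2-01", "2017-1-01")
unpadded_ok <- identical(
  order(unpadded, method = "radix"),
  order(as.Date(unpadded))
)
```

Here padded_ok should come out TRUE and unpadded_ok FALSE, which is why ordering by string is only safe if the DB always returns fixed-width dates.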
By date string you mean standard-format-date-but-still-character? If we're sure that everything is covered, this is perhaps a good option. I just wonder whether order is expensive. Plus, given that there's no immediate need to react now, we should maybe rather think of a more comprehensive overhaul. Nevertheless, I would like to bring this version to CRAN soon.
Exactly.
Ordering 6012 such strings:
> dates <- seq(as.Date("2000-01-01"), as.Date("2500-12-31"), by = "1 month")
> dates <- as.character(dates)
> dates <- dates[sample(length(dates))]
> microbenchmark(order(dates), times = 10)
Unit: milliseconds
expr min lq mean median uq max neval
order(dates) 55.54707 56.8505 60.62766 61.12514 63.68828 67.09772 10
Ordering 300 dates 100 times, which is probably closer to our scenarios:
> microbenchmark({for(i in 1:100) { order(dates[1:300]) }}, times = 10)
Unit: milliseconds
expr min lq mean median uq
{ for (i in 1:100) { order(dates[1:300]) } } 122.043 127.8955 128.4337 129.3276 130.2437
max neval
133.1075 10
Fun fact as an aside: looks like Dates do not support 5-digit years yet. Sloppy future-proofing...
closed for now.
@HomoCodens I think we need to pay close attention to this discussion I just started:
https://stackoverflow.com/questions/50287646/how-to-reproduce-contingency-of-order-in-hstore
I am afraid we do not do enough to ensure the correct order of our R objects. I am not aware of any issues with order problems and also our unit tests pass, but still it makes me feel uneasy.
Thinking about introducing an additional sort here... but I don't want to because of all the order and type cast costs...
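One way to keep that cost low might be to sort only when the dates actually arrive out of order, and to stay on the character vector so no type cast is needed. The following is a rough sketch under those assumptions; sort_if_needed is a hypothetical helper, not existing package code, and it assumes zero-padded ISO date strings.

```r
# Hypothetical helper: reorder dates and values only if the date strings
# are not already sorted. Assumes zero-padded "YYYY-MM-DD" strings.
sort_if_needed <- function(d_chars, values) {
  stopifnot(length(d_chars) == length(values))
  if (is.unsorted(d_chars)) {
    # Radix ordering compares the strings byte-wise, independent of locale.
    o <- order(d_chars, method = "radix")
    d_chars <- d_chars[o]
    values <- values[o]
  }
  list(dates = d_chars, values = values)
}

res <- sort_if_needed(
  c("2017-02-01", "2017-01-01", "2017-03-01"),
  c(2, 1, 3)
)
```

In the common case where the DB returns the keys in order, only the is.unsorted check runs, which is a single pass over the vector.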
From readTimeSeries: