Closed christophsax closed 1 year ago
Thanks for submitting to rOpenSci, our editors and @ropensci-review-bot will reply soon. Type @ropensci-review-bot help
for help.
:rocket:
Editor check started
:wave:
Oops, something went wrong with our automatic package checks. Our developers [have been notified]() and package checks will appear here as soon as we've resolved the issue. Sorry for any inconvenience.
@christophsax Sorry about the mess up here - the checks server is currently running R4.0.3, so doesn't recognise native pipes which you use in some test code. Will upgrade asap and deliver your checks.
But I did't intend to use native pipes, that was an accident. I will remove them right now.
I did so now. Perhaps rerunning is sufficient.
git hash: aad57cea
Important: All failing checks above must be addressed prior to proceeding
Package License: GPL-3
srr
package)This package is in the following category:
:heavy_check_mark: All applicable standards [v0.0.1] have been documented in this package
Click here to see the report of author-reported standards compliance of the package with links to associated lines of code, which can be re-generated locally by running the srr_report()
function from within a local clone of the repository.
This package features some noteworthy statistical properties which may need to be clarified by a handling editor prior to progressing.
The package has: - code in R (100% in 53 files) and - 1 authors - 3 vignettes - no internal data file - 2 imported packages - 63 exported functions (median 7 lines of code) - 277 non-exported functions in R (median 9 lines of code) --- Statistical properties of package structure as distributional percentiles in relation to all current CRAN packages The following terminology is used: - `loc` = "Lines of Code" - `fn` = "function" - `exp`/`not_exp` = exported / not exported The final measure (`fn_call_network_size`) is the total number of calls between functions (in R), or more abstract relationships between code objects in other languages. Values are flagged as "noteworthy" when they lie in the upper or lower 5th percentile. |measure | value| percentile|noteworthy | |:------------------------|-----:|----------:|:----------| |files_R | 53| 96.0| | |files_vignettes | 3| 90.8| | |files_tests | 44| 98.7| | |loc_R | 2649| 88.3| | |loc_vignettes | 344| 80.6| | |loc_tests | 1951| 92.1| | |num_vignettes | 3| 93.1| | |n_fns_r | 340| 93.4| | |n_fns_r_exported | 63| 90.4| | |n_fns_r_not_exported | 277| 93.8| | |n_fns_per_file_r | 4| 51.4| | |num_params_per_fn | 1| 1.1|TRUE | |loc_per_fn_r | 8| 29.0| | |loc_per_fn_r_exp | 7| 14.2| | |loc_per_fn_r_not_exp | 9| 41.9| | |rel_whitespace_R | 25| 91.6| | |rel_whitespace_vignettes | 32| 87.8| | |rel_whitespace_tests | 28| 98.0|TRUE | |doclines_per_fn_exp | 41| 51.1| | |doclines_per_fn_not_exp | 0| 0.0|TRUE | |fn_call_network_size | 475| 94.6| | ---
Interactive network visualisation of calls between objects in package can be viewed by clicking here
goodpractice
and other checks### 3a. Continuous Integration Badges [![github](https://github.com/christophsax/tsbox/workflows/R-CMD-check/badge.svg)](https://github.com/christophsax/tsbox/actions) **GitHub Workflow Results** |name |conclusion |sha |date | |:-----------|:----------|:------|:----------| |R-CMD-check |success |aad57c |2021-09-18 | --- ### 3b. `goodpractice` results ### `R CMD check` with [rcmdcheck](https://r-lib.github.io/rcmdcheck/) R CMD check generated the following error: 1. checking tests ... Running ‘spelling.R’ Running ‘testthat.R’ ERROR Running the tests in ‘tests/testthat.R’ failed. Last 13 lines of output: █ 1. ├─fl[[i]](x) 2. │ └─tsbox::ts_apply(x, ff, ...) 3. │ └─tsbox:::ts_apply_dts(ts_dts(x), fun, ...) 4. │ ├─x[, fun(.SD, ...), by = eval(.by)] 5. │ └─data.table:::`[.data.table`(x, , fun(.SD, ...), by = eval(.by)) 6. └─tsbox:::fun(.SD, ...) 7. └─(function(x, ...) seasonal::final(seasonal::seas(x, ...)))(...) 8. ├─seasonal::final(seasonal::seas(x, ...)) 9. └─seasonal::seas(x, ...) 10. └─seasonal::checkX13(fail = TRUE, fullcheck = FALSE, htmlcheck = FALSE) [ FAIL 1 | WARN 8 | SKIP 17 | PASS 644 ] Error: Test failures Execution halted R CMD check generated the following test_fail: 1. > library(testthat) Warning message: package 'testthat' was built under R version 4.1.0 > library(tsbox) > > test_check("tsbox") Starting 2 test processes ══ Skipped tests ═══════════════════════════════════════════════════════════════ • On CRAN (17) ══ Failed tests ════════════════════════════════════════════════════════════════ ── Error (test-units.R:41:5): tsbox works with units ─────────────────────────── Error: Process terminated Backtrace: █ 1. ├─fl[[i]](x) 2. │ └─tsbox::ts_apply(x, ff, ...) 3. │ └─tsbox:::ts_apply_dts(ts_dts(x), fun, ...) 4. │ ├─x[, fun(.SD, ...), by = eval(.by)] 5. │ └─data.table:::`[.data.table`(x, , fun(.SD, ...), by = eval(.by)) 6. └─tsbox:::fun(.SD, ...) 7. └─(function(x, ...) seasonal::final(seasonal::seas(x, ...)))(...) 8. ├─seasonal::final(seasonal::seas(x, ...)) 9. └─seasonal::seas(x, ...) 10. └─seasonal::checkX13(fail = TRUE, fullcheck = FALSE, htmlcheck = FALSE) [ FAIL 1 | WARN 8 | SKIP 17 | PASS 644 ] Error: Test failures Execution halted R CMD check generated the following check_fail: 1. rcmdcheck_tests_pass ### Test coverage with [covr](https://covr.r-lib.org/) ERROR: Test Coverage Failed ### Cyclocomplexity with [cyclocomp](https://github.com/MangoTheCat/cyclocomp) The following functions have cyclocomplexity >= 15: function | cyclocomplexity --- | --- ts_span | 34 time_shift | 21 ts_plot | 21 copy_class | 19 ### Static code analyses with [lintr](https://github.com/jimhester/lintr) [lintr](https://github.com/jimhester/lintr) found the following 178 potential issues: message | number of times --- | --- Lines should not be more than 80 characters. | 178
:heavy_check_mark: Package has at least one HTML vignette
|package |version | |:--------|:---------| |pkgstats |0.0.0.311 | |pkgcheck |0.0.1.486 | |srr |0.0.1.107 |
Processing may not proceed until the items marked with :heavy_multiplication_x: have been resolved.
There seems to be an issue with the package seasonal on your system. The package is special since it uses x13binary which downloads pre-built binaries from CRAN. Usually this works fine and on most CRAN platforms.
Is there anything special about your system? There was an issue for Mac M1.
If you want to check, this should works:
library(seasonal)
seas(AirPassengers)
checkX13()
But I don't want you to drag into this. I can also remove the few tests that are using seasonal, which should then resolve the issue?
Perhaps this is just because of the failing R CMD check? If not, is there a way to see what ist the measured coverage? I had 81.2% on my system, which passed the threshold (see below).
If I run pkgcheck::pkgcheck(".")
on my side, I get:
#### Coverage
✔ Package uses 'roxygen2'.
✔ Package has a 'contributing.md' file.
✔ Package has a 'CITATION' file.
✔ Package has a 'codemeta.json' file.
✔ All functions have examples.
✔ Package has at least one HTML vignette
✔ Package 'DESCRIPTION' has a URL field.
✔ Package 'DESCRIPTION' has a BugReports field.
✔ Package is already on CRAN.
✔ Package has continuous integration checks.
✔ Package coverage is 81.2%.
✔ R CMD check found no errors.
✔ R CMD check found no warnings.
✔ All applicable standards [v0.0.1] have been documented in this package.
ℹ Current status:
✔ This package may be submitted.
@ropensci-review-bot assign @rkillick as editor
Assigned! @rkillick is now the editor
The goodpractice() checks came back with:
I had problems running covr() but see that it is passing on the checks above. Please fix the above and I'll look to assign reviewers as these are small changes.
Reviewer: @chamberlinc Due date: 2021-11-18
Reviewer: @brunaw Due date: 2021-11-22
@ropensci-review-bot add @chamberlinc to reviewers
I'm sorry @rkillick, I'm afraid I can't do that. That's something only editors are allowed to do.
@rkillick I've now added you to the editors team, sorry about this! You can now repeat the comment and it should work. Thanks for your patience!
@ropensci-review-bot add @chamberlinc to reviewers
@chamberlinc added to the reviewers list. Review due date is 2021-11-23. Thanks @chamberlinc for accepting to review! Please refer to our reviewer guide.
@chamberlinc: If you haven't done so, please fill this form for us to update our reviewers records.
@ropensci-review-bot add @brunaw to reviewers
@brunaw added to the reviewers list. Review due date is 2021-11-23. Thanks @brunaw for accepting to review! Please refer to our reviewer guide.
@brunaw: If you haven't done so, please fill this form for us to update our reviewers records.
@rkillick @christophsax , thank you for the opportunity to review, and my apologies that this has taken so long. Below is my review with some comments. I also have uploaded the knitted html notebook that details the tests I ran at https://github.com/chamberlinc/tsbox-review. The code I used to test the package functions is also available in this repo.
Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide
The package includes all the following forms of documentation:
URL
, BugReports
and Maintainer
(which may be autogenerated via Authors@R
).Errors encountered in ts_plot()
, ts_xts()
and ts_long()
. I am including the structure of the data frame I was working with in the code below:
Errors with ts_xts()
and ts_long()
:
> Conn_discharge_DO <- dataRetrieval::readNWISuv(
+ siteNumbers = "01193050",
+ parameterCd = c("00060", "00300"),
+ startDate = "2019-01-01",
+ endDate = "2021-01-01"
+ )
>
> str(Conn_discharge_DO)
'data.frame': 70519 obs. of 8 variables:
$ agency_cd : chr "USGS" "USGS" "USGS" "USGS" ...
$ site_no : chr "01193050" "01193050" "01193050" "01193050" ...
$ dateTime : POSIXct, format: "2019-01-01 05:00:00" "2019-01-01 05:15:00" "2019-01-01 05:30:00" "2019-01-01 05:45:00" ...
$ X_00060_00000 : num 39200 41300 40400 40400 41100 43000 41800 42300 41700 42400 ...
$ X_00060_00000_cd: chr "A" "A" "A" "A" ...
$ X_00300_00000 : num 13.9 13.9 13.9 14 14 14 14 14 14 14 ...
$ X_00300_00000_cd: chr "A" "A" "A" "A" ...
$ tz_cd : chr "UTC" "UTC" "UTC" "UTC" ...
- attr(*, "url")= chr "https://nwis.waterservices.usgs.gov/nwis/iv/?site=01193050&format=waterml,1.1&ParameterCd=00060,00300&startDT=2"| __truncated__
- attr(*, "siteInfo")='data.frame': 1 obs. of 13 variables:
..$ station_nm : chr "CONNECTICUT RIVER AT MIDDLE HADDAM, CT"
..$ site_no : chr "01193050"
..$ agency_cd : chr "USGS"
..$ timeZoneOffset : chr "-05:00"
..$ timeZoneAbbreviation: chr "EST"
..$ dec_lat_va : num 41.5
..$ dec_lon_va : num -72.6
..$ srs : chr "EPSG:4326"
..$ siteTypeCd : chr "ST-TS"
..$ hucCd : chr "01080205"
..$ stateCd : chr "09"
..$ countyCd : chr "09007"
..$ network : chr "NWIS"
- attr(*, "variableInfo")='data.frame': 2 obs. of 7 variables:
..$ variableCode : chr [1:2] "00060" "00300"
..$ variableName : chr [1:2] "Streamflow, ft³/s" "Dissolved oxygen, water, unfiltered, mg/L"
..$ variableDescription: chr [1:2] "Discharge, cubic feet per second" "Dissolved oxygen, water, unfiltered, milligrams per liter"
..$ valueType : chr [1:2] "Derived Value" "Derived Value"
..$ unit : chr [1:2] "ft3/s" "mg/l"
..$ options : chr [1:2] "" ""
..$ noDataValue : logi [1:2] NA NA
- attr(*, "disclaimer")= chr "Provisional data are subject to revision. Go to http://waterdata.usgs.gov/nwis/help/?provisional for more information."
- attr(*, "statisticInfo")='data.frame': 1 obs. of 2 variables:
..$ statisticCd : chr "00000"
..$ statisticName: chr ""
- attr(*, "queryTime")= POSIXct[1:1], format: "2021-12-06 11:53:57"
>
> # Try viewing both parameters together
> try(str(ts_xts(Conn_discharge_DO)))
more than one [value] column detected after [time] - using the outermost.
Are you using a wide data frame? To convert, use 'ts_long()'.
[time]: 'dateTime' [value]: 'X_00300_00000'
Error : cannot allocate vector of size 4.5 Gb
> try(str(ts_xts(ts_long(Conn_discharge_DO))))
[id] columns left of [time] column: 'agency_cd', 'site_no'
[time]: 'dateTime'
Error : 'value' column [value] is not numeric.
Errors with ts_plot()
:
> character_date_dat <- data.frame(
+ DateTime = c(
+ "2011-11-11 11:11:11", "2012-12-12 12:12:12", "2021-09-25 20:07:00"
+ ),
+ Value = c(1, 2, 8)
+ )
> try({ts_plot(character_date_dat)})
[time]: 'DateTime' [value]: 'Value'
Error in colnamesInt(x, neworder, check_dups = FALSE) :
argument specifying columns specify non existing column(s): cols[3]='Value'
> # I am downloading hydrological data timeseries from the USGS. This is my most frequent way of accessing timeseries data. Data comes as a data frame.
> Eno_discharge <- dataRetrieval::readNWISuv(
+ siteNumbers = "02085070",
+ parameterCd = "00060",
+ startDate = "2019-01-01",
+ endDate = "2021-01-01"
+ )
> # View the structure of the data frame
> str(Eno_discharge)
'data.frame': 70267 obs. of 6 variables:
$ agency_cd : chr "USGS" "USGS" "USGS" "USGS" ...
$ site_no : chr "02085070" "02085070" "02085070" "02085070" ...
$ dateTime : POSIXct, format: "2019-01-01 05:00:00" "2019-01-01 05:15:00" "2019-01-01 05:30:00" "2019-01-01 05:45:00" ...
$ X_00060_00000 : num 424 421 424 428 428 428 428 428 435 435 ...
$ X_00060_00000_cd: chr "A" "A" "A" "A" ...
$ tz_cd : chr "UTC" "UTC" "UTC" "UTC" ...
- attr(*, "url")= chr "https://nwis.waterservices.usgs.gov/nwis/iv/?site=02085070&format=waterml,1.1&ParameterCd=00060&startDT=2019-01"| __truncated__
- attr(*, "siteInfo")='data.frame': 1 obs. of 13 variables:
..$ station_nm : chr "ENO RIVER NEAR DURHAM, NC"
..$ site_no : chr "02085070"
..$ agency_cd : chr "USGS"
..$ timeZoneOffset : chr "-05:00"
..$ timeZoneAbbreviation: chr "EST"
..$ dec_lat_va : num 36.1
..$ dec_lon_va : num -78.9
..$ srs : chr "EPSG:4326"
..$ siteTypeCd : chr "ST"
..$ hucCd : chr "03020201"
..$ stateCd : chr "37"
..$ countyCd : chr "37063"
..$ network : chr "NWIS"
- attr(*, "variableInfo")='data.frame': 1 obs. of 7 variables:
..$ variableCode : chr "00060"
..$ variableName : chr "Streamflow, ft³/s"
..$ variableDescription: chr "Discharge, cubic feet per second"
..$ valueType : chr "Derived Value"
..$ unit : chr "ft3/s"
..$ options : chr ""
..$ noDataValue : logi NA
- attr(*, "disclaimer")= chr "Provisional data are subject to revision. Go to http://waterdata.usgs.gov/nwis/help/?provisional for more information."
- attr(*, "statisticInfo")='data.frame': 1 obs. of 2 variables:
..$ statisticCd : chr "00000"
..$ statisticName: chr ""
- attr(*, "queryTime")= POSIXct[1:1], format: "2021-12-06 11:59:00"
> # Try viewing the discharge data using the ts_plot function
> try({ts_plot(Eno_discharge)})
[time]: 'dateTime' [value]: 'X_00060_00000'
Error in setnames(x, c(cid, ctime, cvalue), c("id", "time", "value")) :
Items of 'old' not found in column names: [X_00060_00000]. Consider skip_absent=TRUE.
> try({ts_plot(Eno_discharge, skip_absent=TRUE)})
Error in FUN(X[[i]], ...) : ts_boxable(x) is not TRUE
> library(testthat)
> library(tsbox)
>
> test_check("tsbox")
Starting 2 test processes
== Skipped tests ===============================================================
* empty test (1)
== Warnings ====================================================================
-- Warning (test-ts_first_of_period.R:18:3): ts_first_of_period works ----------
no non-missing arguments to min; returning Inf
Backtrace:
1. tsbox::ts_first_of_period(x)
2. tsbox::ts_apply(x, dts_first_of_period)
3. tsbox:::ts_apply_dts(ts_dts(x), fun, ...)
4. tsbox:::fun(x, ...)
8. base::NextMethod("[")
== Failed tests ================================================================
-- Error (test-ts_first_of_period.R:18:3): ts_first_of_period works ------------
Error in `max(which(time <= smry$start)):min(which(time >= smry$end))`: result would be too long a vector
Backtrace:
x
1. \-tsbox::ts_first_of_period(x)
2. \-tsbox::ts_apply(x, dts_first_of_period)
3. \-tsbox:::ts_apply_dts(ts_dts(x), fun, ...)
4. \-tsbox:::fun(x, ...)
5. +-time[(max(which(time <= smry$start)):min(which(time >= smry$end)))]
6. +-base::`[.POSIXct`(...)
7. | \-base::.POSIXct(NextMethod("["), attr(x, "tzone"), oldClass(x))
8. \-base::NextMethod("[")
[ FAIL 1 | WARN 1 | SKIP 1 | PASS 770 ]
Error: Test failures
Execution halted
1 error x | 1 warning x | 0 notes √
Estimated hours spent reviewing: 7
First, I enjoyed the opportunity to review this code. This is the first code review I have done for rOpenSci, and I hope the comments below are helpful.
I like the general purpose of this package and agree there is probably a lot of use for a package that allows easy conversion between R object types. The documentation I think is all ok, though I think it could be improved with a little more description of the expected structure of data. I did not have any issues with installation. There was one test that failed on my machine (see details in the index.nb.html file in the linked repo, https://github.com/chamberlinc/tsbox-review). The code snippets in the three vignettes all worked on my machine.
I will preface my comments by saying I am not an experienced user of data.tables, so following the code was a little tricky for me. I usually work with timeseries as data frames, and I was having trouble following the code and figuring out which columns were being assigned as data, timestamps, or identifiers. This caused a few issues when I had data frames of multiple concurrent timeseries and was getting error messages indicating that the ordering of the columns somehow may have mattered. When working with data frames that only had one timeseries, the functions all worked simply and, it seemed, as intended.
I was mostly experimenting with the ts_plot()
, ts_long()
, and ts_xts()
functions. ts_xts()
produced an error while working with the data frame that had multiple timeseries. Following the prompting, I tried using ts_long()
, however this also produced an error that I did not understand, because it seemed to be accessing columns that I did not intend. I didn't see any way to input the correct column names to the function. Once I manually pivoted the data frame using pivot_longer()
though, I was able to use ts_xts()
and it performed everything correctly I believe. ts_plot()
produced error messages that it might have mattered how the columns were organized, and that would have been helpful to know. I was not able to get this function to work for several formats of data frame data, even when following the suggested prompts in the error messages.
The code I used to access data and experiment is provided as well at the linked repo. I hope this is helpful!
@chamberlinc thank you for your valuable review of the package.
@brunaw are you able to provide your review soon? It was due on 23 Nov.
@chamberlinc Thank you so much for your review! I will look into it in more detail, but the error messages/feedback in non-standard situations certainly seems worth to be addressed.
First of all, please apologize for my late answer. I originally planned to wait for the second review and lost track afterward.
@chamberlinc, thanks again for your very helpful review! I dealt with your points in PR #211.
Most of your problems arose when applying the functions to non-standard data frames, especially wide ones. In some cases, the messages already pointed to the use use of ts_long()
, but the error messages were not too helpful. In PR #211, I tried to address some of these problems. The error messages are improved, and ts_long() can handle some more complex cases. It works now with a toy example of your data set.
A bug in ts_plot() also caused a weird error message. This bug is now resolved as well.
To sum up, I addressed the following three points from your review:
I think it could be improved with a little more description of the expected structure of data.
Extended paragraph on data frames: https://www.tsbox.help/articles/tsbox.html#time-series-in-data-frames-1
There was one test that failed on my machine (see details in the index.nb.html file in the linked repo, https://github.com/chamberlinc/tsbox-review).
The error and the warning did not occur on my system, but the computation in ts_first_of_period()
did not work as expected. I added a new test and a fix. Both are green now on my systems (local and GHA tests in the repository).
This caused a few issues when I had data frames of multiple concurrent timeseries and was getting error messages indicating that the ordering of the columns somehow may have mattered.
I improved some of the error messages when functions were applied to wide data structures.
library(nycflights13)
suppressPackageStartupMessages(library(dplyr))
library(tsbox)
packageVersion("tsbox")
#> [1] '0.3.1.9002'
d3 <- weather |>
select(origin, time_hour, temp, humid, precip)
d3
#> # A tibble: 26,115 × 5
#> origin time_hour temp humid precip
#> <chr> <dttm> <dbl> <dbl> <dbl>
#> 1 EWR 2013-01-01 01:00:00 39.0 59.4 0
#> 2 EWR 2013-01-01 02:00:00 39.0 61.6 0
#> 3 EWR 2013-01-01 03:00:00 39.0 64.4 0
#> 4 EWR 2013-01-01 04:00:00 39.9 62.2 0
#> 5 EWR 2013-01-01 05:00:00 39.0 64.4 0
#> 6 EWR 2013-01-01 06:00:00 37.9 67.2 0
#> 7 EWR 2013-01-01 07:00:00 39.0 64.4 0
#> 8 EWR 2013-01-01 08:00:00 39.9 62.2 0
#> 9 EWR 2013-01-01 09:00:00 39.9 62.2 0
#> 10 EWR 2013-01-01 10:00:00 41 59.6 0
#> # … with 26,105 more rows
This is a wide data frame, so tsbox errors with a meaningful message:
ts_ts(d3)
#> Found numeric [id] column(s): 'temp', 'humid'.
#> Are you using a wide data frame? To convert, use 'ts_long()'.
#> Convert column(s) to character or factor to silence this message.
#> [time]: 'time_hour' [value]: 'precip'
#> Error: series has no regular pattern
This also works if the order is different
d4 <- weather |>
select(origin, temp, humid, precip, time_hour)
d4
#> # A tibble: 26,115 × 5
#> origin temp humid precip time_hour
#> <chr> <dbl> <dbl> <dbl> <dttm>
#> 1 EWR 39.0 59.4 0 2013-01-01 01:00:00
#> 2 EWR 39.0 61.6 0 2013-01-01 02:00:00
#> 3 EWR 39.0 64.4 0 2013-01-01 03:00:00
#> 4 EWR 39.9 62.2 0 2013-01-01 04:00:00
#> 5 EWR 39.0 64.4 0 2013-01-01 05:00:00
#> 6 EWR 37.9 67.2 0 2013-01-01 06:00:00
#> 7 EWR 39.0 64.4 0 2013-01-01 07:00:00
#> 8 EWR 39.9 62.2 0 2013-01-01 08:00:00
#> 9 EWR 39.9 62.2 0 2013-01-01 09:00:00
#> 10 EWR 41 59.6 0 2013-01-01 10:00:00
#> # … with 26,105 more rows
ts_ts(d4)
#> Found numeric [id] column(s): 'temp', 'humid'.
#> Are you using a wide data frame? To convert, use 'ts_long()'.
#> Convert column(s) to character or factor to silence this message.
#> [time]: 'time_hour' [value]: 'precip'
#> Error: series has no regular pattern
Using ts_plot()
on these malformed data frames has returned the following nonsensical error:
[time]: 'time_hour' [value]: 'precip'
Error in setnames(x, c(cid, ctime, cvalue), c("id", "time", "value")) :
Items of 'old' not found in column names: [precip]. Consider skip_absent=TRUE.
>
The skip_absent=TRUE
is a data.table suggestion, not meant for use in tsbox. I fixed this in commit 50e694b8. The message now makes more sense:
ts_plot(d3)
#> Found numeric [id] column(s): 'temp', 'humid'.
#> Are you using a wide data frame? To convert, use 'ts_long()'.
#> Convert column(s) to character or factor to silence this message.
#>
#> [time]: 'time_hour' [value]: 'precip'
#> too many series. Only showing the first 20.
reflecting the fact that we have a large number of absurdly short time series (since value columns were classified as id columns)
I also tried to improve a bit on your examples. I construct a data frame that looks similar to yours:
# > str(Conn_discharge_DO)
# 'data.frame': 70519 obs. of 8 variables:
# $ agency_cd : chr "USGS" "USGS" "USGS" "USGS" ...
# $ site_no : chr "01193050" "01193050" "01193050" "01193050" ...
# $ dateTime : POSIXct, format: "2019-01-01 05:00:00" "2019-01-01 05:15:00" "2019-01-01 05:30:00" "2019-01-01 05:45:00" ...
# $ X_00060_00000 : num 39200 41300 40400 40400 41100 43000 41800 42300 41700 42400 ...
# $ X_00060_00000_cd: chr "A" "A" "A" "A" ...
Conn_discharge_DO <- tibble(
agency_cd = c("USGS", "USGS", "USGS", "USGS"),
site_no = c("01193050", "01193050", "01193050", "01193050"),
dateTime = as.POSIXct(c("2019-01-01 05:00:00", "2019-01-01 05:15:00", "2019-01-01 05:30:00", "2019-01-01 05:45:00")),
unit = "ft3/s",
X_00060_00000 = 1:4,
noDataValue = NA,
X_00060_000002 = 11:14,
X_00060_00000_cd = "A"
)
tsbox now detects X_00060_00000
as an id column. (As it says in the documentation, it starts on the right). And it suggests using ts_long()
.
ts_xts(Conn_discharge_DO)
#> Found numeric [id] column(s): 'X_00060_00000'.
#> Are you using a wide data frame? To convert, use 'ts_long()'.
#> Convert column(s) to character or factor to silence this message.
#> [time]: 'dateTime' [value]: 'X_00060_000002'
#> Loading required namespace: xts
#> USGS_01193050_ft3/s_1_NA_A USGS_01193050_ft3/s_2_NA_A
#> 2019-01-01 05:00:00 11 NA
#> 2019-01-01 05:15:00 NA 12
#> 2019-01-01 05:30:00 NA NA
#> 2019-01-01 05:45:00 NA NA
#> USGS_01193050_ft3/s_3_NA_A USGS_01193050_ft3/s_4_NA_A
#> 2019-01-01 05:00:00 NA NA
#> 2019-01-01 05:15:00 NA NA
#> 2019-01-01 05:30:00 13 NA
#> 2019-01-01 05:45:00 NA 14
ts_long()
works now on this. It classifies character and factor columns right of the time column as id columns (with a message).
ts_long(Conn_discharge_DO)
#> found columns right to the [time] column that will be treated as [id] columns (character or factor): 'unit', 'X_00060_00000_cd'.
#> Additional [id] column(s): 'agency_cd', 'site_no', 'unit', 'X_00060_00000_cd'
#> [time]: 'dateTime'
#> # A tibble: 12 × 7
#> agency_cd site_no unit X_00060_00000_cd id dateTime value
#> <chr> <chr> <chr> <chr> <chr> <dttm> <int>
#> 1 USGS 01193050 ft3/s A X_00060_… 2019-01-01 05:00:00 1
#> 2 USGS 01193050 ft3/s A X_00060_… 2019-01-01 05:15:00 2
#> 3 USGS 01193050 ft3/s A X_00060_… 2019-01-01 05:30:00 3
#> 4 USGS 01193050 ft3/s A X_00060_… 2019-01-01 05:45:00 4
#> 5 USGS 01193050 ft3/s A noDataVa… 2019-01-01 05:00:00 NA
#> 6 USGS 01193050 ft3/s A noDataVa… 2019-01-01 05:15:00 NA
#> 7 USGS 01193050 ft3/s A noDataVa… 2019-01-01 05:30:00 NA
#> 8 USGS 01193050 ft3/s A noDataVa… 2019-01-01 05:45:00 NA
#> 9 USGS 01193050 ft3/s A X_00060_… 2019-01-01 05:00:00 11
#> 10 USGS 01193050 ft3/s A X_00060_… 2019-01-01 05:15:00 12
#> 11 USGS 01193050 ft3/s A X_00060_… 2019-01-01 05:30:00 13
#> 12 USGS 01193050 ft3/s A X_00060_… 2019-01-01 05:45:00 14
Hello Christoph, Apologies for the delay again. These updates I think are very good. The functions worked as expected this time with the data I typically work with. I had a few automated tests fail this time though, one because I have an older version of R that does not support the |> operator, and the rest all seemingly because of some issue accessing the function ts_dts.
The details of the automated test output you can find here: https://github.com/chamberlinc/tsbox-review/blob/master/tsbox_R2.md. The rest of my review is below:
Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide
The package includes all the following forms of documentation:
[ X] A statement of need: clearly stating problems the software is designed to solve and its target audience in README
[X ] Installation instructions: for the development version of package and any non-standard dependencies in README
[X ] Vignette(s): demonstrating major functionality that runs successfully locally
Caveat: I am running R 4.0.5 on my machine, and so I do not have native pipes. For the tsbox vignette, the code did work, but I first had to replace all instances |> with %>%. I’d recommend using the older syntax to be compatable with more versions of R.
[X ] Function Documentation: for all exported functions
I did not check all of the functions’ documentation, but the subset I checked looked good.
[X ] Examples: (that run successfully locally) for all exported functions
Again, only checked a subset, but all worked.
[X \ Community guidelines: including contribution guidelines in the README or CONTRIBUTING, and DESCRIPTION with URL
, BugReports
and Maintainer
(which may be autogenerated via Authors@R
).
sessionInfo()
## R version 4.0.5 (2021-03-31)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 19044)
##
## Matrix products: default
##
## locale:
## [1] LC_COLLATE=English_United States.1252
## [2] LC_CTYPE=English_United States.1252
## [3] LC_MONETARY=English_United States.1252
## [4] LC_NUMERIC=C
## [5] LC_TIME=English_United States.1252
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## loaded via a namespace (and not attached):
## [1] compiler_4.0.5 magrittr_2.0.3 fastmap_1.1.0 cli_3.3.0
## [5] tools_4.0.5 htmltools_0.5.2 rstudioapi_0.13 yaml_2.3.5
## [9] stringi_1.7.6 rmarkdown_2.14 knitr_1.39 stringr_1.4.0
## [13] xfun_0.31 digest_0.6.29 rlang_1.0.2 evaluate_0.15
pkg_dir <- "C:/Users/Cathy/Documents/PeerReviewing/tsbox/R2/tsbox/"
try(devtools::check(pkg_dir))
## i Updating tsbox documentation
## i Loading tsbox
## Error in parse(text = lines, keep.source = TRUE, srcfile = srcfilecopy(srcref_path %||% :
## C:\Users\Cathy\Documents\PeerReviewing\tsbox\R2\tsbox/tests/testthat/test-date_utils.R:9:41: unexpected '>'
## 8:
## 9: x1 <- ts_tbl(ts_c(mdeaths, fdeaths)) |>
## ^
devtools::test(pkg_dir)
## Output removed for clarity. View at https://github.com/chamberlinc/tsbox-review/blob/master/tsbox_R2.md ##
Estimated hours spent reviewing: 3
The functions now work as expected for me for the types of data I typically use. Thanks for the updates! I found the messages on the ts_long function to be very helpful.
Very sorry for the slow turnaround.
I changed the native pipe (|>
) back to the magrittr one (%>%
), so this should now work on your system with the latest version (0.3.1.9003
).
I introduced the native pipe deliberately in the vignettes since I always wanted to keep the dependency graph very simple (tsbox only imports data.table and anytime) and wanted to make the vignettes available for non-magrittr/dplyr users. But I see that there is a trade-off between this aim and the support for older versions of R. The native pipe was introduced in R 4.1 on 2021-05-18 and is now available for almost 15 months. With more time passing, only a few older versions will be around. I changed it back for now but will go to the native pipe again at some point in the future.
I have a hard time reproducing them. First, all the tests run through on my system (macOS, R release) and all systems on GitHub Actions (ubuntu-20.04, release, devel), (windows-latest, release), (macOS-latest, release). So I wonder if this is another problem related to the old R version?
Your error says:
Error in `UseMethod("ts_dts")`: no applicable method for 'ts_dts' applied to an object of class "data.frame"
But there clearly is such a method:
If you still encounter the error in the tests, could you try to run one the failing example outside of the tests? Does this fail? It really shouldn't.
library(tsbox)
with_id <- wo_id <- ts_df(mdeaths)
with_id$id <- "mdeaths"
ts_c(wo_id, with_id)
Thank you so much for your work! I guess we are alomost done now. I will answer quickly this time :-)
Hello, and sorry for my slow turn around this time.
I think that plan for the pipes makes the most sense. Unfortunately there are a significant number of users who don't have control over updating their own versions of R because of the IT protocols of their companies or agencies. But, given enough time I agree it will be less of a problem and decreasing dependencies is a good goal.
The error that was occurring previously has resolved itself on my computer. However, I am now getting a new error with the test test-ts_first_of_period
:
Warning (test-ts_first_of_period.R:19:3): ts_first_of_period works
no non-missing arguments to min; returning Inf
Backtrace:
1. tsbox::ts_first_of_period(x)
at test-ts_first_of_period.R:19:2
2. tsbox::ts_apply(x, dts_first_of_period)
at tsbox/R/ts_first_of_period.R:20:2
3. tsbox::ts_apply_dts(ts_dts(x), fun, ...)
at tsbox/R/ts_apply.R:44:2
4. tsbox fun(x, ...)
at tsbox/R/ts_apply.R:18:4
8. base::NextMethod("[")
Error (test-ts_first_of_period.R:19:3): ts_first_of_period works
Error in `max(which(time <= smry$start)):min(which(time >= smry$end))`: result would be too long a vector
Backtrace:
1. tsbox::ts_first_of_period(x)
at test-ts_first_of_period.R:19:2
2. tsbox::ts_apply(x, dts_first_of_period)
at tsbox/R/ts_first_of_period.R:20:2
3. tsbox::ts_apply_dts(ts_dts(x), fun, ...)
at tsbox/R/ts_apply.R:44:2
4. tsbox fun(x, ...)
at tsbox/R/ts_apply.R:18:4
8. base::NextMethod("[")
I tried stepping through the test code outside of the test function, and I got the same error. Going into the dts_first_of_period
function, the error seems to be because which(time >= smry$end))
produces integer(0)
.
Other than that though, everything worked well.
@chamberlinc, thanks a lot!
The error in ts_first_of_period()
was time zone dependent, that's why I haven't spotted it. I added another test and fixed the problem. It should work now with the latest version on GitHub (0.3.1.9004
).
@christophsax , all of the tests pass now, congrats! I was also able to recreate the problem with 'no applicable method' from the last round - when I run the tests while in a project they fail, but running the tests outside of a project they pass. I'm now assuming this was just a problem of working directories somehow.
@chamberlinc Thanks you so much for your help. Sorry it took me so long to respond.
@rkillick I guess we still miss the second review.
Thanks!
Reviewer: @nunesmatt Due date: 2022-10-10
@ropensci-review-bot add @nunesmatt to reviewers
@nunesmatt added to the reviewers list. Review due date is 2022-10-03. Thanks @nunesmatt for accepting to review! Please refer to our reviewer guide.
rOpenSci’s community is our best asset. We aim for reviews to be open, non-adversarial, and focused on improving software quality. Be respectful and kind! See our reviewers guide and code of conduct for more.
@nunesmatt: If you haven't done so, please fill this form for us to update our reviewers records.
@rkillick @christophsax Thank you for thinking of me to review this package (this is my first ROpenSci review). Please find some comments on the package below. The full script of this review can be found at my github page here.
nunesmatt 30/09/2022
Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide
The package includes all the following forms of documentation:
URL
, BugReports
and Maintainer
(which may be autogenerated via Authors@R
).[x] Installation: Installation succeeds as documented.
[x] Functionality: Any functional claims of the software been confirmed.
[x] Performance: Any performance claims of the software been confirmed.
[x] Automated tests: Unit tests cover essential functions of the package and a reasonable range of inputs and conditions. All tests pass on the local machine.
[x] Packaging guidelines: The package conforms to the rOpenSci packaging guidelines. Estimated hours spent reviewing: 7
[x] Should the author(s) deem it appropriate, I agree to be acknowledged as a package reviewer (“rev” role) in the package DESCRIPTION file.
Summary
The package aims to supply a structure/class-agnostic approach to time series analysis to circumvent some of the arguably tedious user conversion between classes before analysis tasks. In doing so, the functionality aims at generality – I feel this is a useful package in terms of functionality and scope for a wide range of end-users. The package is well-documented and the three vignettes all run smoothly.
Some comments on the package are below. Note that these are suggestions rather than fixes per se:
dts_first_of_period: no visible binding for global variable ‘time.orig’
## Undefined global functions or variables:
## time.orig
Whilst this is not an error, one suggestion might be to define time.orig
as NULL
(like is done for has.value
) at the start of the
function or before the subsetting in the dts_first_of_period
function (contained within ts_first_of_period.R
).
good practice. A few things came up when using gp
. Many code lines come up as 81 characters (as opposed to suggested lengths of ![\leq 80](https://latex.codecogs.com/png.image?%5Cdpi%7B110%7D&space;%5Cbg_white&space;%5Cleq%2080
"\leq 80")), which might be due to line endings etc. guess_dts.R
has a suggestion of using vapply
rather than sapply
. These are left to the developer to tackle if desired.
Error message about ts_boxable(x)
. I wonder whether a more informative error message could be shown when ts_boxable(x)
is not TRUE
. For example, whilst I realise that the package is designed for existing time series classes, using a tsbox
function on a vector or inappropriate object results in the error, but the user may not know how to fix the issue, especially as ts_boxable()
is primarily internal. Perhaps the message could point to the expected structure of the argument of the main package functions, e.g. “…please ensure the object x
has components…” or “Is the object of class x, y or z?”
Comparison with timetk
. During playing with this package, I mainly focussed on the conversion functions (ts_tbl()
,
ts_long()
,ts_ts()
etc), as, whilst delving into testing datasets, I came across the timetk
package, which at first glance aims to do a similar job to tsbox
.
All functions in tsbox
worked as expected, and some arguably have a more natural output than timetk
(see full review document).
The package developer might want to consider an additional function handling the zooreg
class for completeness.
Whilst users are probably familiar with usage, the developer might want to consider adding examples to show functionality of ts_prcomp
and ts_forecast
under the ts_examples
documentation file.
I noticed that there is a spelling mistake of explicit in the error messages for the guess_time
and guess_value
functions.
@nunesmatt, thank you very much for your helpful review! I will look into the points more carefully and try to address them.
:calendar: @nunesmatt you have 2 days left before the due date for your review (2022-10-03).
📆 @nunesmatt you have 2 days left before the due date for your review (2022-10-03).
Hi,
My review is above (30/09/22), which the package developer has acknowledged and is taking on board my comments.
@nunesmatt apologies, the bot prompted you as I didn't acknowledge your review into the system yet!
@ropensci-review-bot submit review https://github.com/ropensci/software-review/issues/464#issuecomment-986979096 time 7
Logged review for chamberlinc (hours: 7)
@chamberlinc Thank you for completing the review of this package.
@ropensci-review-bot submit review https://github.com/ropensci/software-review/issues/464#issuecomment-1263297128 time
I'm sorry human, I don't understand that. You can see what commands I support by typing:
@ropensci-review-bot help
@nunesmatt could you let me know how long you spent on the review, the bot doesn't like me leaving it blank.
Apologies, my review took 6 hours.
@ropensci-review-bot submit review https://github.com/ropensci/software-review/issues/464#issuecomment-1263297128 time 6
Date accepted: 2023-02-23 Submitting Author Name: Christoph Sax Submitting Author Github Handle: !--author1-->@christophsax<!--end-author1-- Repository: https://github.com/christophsax/tsbox Version submitted: 3.1.9001 Submission type: Stats Badge grade: silver Editor: !--editor-->@rkillick<!--end-editor-- Reviewers: @chamberlinc, @brunaw, @nunesmatt
Due date for @chamberlinc: 2021-11-23 Due date for @brunaw: 2021-11-23 Due date for @nunesmatt: 2022-10-03Archive: TBD Version accepted: TBD
Pre-submission Inquiry
General Information
Anyone who works with time series. Many statistical packages require time series to be in a certain object (ts, xts, tsibble, data.frame). tsbox facilitates the conversion between these objects. It also provides a general toolkit that works the same way with all time series classes. {tsbox} is also mentioned in the rOpenSci Statistical Software Peer Review Section on Time Series.
Paste your responses to our General Standard G1.1 here, describing whether your software is:
In the rOpenSci classification, this package is An improvement on other implementations of similar algorithms in R. Many time series packages, e.g., zoo or tsibble contain converter functions from one class to another. They often convert from their class to
ts
objects and back, but lack converters to other time series class.In most cases, tsbox transforms an object into an augmented
data.table
. And uses thedata.table
infrastructure for efficient joining and reshaping. After computation, it restores the original input class. This restoring feature is was also used in thexts::reclass()
function of the xts package.data.table: For efficient joining and reshaping xts: Similar reclassing mechanism tsibble: Tidy Temporal Data Frames and Tools
Not applicable.
Badging
I probably need some advice on this. I think that tsbox complies with most standards that are applicable. Generality of usage is a particular feature that should qualify the package for silver.
The most outstanding point is probably the one on generality:
The package facilitates work with time series in general. It can also be used to ease the burden of object testing and time series conversion for other time series packages.
Technical checks
Confirm each of the following by checking the box.
autotest
checks on the package, and ensured no tests fail.srr_stats_pre_submit()
function confirms this package may be submitted.This package:
Publication options
[X] Do you intend for this package to go on CRAN?
Already published on CRAN.
[ ] Do you intend for this package to go on Bioconductor?
Code of conduct