ropensci / software-review

rOpenSci Software Peer Review.
290 stars 104 forks source link

Submission tsbox: Class-Agnostic Time Series #464

Closed christophsax closed 1 year ago

christophsax commented 3 years ago

Date accepted: 2023-02-23 Submitting Author Name: Christoph Sax Submitting Author Github Handle: !--author1-->@christophsax<!--end-author1-- Repository: https://github.com/christophsax/tsbox Version submitted: 3.1.9001 Submission type: Stats Badge grade: silver Editor: !--editor-->@rkillick<!--end-editor-- Reviewers: @chamberlinc, @brunaw, @nunesmatt

Due date for @chamberlinc: 2021-11-23 Due date for @brunaw: 2021-11-23 Due date for @nunesmatt: 2022-10-03

Archive: TBD Version accepted: TBD


Package: tsbox
Type: Package
Title: Class-Agnostic Time Series
Version: 0.3.1.9001
Authors@R: person("Christoph", "Sax", email = "christoph.sax@gmail.com", role = c("aut", "cre"), comment = c(ORCID = "0000-0002-7192-7044"))
Description: Time series toolkit with identical behavior for all
  time series classes: 'ts','xts', 'data.frame', 'data.table', 'tibble', 'zoo',
  'timeSeries', 'tsibble', 'tis' or 'irts'. Also converts reliably between these classes.
Imports:
    data.table (>= 1.10.0),
    anytime
Suggests:
    testthat,
    dplyr,
    tibble,
    tidyr,
    forecast,
    seasonal,
    dygraphs,
    xts,
    ggplot2,
    scales,
    knitr,
    rmarkdown,
    tsibble (>= 0.8.2),
    tsibbledata,
    tibbletime,
    tseries,
    units,
    zoo,
    tis,
    timeSeries,
    nycflights13,
    imputeTS,
    spelling
License: GPL-3
Encoding: UTF-8
URL: https://www.tsbox.help, https://github.com/christophsax/tsbox
BugReports: https://github.com/christophsax/tsbox/issues
Roxygen: list(markdown = TRUE, roclets = c ("namespace", "rd", "srr::srr_stats_roclet"))
RoxygenNote: 7.1.2
VignetteBuilder: knitr
Depends:
    R (>= 2.10)
Config/testthat/parallel: true
Config/testthat/edition: 3
Language: en-US

Pre-submission Inquiry

General Information

Anyone who works with time series. Many statistical packages require time series to be in a certain object (ts, xts, tsibble, data.frame). tsbox facilitates the conversion between these objects. It also provides a general toolkit that works the same way with all time series classes. {tsbox} is also mentioned in the rOpenSci Statistical Software Peer Review Section on Time Series.

In the rOpenSci classification, this package is An improvement on other implementations of similar algorithms in R. Many time series packages, e.g., zoo or tsibble contain converter functions from one class to another. They often convert from their class to ts objects and back, but lack converters to other time series class.

In most cases, tsbox transforms an object into an augmented data.table. And uses the data.table infrastructure for efficient joining and reshaping. After computation, it restores the original input class. This restoring feature is was also used in the xts::reclass() function of the xts package.

data.table: For efficient joining and reshaping xts: Similar reclassing mechanism tsibble: Tidy Temporal Data Frames and Tools

Not applicable.

Badging

I probably need some advice on this. I think that tsbox complies with most standards that are applicable. Generality of usage is a particular feature that should qualify the package for silver.

The most outstanding point is probably the one on generality:

Have a demonstrated generality of usage beyond one single envisioned use case.

The package facilitates work with time series in general. It can also be used to ease the burden of object testing and time series conversion for other time series packages.

Technical checks

Confirm each of the following by checking the box.

This package:

Publication options

Code of conduct

ropensci-review-bot commented 3 years ago

Thanks for submitting to rOpenSci, our editors and @ropensci-review-bot will reply soon. Type @ropensci-review-bot help for help.

ropensci-review-bot commented 3 years ago

:rocket:

Editor check started

:wave:

ropensci-review-bot commented 3 years ago

Oops, something went wrong with our automatic package checks. Our developers [have been notified]() and package checks will appear here as soon as we've resolved the issue. Sorry for any inconvenience.

mpadge commented 3 years ago

@christophsax Sorry about the mess up here - the checks server is currently running R4.0.3, so doesn't recognise native pipes which you use in some test code. Will upgrade asap and deliver your checks.

christophsax commented 3 years ago

But I did't intend to use native pipes, that was an accident. I will remove them right now.

christophsax commented 3 years ago

I did so now. Perhaps rerunning is sufficient.

ropensci-review-bot commented 3 years ago

Checks for tsbox (v0.3.1.9001)

git hash: aad57cea

Important: All failing checks above must be addressed prior to proceeding

Package License: GPL-3


1. rOpenSci Statistical Standards (srr package)

This package is in the following category:

:heavy_check_mark: All applicable standards [v0.0.1] have been documented in this package

Click here to see the report of author-reported standards compliance of the package with links to associated lines of code, which can be re-generated locally by running the srr_report() function from within a local clone of the repository.


2. Statistical Properties

This package features some noteworthy statistical properties which may need to be clarified by a handling editor prior to progressing.

Details of statistical properties (click to open)

The package has: - code in R (100% in 53 files) and - 1 authors - 3 vignettes - no internal data file - 2 imported packages - 63 exported functions (median 7 lines of code) - 277 non-exported functions in R (median 9 lines of code) --- Statistical properties of package structure as distributional percentiles in relation to all current CRAN packages The following terminology is used: - `loc` = "Lines of Code" - `fn` = "function" - `exp`/`not_exp` = exported / not exported The final measure (`fn_call_network_size`) is the total number of calls between functions (in R), or more abstract relationships between code objects in other languages. Values are flagged as "noteworthy" when they lie in the upper or lower 5th percentile. |measure | value| percentile|noteworthy | |:------------------------|-----:|----------:|:----------| |files_R | 53| 96.0| | |files_vignettes | 3| 90.8| | |files_tests | 44| 98.7| | |loc_R | 2649| 88.3| | |loc_vignettes | 344| 80.6| | |loc_tests | 1951| 92.1| | |num_vignettes | 3| 93.1| | |n_fns_r | 340| 93.4| | |n_fns_r_exported | 63| 90.4| | |n_fns_r_not_exported | 277| 93.8| | |n_fns_per_file_r | 4| 51.4| | |num_params_per_fn | 1| 1.1|TRUE | |loc_per_fn_r | 8| 29.0| | |loc_per_fn_r_exp | 7| 14.2| | |loc_per_fn_r_not_exp | 9| 41.9| | |rel_whitespace_R | 25| 91.6| | |rel_whitespace_vignettes | 32| 87.8| | |rel_whitespace_tests | 28| 98.0|TRUE | |doclines_per_fn_exp | 41| 51.1| | |doclines_per_fn_not_exp | 0| 0.0|TRUE | |fn_call_network_size | 475| 94.6| | ---

2a. Network visualisation

Interactive network visualisation of calls between objects in package can be viewed by clicking here


3. goodpractice and other checks

Details of goodpractice and other checks (click to open)

### 3a. Continuous Integration Badges [![github](https://github.com/christophsax/tsbox/workflows/R-CMD-check/badge.svg)](https://github.com/christophsax/tsbox/actions) **GitHub Workflow Results** |name |conclusion |sha |date | |:-----------|:----------|:------|:----------| |R-CMD-check |success |aad57c |2021-09-18 | --- ### 3b. `goodpractice` results ### `R CMD check` with [rcmdcheck](https://r-lib.github.io/rcmdcheck/) R CMD check generated the following error: 1. checking tests ... Running ‘spelling.R’ Running ‘testthat.R’ ERROR Running the tests in ‘tests/testthat.R’ failed. Last 13 lines of output: █ 1. ├─fl[[i]](x) 2. │ └─tsbox::ts_apply(x, ff, ...) 3. │ └─tsbox:::ts_apply_dts(ts_dts(x), fun, ...) 4. │ ├─x[, fun(.SD, ...), by = eval(.by)] 5. │ └─data.table:::`[.data.table`(x, , fun(.SD, ...), by = eval(.by)) 6. └─tsbox:::fun(.SD, ...) 7. └─(function(x, ...) seasonal::final(seasonal::seas(x, ...)))(...) 8. ├─seasonal::final(seasonal::seas(x, ...)) 9. └─seasonal::seas(x, ...) 10. └─seasonal::checkX13(fail = TRUE, fullcheck = FALSE, htmlcheck = FALSE) [ FAIL 1 | WARN 8 | SKIP 17 | PASS 644 ] Error: Test failures Execution halted R CMD check generated the following test_fail: 1. > library(testthat) Warning message: package 'testthat' was built under R version 4.1.0 > library(tsbox) > > test_check("tsbox") Starting 2 test processes ══ Skipped tests ═══════════════════════════════════════════════════════════════ • On CRAN (17) ══ Failed tests ════════════════════════════════════════════════════════════════ ── Error (test-units.R:41:5): tsbox works with units ─────────────────────────── Error: Process terminated Backtrace: █ 1. ├─fl[[i]](x) 2. │ └─tsbox::ts_apply(x, ff, ...) 3. │ └─tsbox:::ts_apply_dts(ts_dts(x), fun, ...) 4. │ ├─x[, fun(.SD, ...), by = eval(.by)] 5. │ └─data.table:::`[.data.table`(x, , fun(.SD, ...), by = eval(.by)) 6. └─tsbox:::fun(.SD, ...) 7. └─(function(x, ...) seasonal::final(seasonal::seas(x, ...)))(...) 8. ├─seasonal::final(seasonal::seas(x, ...)) 9. └─seasonal::seas(x, ...) 10. └─seasonal::checkX13(fail = TRUE, fullcheck = FALSE, htmlcheck = FALSE) [ FAIL 1 | WARN 8 | SKIP 17 | PASS 644 ] Error: Test failures Execution halted R CMD check generated the following check_fail: 1. rcmdcheck_tests_pass ### Test coverage with [covr](https://covr.r-lib.org/) ERROR: Test Coverage Failed ### Cyclocomplexity with [cyclocomp](https://github.com/MangoTheCat/cyclocomp) The following functions have cyclocomplexity >= 15: function | cyclocomplexity --- | --- ts_span | 34 time_shift | 21 ts_plot | 21 copy_class | 19 ### Static code analyses with [lintr](https://github.com/jimhester/lintr) [lintr](https://github.com/jimhester/lintr) found the following 178 potential issues: message | number of times --- | --- Lines should not be more than 80 characters. | 178

:heavy_check_mark: Package has at least one HTML vignette


Package Versions

|package |version | |:--------|:---------| |pkgstats |0.0.0.311 | |pkgcheck |0.0.1.486 | |srr |0.0.1.107 |


Editor-in-Chief Instructions:

Processing may not proceed until the items marked with :heavy_multiplication_x: have been resolved.

christophsax commented 3 years ago

R CMD check

There seems to be an issue with the package seasonal on your system. The package is special since it uses x13binary which downloads pre-built binaries from CRAN. Usually this works fine and on most CRAN platforms.

Is there anything special about your system? There was an issue for Mac M1.

If you want to check, this should works:

library(seasonal)
seas(AirPassengers)
checkX13()

But I don't want you to drag into this. I can also remove the few tests that are using seasonal, which should then resolve the issue?

Coverage

Perhaps this is just because of the failing R CMD check? If not, is there a way to see what ist the measured coverage? I had 81.2% on my system, which passed the threshold (see below).

Report on my side

If I run pkgcheck::pkgcheck(".") on my side, I get:

#### Coverage

✔ Package uses 'roxygen2'.
✔ Package has a 'contributing.md' file.
✔ Package has a 'CITATION' file.
✔ Package has a 'codemeta.json' file.
✔ All functions have examples.
✔ Package has at least one HTML vignette
✔ Package 'DESCRIPTION' has a URL field.
✔ Package 'DESCRIPTION' has a BugReports field.
✔ Package is already on CRAN.
✔ Package has continuous integration checks.
✔ Package coverage is 81.2%.
✔ R CMD check found no errors.
✔ R CMD check found no warnings.
✔ All applicable standards [v0.0.1] have been documented in this package.

ℹ Current status:
✔ This package may be submitted.
noamross commented 2 years ago

@ropensci-review-bot assign @rkillick as editor

ropensci-review-bot commented 2 years ago

Assigned! @rkillick is now the editor

rkillick commented 2 years ago

The goodpractice() checks came back with:

I had problems running covr() but see that it is passing on the checks above. Please fix the above and I'll look to assign reviewers as these are small changes.

rkillick commented 2 years ago

Reviewer: @chamberlinc Due date: 2021-11-18

rkillick commented 2 years ago

Reviewer: @brunaw Due date: 2021-11-22

rkillick commented 2 years ago

@ropensci-review-bot add @chamberlinc to reviewers

ropensci-review-bot commented 2 years ago

I'm sorry @rkillick, I'm afraid I can't do that. That's something only editors are allowed to do.

maelle commented 2 years ago

@rkillick I've now added you to the editors team, sorry about this! You can now repeat the comment and it should work. Thanks for your patience!

rkillick commented 2 years ago

@ropensci-review-bot add @chamberlinc to reviewers

ropensci-review-bot commented 2 years ago

@chamberlinc added to the reviewers list. Review due date is 2021-11-23. Thanks @chamberlinc for accepting to review! Please refer to our reviewer guide.

ropensci-review-bot commented 2 years ago

@chamberlinc: If you haven't done so, please fill this form for us to update our reviewers records.

rkillick commented 2 years ago

@ropensci-review-bot add @brunaw to reviewers

ropensci-review-bot commented 2 years ago

@brunaw added to the reviewers list. Review due date is 2021-11-23. Thanks @brunaw for accepting to review! Please refer to our reviewer guide.

ropensci-review-bot commented 2 years ago

@brunaw: If you haven't done so, please fill this form for us to update our reviewers records.

chamberlinc commented 2 years ago

@rkillick @christophsax , thank you for the opportunity to review, and my apologies that this has taken so long. Below is my review with some comments. I also have uploaded the knitted html notebook that details the tests I ran at https://github.com/chamberlinc/tsbox-review. The code I used to test the package functions is also available in this repo.

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

Documentation

The package includes all the following forms of documentation:

Functionality

Errors encountered in ts_plot(), ts_xts() and ts_long(). I am including the structure of the data frame I was working with in the code below: Errors with ts_xts() and ts_long():

> Conn_discharge_DO <- dataRetrieval::readNWISuv(
+   siteNumbers = "01193050",
+   parameterCd = c("00060", "00300"),
+   startDate = "2019-01-01",
+   endDate = "2021-01-01"
+ )
> 
> str(Conn_discharge_DO)
'data.frame':   70519 obs. of  8 variables:
 $ agency_cd       : chr  "USGS" "USGS" "USGS" "USGS" ...
 $ site_no         : chr  "01193050" "01193050" "01193050" "01193050" ...
 $ dateTime        : POSIXct, format: "2019-01-01 05:00:00" "2019-01-01 05:15:00" "2019-01-01 05:30:00" "2019-01-01 05:45:00" ...
 $ X_00060_00000   : num  39200 41300 40400 40400 41100 43000 41800 42300 41700 42400 ...
 $ X_00060_00000_cd: chr  "A" "A" "A" "A" ...
 $ X_00300_00000   : num  13.9 13.9 13.9 14 14 14 14 14 14 14 ...
 $ X_00300_00000_cd: chr  "A" "A" "A" "A" ...
 $ tz_cd           : chr  "UTC" "UTC" "UTC" "UTC" ...
 - attr(*, "url")= chr "https://nwis.waterservices.usgs.gov/nwis/iv/?site=01193050&format=waterml,1.1&ParameterCd=00060,00300&startDT=2"| __truncated__
 - attr(*, "siteInfo")='data.frame':    1 obs. of  13 variables:
  ..$ station_nm          : chr "CONNECTICUT RIVER AT MIDDLE HADDAM, CT"
  ..$ site_no             : chr "01193050"
  ..$ agency_cd           : chr "USGS"
  ..$ timeZoneOffset      : chr "-05:00"
  ..$ timeZoneAbbreviation: chr "EST"
  ..$ dec_lat_va          : num 41.5
  ..$ dec_lon_va          : num -72.6
  ..$ srs                 : chr "EPSG:4326"
  ..$ siteTypeCd          : chr "ST-TS"
  ..$ hucCd               : chr "01080205"
  ..$ stateCd             : chr "09"
  ..$ countyCd            : chr "09007"
  ..$ network             : chr "NWIS"
 - attr(*, "variableInfo")='data.frame':    2 obs. of  7 variables:
  ..$ variableCode       : chr [1:2] "00060" "00300"
  ..$ variableName       : chr [1:2] "Streamflow, ft&#179;/s" "Dissolved oxygen, water, unfiltered, mg/L"
  ..$ variableDescription: chr [1:2] "Discharge, cubic feet per second" "Dissolved oxygen, water, unfiltered, milligrams per liter"
  ..$ valueType          : chr [1:2] "Derived Value" "Derived Value"
  ..$ unit               : chr [1:2] "ft3/s" "mg/l"
  ..$ options            : chr [1:2] "" ""
  ..$ noDataValue        : logi [1:2] NA NA
 - attr(*, "disclaimer")= chr "Provisional data are subject to revision. Go to http://waterdata.usgs.gov/nwis/help/?provisional for more information."
 - attr(*, "statisticInfo")='data.frame':   1 obs. of  2 variables:
  ..$ statisticCd  : chr "00000"
  ..$ statisticName: chr ""
 - attr(*, "queryTime")= POSIXct[1:1], format: "2021-12-06 11:53:57"
> 
> # Try viewing both parameters together
> try(str(ts_xts(Conn_discharge_DO)))
more than one [value] column detected after [time] - using the outermost.
Are you using a wide data frame? To convert, use 'ts_long()'.

[time]: 'dateTime' [value]: 'X_00300_00000' 
Error : cannot allocate vector of size 4.5 Gb
> try(str(ts_xts(ts_long(Conn_discharge_DO))))
[id] columns left of [time] column: 'agency_cd', 'site_no'
[time]: 'dateTime' 
Error : 'value' column [value] is not numeric.

Errors with ts_plot():

> character_date_dat <- data.frame(
+   DateTime = c(
+   "2011-11-11 11:11:11", "2012-12-12 12:12:12", "2021-09-25 20:07:00"
+   ),
+   Value = c(1, 2, 8)
+ )
> try({ts_plot(character_date_dat)})
[time]: 'DateTime' [value]: 'Value' 
Error in colnamesInt(x, neworder, check_dups = FALSE) : 
  argument specifying columns specify non existing column(s): cols[3]='Value'
> # I am downloading hydrological data timeseries from the USGS. This is my most frequent way of accessing timeseries data. Data comes as a data frame.
> Eno_discharge <- dataRetrieval::readNWISuv(
+   siteNumbers = "02085070",
+   parameterCd = "00060",
+   startDate = "2019-01-01",
+   endDate = "2021-01-01"
+ )
> # View the structure of the data frame
> str(Eno_discharge)
'data.frame':   70267 obs. of  6 variables:
 $ agency_cd       : chr  "USGS" "USGS" "USGS" "USGS" ...
 $ site_no         : chr  "02085070" "02085070" "02085070" "02085070" ...
 $ dateTime        : POSIXct, format: "2019-01-01 05:00:00" "2019-01-01 05:15:00" "2019-01-01 05:30:00" "2019-01-01 05:45:00" ...
 $ X_00060_00000   : num  424 421 424 428 428 428 428 428 435 435 ...
 $ X_00060_00000_cd: chr  "A" "A" "A" "A" ...
 $ tz_cd           : chr  "UTC" "UTC" "UTC" "UTC" ...
 - attr(*, "url")= chr "https://nwis.waterservices.usgs.gov/nwis/iv/?site=02085070&format=waterml,1.1&ParameterCd=00060&startDT=2019-01"| __truncated__
 - attr(*, "siteInfo")='data.frame':    1 obs. of  13 variables:
  ..$ station_nm          : chr "ENO RIVER NEAR DURHAM, NC"
  ..$ site_no             : chr "02085070"
  ..$ agency_cd           : chr "USGS"
  ..$ timeZoneOffset      : chr "-05:00"
  ..$ timeZoneAbbreviation: chr "EST"
  ..$ dec_lat_va          : num 36.1
  ..$ dec_lon_va          : num -78.9
  ..$ srs                 : chr "EPSG:4326"
  ..$ siteTypeCd          : chr "ST"
  ..$ hucCd               : chr "03020201"
  ..$ stateCd             : chr "37"
  ..$ countyCd            : chr "37063"
  ..$ network             : chr "NWIS"
 - attr(*, "variableInfo")='data.frame':    1 obs. of  7 variables:
  ..$ variableCode       : chr "00060"
  ..$ variableName       : chr "Streamflow, ft&#179;/s"
  ..$ variableDescription: chr "Discharge, cubic feet per second"
  ..$ valueType          : chr "Derived Value"
  ..$ unit               : chr "ft3/s"
  ..$ options            : chr ""
  ..$ noDataValue        : logi NA
 - attr(*, "disclaimer")= chr "Provisional data are subject to revision. Go to http://waterdata.usgs.gov/nwis/help/?provisional for more information."
 - attr(*, "statisticInfo")='data.frame':   1 obs. of  2 variables:
  ..$ statisticCd  : chr "00000"
  ..$ statisticName: chr ""
 - attr(*, "queryTime")= POSIXct[1:1], format: "2021-12-06 11:59:00"
> # Try viewing the discharge data using the ts_plot function
> try({ts_plot(Eno_discharge)})
[time]: 'dateTime' [value]: 'X_00060_00000' 
Error in setnames(x, c(cid, ctime, cvalue), c("id", "time", "value")) : 
  Items of 'old' not found in column names: [X_00060_00000]. Consider skip_absent=TRUE.
> try({ts_plot(Eno_discharge, skip_absent=TRUE)})
Error in FUN(X[[i]], ...) : ts_boxable(x) is not TRUE
> library(testthat)
> library(tsbox)
> 
> test_check("tsbox")
Starting 2 test processes
== Skipped tests ===============================================================
* empty test (1)
== Warnings ====================================================================
-- Warning (test-ts_first_of_period.R:18:3): ts_first_of_period works ----------
no non-missing arguments to min; returning Inf
Backtrace:
 1. tsbox::ts_first_of_period(x)
 2. tsbox::ts_apply(x, dts_first_of_period)
 3. tsbox:::ts_apply_dts(ts_dts(x), fun, ...)
 4. tsbox:::fun(x, ...)
 8. base::NextMethod("[")
== Failed tests ================================================================
-- Error (test-ts_first_of_period.R:18:3): ts_first_of_period works ------------
Error in `max(which(time <= smry$start)):min(which(time >= smry$end))`: result would be too long a vector
Backtrace:
    x
 1. \-tsbox::ts_first_of_period(x)
 2.   \-tsbox::ts_apply(x, dts_first_of_period)
 3.     \-tsbox:::ts_apply_dts(ts_dts(x), fun, ...)
 4.       \-tsbox:::fun(x, ...)
 5.         +-time[(max(which(time <= smry$start)):min(which(time >= smry$end)))]
 6.         +-base::`[.POSIXct`(...)
 7.         | \-base::.POSIXct(NextMethod("["), attr(x, "tzone"), oldClass(x))
 8.         \-base::NextMethod("[")
[ FAIL 1 | WARN 1 | SKIP 1 | PASS 770 ]
Error: Test failures
Execution halted
1 error x | 1 warning x | 0 notes √

Estimated hours spent reviewing: 7


Review Comments

First, I enjoyed the opportunity to review this code. This is the first code review I have done for rOpenSci, and I hope the comments below are helpful.

I like the general purpose of this package and agree there is probably a lot of use for a package that allows easy conversion between R object types. The documentation I think is all ok, though I think it could be improved with a little more description of the expected structure of data. I did not have any issues with installation. There was one test that failed on my machine (see details in the index.nb.html file in the linked repo, https://github.com/chamberlinc/tsbox-review). The code snippets in the three vignettes all worked on my machine.

I will preface my comments by saying I am not an experienced user of data.tables, so following the code was a little tricky for me. I usually work with timeseries as data frames, and I was having trouble following the code and figuring out which columns were being assigned as data, timestamps, or identifiers. This caused a few issues when I had data frames of multiple concurrent timeseries and was getting error messages indicating that the ordering of the columns somehow may have mattered. When working with data frames that only had one timeseries, the functions all worked simply and, it seemed, as intended.

I was mostly experimenting with the ts_plot(), ts_long(), and ts_xts() functions. ts_xts() produced an error while working with the data frame that had multiple timeseries. Following the prompting, I tried using ts_long(), however this also produced an error that I did not understand, because it seemed to be accessing columns that I did not intend. I didn't see any way to input the correct column names to the function. Once I manually pivoted the data frame using pivot_longer() though, I was able to use ts_xts() and it performed everything correctly I believe. ts_plot() produced error messages that it might have mattered how the columns were organized, and that would have been helpful to know. I was not able to get this function to work for several formats of data frame data, even when following the suggested prompts in the error messages.

The code I used to access data and experiment is provided as well at the linked repo. I hope this is helpful!

rkillick commented 2 years ago

@chamberlinc thank you for your valuable review of the package.

rkillick commented 2 years ago

@brunaw are you able to provide your review soon? It was due on 23 Nov.

christophsax commented 2 years ago

@chamberlinc Thank you so much for your review! I will look into it in more detail, but the error messages/feedback in non-standard situations certainly seems worth to be addressed.

christophsax commented 2 years ago

First of all, please apologize for my late answer. I originally planned to wait for the second review and lost track afterward.

@chamberlinc, thanks again for your very helpful review! I dealt with your points in PR #211.

Most of your problems arose when applying the functions to non-standard data frames, especially wide ones. In some cases, the messages already pointed to the use use of ts_long(), but the error messages were not too helpful. In PR #211, I tried to address some of these problems. The error messages are improved, and ts_long() can handle some more complex cases. It works now with a toy example of your data set.

A bug in ts_plot() also caused a weird error message. This bug is now resolved as well.

To sum up, I addressed the following three points from your review:

description of the expected data structure

I think it could be improved with a little more description of the expected structure of data.

Extended paragraph on data frames: https://www.tsbox.help/articles/tsbox.html#time-series-in-data-frames-1

Failing test

There was one test that failed on my machine (see details in the index.nb.html file in the linked repo, https://github.com/chamberlinc/tsbox-review).

The error and the warning did not occur on my system, but the computation in ts_first_of_period() did not work as expected. I added a new test and a fix. Both are green now on my systems (local and GHA tests in the repository).

Error Messages on wide data frames

This caused a few issues when I had data frames of multiple concurrent timeseries and was getting error messages indicating that the ordering of the columns somehow may have mattered.

I improved some of the error messages when functions were applied to wide data structures.

library(nycflights13)
suppressPackageStartupMessages(library(dplyr))
library(tsbox)
packageVersion("tsbox")
#> [1] '0.3.1.9002'

d3 <- weather |>
  select(origin, time_hour, temp, humid, precip)
d3
#> # A tibble: 26,115 × 5
#>    origin time_hour            temp humid precip
#>    <chr>  <dttm>              <dbl> <dbl>  <dbl>
#>  1 EWR    2013-01-01 01:00:00  39.0  59.4      0
#>  2 EWR    2013-01-01 02:00:00  39.0  61.6      0
#>  3 EWR    2013-01-01 03:00:00  39.0  64.4      0
#>  4 EWR    2013-01-01 04:00:00  39.9  62.2      0
#>  5 EWR    2013-01-01 05:00:00  39.0  64.4      0
#>  6 EWR    2013-01-01 06:00:00  37.9  67.2      0
#>  7 EWR    2013-01-01 07:00:00  39.0  64.4      0
#>  8 EWR    2013-01-01 08:00:00  39.9  62.2      0
#>  9 EWR    2013-01-01 09:00:00  39.9  62.2      0
#> 10 EWR    2013-01-01 10:00:00  41    59.6      0
#> # … with 26,105 more rows

This is a wide data frame, so tsbox errors with a meaningful message:

ts_ts(d3)
#> Found numeric [id] column(s): 'temp', 'humid'.
#> Are you using a wide data frame? To convert, use 'ts_long()'.
#> Convert column(s) to character or factor to silence this message.
#> [time]: 'time_hour' [value]: 'precip'
#> Error: series has no regular pattern

This also works if the order is different

d4 <- weather |>
  select(origin, temp, humid, precip, time_hour)
d4
#> # A tibble: 26,115 × 5
#>    origin  temp humid precip time_hour          
#>    <chr>  <dbl> <dbl>  <dbl> <dttm>             
#>  1 EWR     39.0  59.4      0 2013-01-01 01:00:00
#>  2 EWR     39.0  61.6      0 2013-01-01 02:00:00
#>  3 EWR     39.0  64.4      0 2013-01-01 03:00:00
#>  4 EWR     39.9  62.2      0 2013-01-01 04:00:00
#>  5 EWR     39.0  64.4      0 2013-01-01 05:00:00
#>  6 EWR     37.9  67.2      0 2013-01-01 06:00:00
#>  7 EWR     39.0  64.4      0 2013-01-01 07:00:00
#>  8 EWR     39.9  62.2      0 2013-01-01 08:00:00
#>  9 EWR     39.9  62.2      0 2013-01-01 09:00:00
#> 10 EWR     41    59.6      0 2013-01-01 10:00:00
#> # … with 26,105 more rows

ts_ts(d4)
#> Found numeric [id] column(s): 'temp', 'humid'.
#> Are you using a wide data frame? To convert, use 'ts_long()'.
#> Convert column(s) to character or factor to silence this message.
#> [time]: 'time_hour' [value]: 'precip'
#> Error: series has no regular pattern

Using ts_plot() on these malformed data frames has returned the following nonsensical error:

[time]: 'time_hour' [value]: 'precip'
Error in setnames(x, c(cid, ctime, cvalue), c("id", "time", "value")) :
  Items of 'old' not found in column names: [precip]. Consider skip_absent=TRUE.
>

The skip_absent=TRUE is a data.table suggestion, not meant for use in tsbox. I fixed this in commit 50e694b8. The message now makes more sense:

ts_plot(d3)
#> Found numeric [id] column(s): 'temp', 'humid'.
#> Are you using a wide data frame? To convert, use 'ts_long()'.
#> Convert column(s) to character or factor to silence this message.
#> 
#> [time]: 'time_hour' [value]: 'precip' 
#> too many series. Only showing the first 20.

reflecting the fact that we have a large number of absurdly short time series (since value columns were classified as id columns)

I also tried to improve a bit on your examples. I construct a data frame that looks similar to yours:

# > str(Conn_discharge_DO)
# 'data.frame':   70519 obs. of  8 variables:
#  $ agency_cd       : chr  "USGS" "USGS" "USGS" "USGS" ...
#  $ site_no         : chr  "01193050" "01193050" "01193050" "01193050" ...
#  $ dateTime        : POSIXct, format: "2019-01-01 05:00:00" "2019-01-01 05:15:00" "2019-01-01 05:30:00" "2019-01-01 05:45:00" ...
#  $ X_00060_00000   : num  39200 41300 40400 40400 41100 43000 41800 42300 41700 42400 ...
#  $ X_00060_00000_cd: chr  "A" "A" "A" "A" ...

Conn_discharge_DO <- tibble(
  agency_cd = c("USGS", "USGS", "USGS", "USGS"),
  site_no = c("01193050", "01193050", "01193050", "01193050"),
  dateTime = as.POSIXct(c("2019-01-01 05:00:00", "2019-01-01 05:15:00", "2019-01-01 05:30:00", "2019-01-01 05:45:00")),
  unit = "ft3/s",
  X_00060_00000 = 1:4,
  noDataValue = NA,
  X_00060_000002 = 11:14,
  X_00060_00000_cd = "A"
)

tsbox now detects X_00060_00000 as an id column. (As it says in the documentation, it starts on the right). And it suggests using ts_long().

ts_xts(Conn_discharge_DO)
#> Found numeric [id] column(s): 'X_00060_00000'.
#> Are you using a wide data frame? To convert, use 'ts_long()'.
#> Convert column(s) to character or factor to silence this message.
#> [time]: 'dateTime' [value]: 'X_00060_000002'
#> Loading required namespace: xts
#>                     USGS_01193050_ft3/s_1_NA_A USGS_01193050_ft3/s_2_NA_A
#> 2019-01-01 05:00:00                         11                         NA
#> 2019-01-01 05:15:00                         NA                         12
#> 2019-01-01 05:30:00                         NA                         NA
#> 2019-01-01 05:45:00                         NA                         NA
#>                     USGS_01193050_ft3/s_3_NA_A USGS_01193050_ft3/s_4_NA_A
#> 2019-01-01 05:00:00                         NA                         NA
#> 2019-01-01 05:15:00                         NA                         NA
#> 2019-01-01 05:30:00                         13                         NA
#> 2019-01-01 05:45:00                         NA                         14

ts_long() works now on this. It classifies character and factor columns right of the time column as id columns (with a message).

ts_long(Conn_discharge_DO)
#> found columns right to the [time] column that will be treated as [id] columns (character or factor): 'unit', 'X_00060_00000_cd'.
#> Additional [id] column(s): 'agency_cd', 'site_no', 'unit', 'X_00060_00000_cd'
#> [time]: 'dateTime'
#> # A tibble: 12 × 7
#>    agency_cd site_no  unit  X_00060_00000_cd id        dateTime            value
#>    <chr>     <chr>    <chr> <chr>            <chr>     <dttm>              <int>
#>  1 USGS      01193050 ft3/s A                X_00060_… 2019-01-01 05:00:00     1
#>  2 USGS      01193050 ft3/s A                X_00060_… 2019-01-01 05:15:00     2
#>  3 USGS      01193050 ft3/s A                X_00060_… 2019-01-01 05:30:00     3
#>  4 USGS      01193050 ft3/s A                X_00060_… 2019-01-01 05:45:00     4
#>  5 USGS      01193050 ft3/s A                noDataVa… 2019-01-01 05:00:00    NA
#>  6 USGS      01193050 ft3/s A                noDataVa… 2019-01-01 05:15:00    NA
#>  7 USGS      01193050 ft3/s A                noDataVa… 2019-01-01 05:30:00    NA
#>  8 USGS      01193050 ft3/s A                noDataVa… 2019-01-01 05:45:00    NA
#>  9 USGS      01193050 ft3/s A                X_00060_… 2019-01-01 05:00:00    11
#> 10 USGS      01193050 ft3/s A                X_00060_… 2019-01-01 05:15:00    12
#> 11 USGS      01193050 ft3/s A                X_00060_… 2019-01-01 05:30:00    13
#> 12 USGS      01193050 ft3/s A                X_00060_… 2019-01-01 05:45:00    14
chamberlinc commented 2 years ago

Hello Christoph, Apologies for the delay again. These updates I think are very good. The functions worked as expected this time with the data I typically work with. I had a few automated tests fail this time though, one because I have an older version of R that does not support the |> operator, and the rest all seemingly because of some issue accessing the function ts_dts.

The details of the automated test output you can find here: https://github.com/chamberlinc/tsbox-review/blob/master/tsbox_R2.md. The rest of my review is below:

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

Documentation

The package includes all the following forms of documentation:

Functionality

sessionInfo()

## R version 4.0.5 (2021-03-31)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 19044)
## 
## Matrix products: default
## 
## locale:
## [1] LC_COLLATE=English_United States.1252 
## [2] LC_CTYPE=English_United States.1252   
## [3] LC_MONETARY=English_United States.1252
## [4] LC_NUMERIC=C                          
## [5] LC_TIME=English_United States.1252    
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## loaded via a namespace (and not attached):
##  [1] compiler_4.0.5  magrittr_2.0.3  fastmap_1.1.0   cli_3.3.0      
##  [5] tools_4.0.5     htmltools_0.5.2 rstudioapi_0.13 yaml_2.3.5     
##  [9] stringi_1.7.6   rmarkdown_2.14  knitr_1.39      stringr_1.4.0  
## [13] xfun_0.31       digest_0.6.29   rlang_1.0.2     evaluate_0.15

pkg_dir <- "C:/Users/Cathy/Documents/PeerReviewing/tsbox/R2/tsbox/"
try(devtools::check(pkg_dir))

## i Updating tsbox documentation

## i Loading tsbox

## Error in parse(text = lines, keep.source = TRUE, srcfile = srcfilecopy(srcref_path %||%  : 
##   C:\Users\Cathy\Documents\PeerReviewing\tsbox\R2\tsbox/tests/testthat/test-date_utils.R:9:41: unexpected '>'
## 8: 
## 9:   x1 <- ts_tbl(ts_c(mdeaths, fdeaths)) |>
##                                            ^

devtools::test(pkg_dir)

## Output removed for clarity. View at https://github.com/chamberlinc/tsbox-review/blob/master/tsbox_R2.md ##

Estimated hours spent reviewing: 3


Review Comments

The functions now work as expected for me for the types of data I typically use. Thanks for the updates! I found the messages on the ts_long function to be very helpful.

christophsax commented 2 years ago

Very sorry for the slow turnaround.

Native Pipe

I changed the native pipe (|>) back to the magrittr one (%>%), so this should now work on your system with the latest version (0.3.1.9003).

I introduced the native pipe deliberately in the vignettes since I always wanted to keep the dependency graph very simple (tsbox only imports data.table and anytime) and wanted to make the vignettes available for non-magrittr/dplyr users. But I see that there is a trade-off between this aim and the support for older versions of R. The native pipe was introduced in R 4.1 on 2021-05-18 and is now available for almost 15 months. With more time passing, only a few older versions will be around. I changed it back for now but will go to the native pipe again at some point in the future.

Other Test Errors

I have a hard time reproducing them. First, all the tests run through on my system (macOS, R release) and all systems on GitHub Actions (ubuntu-20.04, release, devel), (windows-latest, release), (macOS-latest, release). So I wonder if this is another problem related to the old R version?

Your error says:

 Error in `UseMethod("ts_dts")`: no applicable method for 'ts_dts' applied to an object of class "data.frame"

But there clearly is such a method:

https://github.com/christophsax/tsbox/blob/dd0447c66405da6101419c31b0fe3f0cffdfadc1/R/to_from_data.frame.R#L14-L18

If you still encounter the error in the tests, could you try to run one the failing example outside of the tests? Does this fail? It really shouldn't.

library(tsbox)
with_id <- wo_id <- ts_df(mdeaths)
with_id$id <- "mdeaths"
ts_c(wo_id, with_id)

Thank you so much for your work! I guess we are alomost done now. I will answer quickly this time :-)

chamberlinc commented 2 years ago

Hello, and sorry for my slow turn around this time.

I think that plan for the pipes makes the most sense. Unfortunately there are a significant number of users who don't have control over updating their own versions of R because of the IT protocols of their companies or agencies. But, given enough time I agree it will be less of a problem and decreasing dependencies is a good goal.

The error that was occurring previously has resolved itself on my computer. However, I am now getting a new error with the test test-ts_first_of_period:

Warning (test-ts_first_of_period.R:19:3): ts_first_of_period works
no non-missing arguments to min; returning Inf
Backtrace:
 1. tsbox::ts_first_of_period(x)
      at test-ts_first_of_period.R:19:2
 2. tsbox::ts_apply(x, dts_first_of_period)
      at tsbox/R/ts_first_of_period.R:20:2
 3. tsbox::ts_apply_dts(ts_dts(x), fun, ...)
      at tsbox/R/ts_apply.R:44:2
 4. tsbox fun(x, ...)
      at tsbox/R/ts_apply.R:18:4
 8. base::NextMethod("[")

Error (test-ts_first_of_period.R:19:3): ts_first_of_period works
Error in `max(which(time <= smry$start)):min(which(time >= smry$end))`: result would be too long a vector
Backtrace:
 1. tsbox::ts_first_of_period(x)
      at test-ts_first_of_period.R:19:2
 2. tsbox::ts_apply(x, dts_first_of_period)
      at tsbox/R/ts_first_of_period.R:20:2
 3. tsbox::ts_apply_dts(ts_dts(x), fun, ...)
      at tsbox/R/ts_apply.R:44:2
 4. tsbox fun(x, ...)
      at tsbox/R/ts_apply.R:18:4
 8. base::NextMethod("[")

I tried stepping through the test code outside of the test function, and I got the same error. Going into the dts_first_of_period function, the error seems to be because which(time >= smry$end)) produces integer(0).

Other than that though, everything worked well.

christophsax commented 2 years ago

@chamberlinc, thanks a lot!

The error in ts_first_of_period() was time zone dependent, that's why I haven't spotted it. I added another test and fixed the problem. It should work now with the latest version on GitHub (0.3.1.9004).

chamberlinc commented 2 years ago

@christophsax , all of the tests pass now, congrats! I was also able to recreate the problem with 'no applicable method' from the last round - when I run the tests while in a project they fail, but running the tests outside of a project they pass. I'm now assuming this was just a problem of working directories somehow.

christophsax commented 2 years ago

@chamberlinc Thanks you so much for your help. Sorry it took me so long to respond.

@rkillick I guess we still miss the second review.

Thanks!

rkillick commented 2 years ago

Reviewer: @nunesmatt Due date: 2022-10-10

rkillick commented 2 years ago

@ropensci-review-bot add @nunesmatt to reviewers

ropensci-review-bot commented 2 years ago

@nunesmatt added to the reviewers list. Review due date is 2022-10-03. Thanks @nunesmatt for accepting to review! Please refer to our reviewer guide.

rOpenSci’s community is our best asset. We aim for reviews to be open, non-adversarial, and focused on improving software quality. Be respectful and kind! See our reviewers guide and code of conduct for more.

ropensci-review-bot commented 2 years ago

@nunesmatt: If you haven't done so, please fill this form for us to update our reviewers records.

nunesmatt commented 1 year ago

@rkillick @christophsax Thank you for thinking of me to review this package (this is my first ROpenSci review). Please find some comments on the package below. The full script of this review can be found at my github page here.

tsbox review

nunesmatt 30/09/2022

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

Documentation

The package includes all the following forms of documentation:

Functionality


Review Comments

Summary

The package aims to supply a structure/class-agnostic approach to time series analysis to circumvent some of the arguably tedious user conversion between classes before analysis tasks. In doing so, the functionality aims at generality – I feel this is a useful package in terms of functionality and scope for a wide range of end-users. The package is well-documented and the three vignettes all run smoothly.

Some comments on the package are below. Note that these are suggestions rather than fixes per se:

  1. Package checking. When checking the package, I get:
dts_first_of_period: no visible binding for global variable ‘time.orig’
## Undefined global functions or variables:
##   time.orig

Whilst this is not an error, one suggestion might be to define time.orig as NULL (like is done for has.value) at the start of the function or before the subsetting in the dts_first_of_period function (contained within ts_first_of_period.R).

  1. good practice. A few things came up when using gp. Many code lines come up as 81 characters (as opposed to suggested lengths of ![\leq 80](https://latex.codecogs.com/png.image?%5Cdpi%7B110%7D&space;%5Cbg_white&space;%5Cleq%2080 "\leq 80")), which might be due to line endings etc. guess_dts.R has a suggestion of using vapply rather than sapply. These are left to the developer to tackle if desired.

  2. Error message about ts_boxable(x). I wonder whether a more informative error message could be shown when ts_boxable(x) is not TRUE. For example, whilst I realise that the package is designed for existing time series classes, using a tsbox function on a vector or inappropriate object results in the error, but the user may not know how to fix the issue, especially as ts_boxable() is primarily internal. Perhaps the message could point to the expected structure of the argument of the main package functions, e.g. “…please ensure the object x has components…” or “Is the object of class x, y or z?”

  3. Comparison with timetk. During playing with this package, I mainly focussed on the conversion functions (ts_tbl(), ts_long(),ts_ts()etc), as, whilst delving into testing datasets, I came across the timetk package, which at first glance aims to do a similar job to tsbox.

All functions in tsbox worked as expected, and some arguably have a more natural output than timetk (see full review document).

The package developer might want to consider an additional function handling the zooreg class for completeness.

  1. Whilst users are probably familiar with usage, the developer might want to consider adding examples to show functionality of ts_prcomp and ts_forecast under the ts_examples documentation file.

  2. I noticed that there is a spelling mistake of explicit in the error messages for the guess_time and guess_value functions.

christophsax commented 1 year ago

@nunesmatt, thank you very much for your helpful review! I will look into the points more carefully and try to address them.

ropensci-review-bot commented 1 year ago

:calendar: @nunesmatt you have 2 days left before the due date for your review (2022-10-03).

nunesmatt commented 1 year ago

📆 @nunesmatt you have 2 days left before the due date for your review (2022-10-03).

Hi,

My review is above (30/09/22), which the package developer has acknowledged and is taking on board my comments.

rkillick commented 1 year ago

@nunesmatt apologies, the bot prompted you as I didn't acknowledge your review into the system yet!

rkillick commented 1 year ago

@ropensci-review-bot submit review https://github.com/ropensci/software-review/issues/464#issuecomment-986979096 time 7

ropensci-review-bot commented 1 year ago

Logged review for chamberlinc (hours: 7)

rkillick commented 1 year ago

@chamberlinc Thank you for completing the review of this package.

rkillick commented 1 year ago

@ropensci-review-bot submit review https://github.com/ropensci/software-review/issues/464#issuecomment-1263297128 time

ropensci-review-bot commented 1 year ago

I'm sorry human, I don't understand that. You can see what commands I support by typing:

@ropensci-review-bot help

rkillick commented 1 year ago

@nunesmatt could you let me know how long you spent on the review, the bot doesn't like me leaving it blank.

nunesmatt commented 1 year ago

Apologies, my review took 6 hours.

rkillick commented 1 year ago

@ropensci-review-bot submit review https://github.com/ropensci/software-review/issues/464#issuecomment-1263297128 time 6