msperlin commented 2 years ago

Date accepted: 2022-06-21

Submitting Author Name: Marcelo Perlin
Submitting Author Github Handle: @msperlin
Editor: @melvidoni
Reviewers: @s3alfisc, @thisisnic

Due date for @s3alfisc: 2022-05-29
Due date for @thisisnic: 2022-06-13

Archive: TBD
Version accepted: TBD
Language: en

Paste the full DESCRIPTION file inside a code block below:

Package: yfR
Title: Downloads and Organizes Financial Data from Yahoo Finance
Version: 0.0.1
Authors@R: person("Marcelo", "Perlin", email = "marceloperlin@gmail.com", role = c("aut", "cre"))
Description: Facilitates download of financial data from Yahoo Finance <https://finance.yahoo.com/>, 
 a vast repository of stock price data across multiple financial exchanges. The package offers a local caching system
 and support for parallel computation.
URL: https://github.com/msperlin/yfR
BugReports: https://github.com/msperlin/yfR/issues
Depends:
    R (>= 4.1)
Imports: stringr, curl, tidyr, 
    lubridate, furrr, purrr, future, tibble, zoo,
    cli, readr, rvest, dplyr, quantmod
License: MIT + file LICENSE
LazyData: true
RoxygenNote: 7.1.2
Suggests: 
    knitr,
    rmarkdown,
    testthat (>= 3.0.0),
    ggplot2,
    covr
VignetteBuilder: knitr
Config/testthat/edition: 3

Scope

Please indicate which category or categories from our package fit policies this package falls under: (Please check an appropriate box below. If you are unsure, we suggest you make a pre-submission inquiry.):
- [X] data retrieval
- [ ] data extraction
- [ ] data munging
- [ ] data deposition
- [ ] workflow automation
- [ ] version control
- [ ] citation management and bibliometrics
- [ ] scientific software wrappers
- [ ] field and lab reproducibility tools
- [ ] database software bindings
- [ ] geospatial data
- [ ] text analysis
Explain how and why the package falls under these categories (briefly, 1-2 sentences):

Package yfR retrieves and organizes data from Yahoo Finance, a large repository for stock price data.

Who is the target audience and what are scientific applications of this package?

The target audience is students, researchers and industry practitioners in the fields of Finance and Economics.

Are there other R packages that accomplish the same thing? If so, how does yours differ or meet our criteria for best-in-category?

Package yfR is the second and backwards-incompatible version of BatchGetSymbols, also developed by me. My plan is to first deprecate BatchGetSymbols and later remove it from CRAN and archive it in Github.

Moreover, there are other packages, such as quantmod, that download data from Yahoo Finance, but none with similar features to yfR and BatchGetSymbols.

(If applicable) Does your package comply with our guidance around Ethics, Data Privacy and Human Subjects Research?

Yes.

If you made a pre-submission inquiry, please paste the link to the corresponding issue, forum post, or other discussion, or @tag the editor you contacted.
Explain reasons for any pkgcheck items which your package is unable to pass.

Unfortunately, I was not able to run pkgcheck locally, as I was unable to install (or make) the ctags dependency on my Linux Mint 20.3 machine. Nonetheless, I read through and followed all guidelines available in the manual.

Technical checks

Confirm each of the following by checking the box.

[X] I have read the guide for authors and rOpenSci packaging guide.

This package:

[X] does not violate the Terms of Service of any service it interacts with.
[X] has a CRAN and OSI accepted license.
[X] contains a README with instructions for installing the development version.
[X] includes documentation with examples for all functions, created with roxygen2.
[X] contains a vignette with examples of its essential functions and uses.
[X] has a test suite.
[X] has continuous integration, including reporting of test coverage using services such as Travis CI, Coveralls and/or CodeCov.

Publication options

[X] Do you intend for this package to go on CRAN?
[ ] Do you intend for this package to go on Bioconductor?
[ ] Do you wish to submit an Applications Article about your package to Methods in Ecology and Evolution? If so:

MEE Options

- [ ] The package is novel and will be of interest to the broad readership of the journal.
- [ ] The manuscript describing the package is no longer than 3000 words.
- [ ] You intend to archive the code for the package in a long-term repository which meets the requirements of the journal (see [MEE's Policy on Publishing Code](http://besjournals.onlinelibrary.wiley.com/hub/journal/10.1111/(ISSN)2041-210X/journal-resources/policy-on-publishing-code.html))
- (*Scope: Do consider MEE's [Aims and Scope](http://besjournals.onlinelibrary.wiley.com/hub/journal/10.1111/(ISSN)2041-210X/aims-and-scope/read-full-aims-and-scope.html) for your manuscript. We make no guarantee that your manuscript will be within MEE scope.*)
- (*Although not required, we strongly recommend having a full manuscript prepared when you submit here.*)
- (*Please do not submit your package separately to Methods in Ecology and Evolution*)

Code of conduct

[X] I agree to abide by rOpenSci's Code of Conduct during the review process and in maintaining my package should it be accepted.

ropensci-review-bot commented 2 years ago

Thanks for submitting to rOpenSci, our editors and @ropensci-review-bot will reply soon. Type @ropensci-review-bot help for help.

ropensci-review-bot commented 2 years ago

:rocket:

The following problem was found in your submission template:

'author1' variable must be GitHub handle only ('@myhandle')

Editors: Please ensure these problems with the submission template are rectified. Package checks have been started regardless.

:wave:

ropensci-review-bot commented 2 years ago

Oops, something went wrong with our automatic package checks. Our developers have been notified and package checks will appear here as soon as we've resolved the issue. Sorry for any inconvenience.

ropensci-review-bot commented 2 years ago

Checks for yfR (v0.0.1)

git hash: c345549c

:heavy_check_mark: Package name is available
:heavy_check_mark: has a 'codemeta.json' file.
:heavy_check_mark: has a 'contributing' file.
:heavy_check_mark: uses 'roxygen2'.
:heavy_check_mark: 'DESCRIPTION' has a URL field.
:heavy_check_mark: 'DESCRIPTION' has a BugReports field.
:heavy_check_mark: Package has at least one HTML vignette
:heavy_check_mark: All functions have examples.
:heavy_multiplication_x: Package has no continuous integration checks.
:heavy_check_mark: Package coverage is 87.8%.
:heavy_check_mark: R CMD check found no errors.
:heavy_check_mark: R CMD check found no warnings.

Important: All failing checks above must be addressed prior to proceeding

Package License: MIT + file LICENSE

1. Statistical Properties

This package features some noteworthy statistical properties which may need to be clarified by a handling editor prior to progressing.

Details of statistical properties (click to open)

The package has:

- code in R (100% in 8 files) and
- 1 authors
- 1 vignette
- no internal data file
- 14 imported packages
- 6 exported functions (median 16 lines of code)
- 34 non-exported functions in R (median 12 lines of code)

---

Statistical properties of package structure as distributional percentiles in relation to all current CRAN packages

The following terminology is used:

- `loc` = "Lines of Code"
- `fn` = "function"
- `exp`/`not_exp` = exported / not exported

All parameters are explained as tooltips in the locally-rendered HTML version of this report generated by [the `checks_to_markdown()` function](https://docs.ropensci.org/pkgcheck/reference/checks_to_markdown.html)

The final measure (`fn_call_network_size`) is the total number of calls between functions (in R), or more abstract relationships between code objects in other languages.

Values are flagged as "noteworthy" when they lie in the upper or lower 5th percentile.

|measure                  | value| percentile|noteworthy |
|:------------------------|-----:|----------:|:----------|
|files_R                  |     8|       50.7|           |
|files_vignettes          |     3|       92.4|           |
|files_tests              |     5|       81.7|           |
|loc_R                    |   779|       61.1|           |
|loc_vignettes            |   160|       41.2|           |
|loc_tests                |   184|       53.0|           |
|num_vignettes            |     1|       64.8|           |
|n_fns_r                  |    40|       49.3|           |
|n_fns_r_exported         |     6|       29.1|           |
|n_fns_r_not_exported     |    34|       56.6|           |
|n_fns_per_file_r         |     3|       45.9|           |
|num_params_per_fn        |     2|       11.9|           |
|loc_per_fn_r             |    14|       45.4|           |
|loc_per_fn_r_exp         |    16|       38.0|           |
|loc_per_fn_r_not_exp     |    12|       42.0|           |
|rel_whitespace_R         |    29|       73.7|           |
|rel_whitespace_vignettes |    65|       65.5|           |
|rel_whitespace_tests     |    56|       72.7|           |
|doclines_per_fn_exp      |    20|       13.8|           |
|doclines_per_fn_not_exp  |     0|        0.0|TRUE       |
|fn_call_network_size     |    37|       59.9|           |

---

1a. Network visualisation

Click to see the interactive network visualisation of calls between objects in package

2. `goodpractice` and other checks

Details of goodpractice and other checks (click to open)

---

#### 3b. `goodpractice` results

#### `R CMD check` with [rcmdcheck](https://r-lib.github.io/rcmdcheck/)

rcmdcheck found no errors, warnings, or notes

#### Test coverage with [covr](https://covr.r-lib.org/)

Package coverage: 87.78

#### Cyclocomplexity with [cyclocomp](https://github.com/MangoTheCat/cyclocomp)

The following functions have cyclocomplexity >= 15:

function | cyclocomplexity
--- | ---
yf_get | 23
yf_get_single_ticker | 22

#### Static code analyses with [lintr](https://github.com/jimhester/lintr)

[lintr](https://github.com/jimhester/lintr) found the following 2 potential issues:

message | number of times
--- | ---
Avoid library() and require() calls in packages | 2

Package Versions

|package  |version   |
|:--------|:---------|
|pkgstats |0.0.3.96  |
|pkgcheck |0.0.2.276 |

Editor-in-Chief Instructions:

Processing may not proceed until the items marked with :heavy_multiplication_x: have been resolved.

mpadge commented 2 years ago

@jooolia The failing check is just because the README does not have a CI badge. @msperlin Could you please add an R CMD check badge to your readme? (We check for CI via badges rather than workflow results, because we do accept submissions from arbitrary code-hosting platforms, not just GitHub.) Thanks!

msperlin commented 2 years ago

Good morning.

Sure, I just added the R-CMD badge.

jooolia commented 2 years ago

@ropensci-review-bot check package

ropensci-review-bot commented 2 years ago

Thanks, about to send the query.

ropensci-review-bot commented 2 years ago

:rocket:

Editor check started

:wave:

ropensci-review-bot commented 2 years ago

Oops, something went wrong with our automatic package checks. Our developers have been notified and package checks will appear here as soon as we've resolved the issue. Sorry for any inconvenience.

ropensci-review-bot commented 2 years ago

Checks for yfR (v0.0.1)

git hash: 1ee2f6f5

:heavy_check_mark: Package name is available
:heavy_check_mark: has a 'codemeta.json' file.
:heavy_check_mark: has a 'contributing' file.
:heavy_check_mark: uses 'roxygen2'.
:heavy_check_mark: 'DESCRIPTION' has a URL field.
:heavy_check_mark: 'DESCRIPTION' has a BugReports field.
:heavy_check_mark: Package has at least one HTML vignette
:heavy_check_mark: All functions have examples.
:heavy_check_mark: Package has continuous integration checks.
:heavy_check_mark: Package coverage is 87.8%.
:heavy_check_mark: R CMD check found no errors.
:heavy_check_mark: R CMD check found no warnings.

Package License: MIT + file LICENSE

1. Package Dependencies

Details of Package Dependency Usage (click to open)

The table below tallies all function calls to all packages ('ncalls'), both internal (r-base + recommended, along with the package itself), and external (imported and suggested packages). 'NA' values indicate packages to which no identified calls to R functions could be found. Note that these results are generated by an automated code-tagging system which may not be entirely accurate.

|type       |package   | ncalls|
|:----------|:---------|------:|
|internal   |base      |     69|
|internal   |yfR       |     17|
|internal   |utils     |      3|
|imports    |dplyr     |     11|
|imports    |purrr     |      5|
|imports    |readr     |      5|
|imports    |stringr   |      4|
|imports    |rvest     |      3|
|imports    |tidyr     |      2|
|imports    |lubridate |      2|
|imports    |furrr     |      2|
|imports    |future    |      2|
|imports    |tibble    |      1|
|imports    |zoo       |      1|
|imports    |quantmod  |      1|
|imports    |curl      |     NA|
|imports    |cli       |     NA|
|suggests   |knitr     |     NA|
|suggests   |rmarkdown |     NA|
|suggests   |testthat  |     NA|
|suggests   |ggplot2   |     NA|
|suggests   |covr      |     NA|
|linking_to |NA        |     NA|

Click below for tallies of functions used in each package. Locations of each call within this package may be generated locally by running 's <- pkgstats::pkgstats()', and examining the 'external_calls' table.

base

c (6), file.path (5), as.Date (4), min (4), paste0 (4), data.frame (3), file.exists (3), length (3), list (3), seq (3), as.character (2), as.numeric (2), for (2), max (2), names (2), options (2), rep (2), switch (2), tempdir (2), as.POSIXct (1), class (1), file (1), is.na (1), lapply (1), list.files (1), order (1), seq_along (1), setdiff (1), sum (1), Sys.Date (1), Sys.getenv (1), which (1)

yfR

fix_ticker_name (2), get_morale_boost (2), set_cli_msg (2), yf_get_available_indices (2), calc_ret (1), date_to_unix (1), fct_format_wide (1), unix_to_date (1), yf_get (1), yf_get_available_collections (1), yf_get_ibov_stocks (1), yf_get_index_comp (1), yf_get_single_ticker (1)

dplyr

first (3), bind_rows (2), tibble (2), filter (1), lag (1), mutate (1), rename (1)

purrr

map (2), map_chr (2), pmap (1)

readr

read_rds (4), write_rds (1)

stringr

fixed (1), str_c (1), str_detect (1), str_split (1)

rvest

html_nodes (2), html_table (1)

utils

data (2), capture.output (1)

furrr

furrr_options (2)

future

availableCores (1), plan (1)

lubridate

wday (2)

tidyr

all_of (1), pivot_wider (1)

quantmod

getSymbols (1)

tibble

tibble (1)

zoo

index (1)

2. Statistical Properties

This package features some noteworthy statistical properties which may need to be clarified by a handling editor prior to progressing.

Details of statistical properties (click to open)

The package has:

- code in R (100% in 8 files) and
- 1 authors
- 1 vignette
- no internal data file
- 14 imported packages
- 6 exported functions (median 16 lines of code)
- 34 non-exported functions in R (median 12 lines of code)

---

Statistical properties of package structure as distributional percentiles in relation to all current CRAN packages

The following terminology is used:

- `loc` = "Lines of Code"
- `fn` = "function"
- `exp`/`not_exp` = exported / not exported

All parameters are explained as tooltips in the locally-rendered HTML version of this report generated by [the `checks_to_markdown()` function](https://docs.ropensci.org/pkgcheck/reference/checks_to_markdown.html)

The final measure (`fn_call_network_size`) is the total number of calls between functions (in R), or more abstract relationships between code objects in other languages.

Values are flagged as "noteworthy" when they lie in the upper or lower 5th percentile.

|measure                  | value| percentile|noteworthy |
|:------------------------|-----:|----------:|:----------|
|files_R                  |     8|       50.7|           |
|files_vignettes          |     3|       92.4|           |
|files_tests              |     5|       81.7|           |
|loc_R                    |   779|       61.1|           |
|loc_vignettes            |   160|       41.2|           |
|loc_tests                |   184|       53.0|           |
|num_vignettes            |     1|       64.8|           |
|n_fns_r                  |    40|       49.3|           |
|n_fns_r_exported         |     6|       29.1|           |
|n_fns_r_not_exported     |    34|       56.6|           |
|n_fns_per_file_r         |     3|       45.9|           |
|num_params_per_fn        |     2|       11.9|           |
|loc_per_fn_r             |    14|       45.4|           |
|loc_per_fn_r_exp         |    16|       38.0|           |
|loc_per_fn_r_not_exp     |    12|       42.0|           |
|rel_whitespace_R         |    29|       73.7|           |
|rel_whitespace_vignettes |    65|       65.5|           |
|rel_whitespace_tests     |    56|       72.7|           |
|doclines_per_fn_exp      |    20|       13.8|           |
|doclines_per_fn_not_exp  |     0|        0.0|TRUE       |
|fn_call_network_size     |    37|       59.9|           |

---

2a. Network visualisation

Click to see the interactive network visualisation of calls between objects in package

3. `goodpractice` and other checks

Details of goodpractice and other checks (click to open)

#### 3a. Continuous Integration Badges

[![R-CMD-check](https://github.com/msperlin/yfR/workflows/R-CMD-check/badge.svg)](https://github.com/msperlin/yfR/actions)

**GitHub Workflow Results**

|name                       |conclusion |sha    |date       |
|:--------------------------|:----------|:------|:----------|
|pages build and deployment |success    |1ee2f6 |2022-03-31 |
|pkgdown                    |success    |51af0f |2022-03-30 |
|R-CMD-check                |success    |1ee2f6 |2022-03-31 |
|render-rmarkdown           |failure    |f3dbe5 |2022-03-30 |
|test-coverage              |success    |1ee2f6 |2022-03-31 |

---

#### 3b. `goodpractice` results

#### `R CMD check` with [rcmdcheck](https://r-lib.github.io/rcmdcheck/)

rcmdcheck found no errors, warnings, or notes

#### Test coverage with [covr](https://covr.r-lib.org/)

Package coverage: 87.78

#### Cyclocomplexity with [cyclocomp](https://github.com/MangoTheCat/cyclocomp)

The following functions have cyclocomplexity >= 15:

function | cyclocomplexity
--- | ---
yf_get | 23
yf_get_single_ticker | 22

#### Static code analyses with [lintr](https://github.com/jimhester/lintr)

[lintr](https://github.com/jimhester/lintr) found the following 2 potential issues:

message | number of times
--- | ---
Avoid library() and require() calls in packages | 2

Package Versions

|package  |version |
|:--------|:-------|
|pkgstats |0.0.4.4 |
|pkgcheck |0.0.3.6 |

Editor-in-Chief Instructions:

This package is in top shape and may be passed on to a handling editor

jooolia commented 2 years ago

Dear @msperlin, Thank you for your submission. The package has passed all of the automated package checks and the test coverage is good. Could you expand a bit more on how this package differs from quantmod and tidyquant? Thanks, Julia

msperlin commented 2 years ago

Good morning Julia,

The main goal of yfR is to help users download large amounts of data from Yahoo Finance (YF).

Packages quantmod and tidyquant also offer functions for downloading price data from YF, but only that. Besides importing data, yfR offers the following functionalities:

Organization and clean up of data
- Users can set a threshold for what is "bad" data with respect to matching dates to a benchmark dataset (SP500 is usually used);
- Users can also ask for "complete data", where all missing dates are set as NA for later substitution;
- Log or arithmetic returns, much used in research, are also calculated by default;
- Users can aggregate the data to weekly, monthly or yearly, always keeping the same data structure.
Smarter downloads
- A local (and smart) session-persistent caching system is implemented. This means that, within a session, the data is never downloaded twice and only missing portions of data are downloaded;
- Support for parallel computing. Users can easily set up concurrent R sessions for faster download of data.
Practicality
- yfR innovates with a "collection" system, where one can easily import a collection of tickers such as the SP500 composition in a single function call.
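
To make the distinction between log and arithmetic returns mentioned above concrete, here is a minimal base R sketch. It does not call yfR: the helper name `calc_returns()` and the price vector are made up for illustration, and the package's internal calculation may differ.

```r
# Hypothetical helper (not part of yfR): returns from a vector of prices.
calc_returns <- function(prices, type = c("arithmetic", "log")) {
  type <- match.arg(type)
  # Previous price for each observation; the first one has no predecessor
  lagged <- c(NA, head(prices, -1))
  switch(type,
    arithmetic = prices / lagged - 1,   # P_t / P_{t-1} - 1
    log        = log(prices / lagged)   # log(P_t / P_{t-1})
  )
}

prices   <- c(100, 110, 99)  # made-up prices
arit_ret <- calc_returns(prices, "arithmetic")
log_ret  <- calc_returns(prices, "log")
```

The two are linked by `exp(log_ret) - 1 == arit_ret` (up to floating point error), and the first element is `NA` in both cases because there is no previous price.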

jooolia commented 2 years ago

Thank you @msperlin, I am discussing with the other editors and will get back to you. Thanks, Julia

jooolia commented 2 years ago

Thanks for your patience @msperlin. The fit seems to be good for us and I am now looking for a handling editor. Thanks, Julia

msperlin commented 2 years ago

Great, thanks @jooolia.

jooolia commented 2 years ago

@ropensci-review-bot assign @melvidoni as editor

ropensci-review-bot commented 2 years ago

Assigned! @melvidoni is now the editor

melvidoni commented 2 years ago

@ropensci-review-bot seeking reviewers

ropensci-review-bot commented 2 years ago

Please add this badge to the README of your package repository:

[![Status at rOpenSci Software Peer Review](https://badges.ropensci.org/523_status.svg)](https://github.com/ropensci/software-review/issues/523)

Furthermore, if your package does not have a NEWS.md file yet, please create one to capture the changes made during the review process. See https://devguide.ropensci.org/releasing.html#news

msperlin commented 2 years ago

Thanks. The badge is added in dc712f4abac246604721ed7f2926f9794e4e7f99 and the news file already exists.

Athene-ai commented 2 years ago

Hi @melvidoni ! I would like to review this package

melvidoni commented 2 years ago

> Hi @melvidoni ! I would like to review this package

Hello @Athene-ai, of course, this package still needs reviewers. I saw you wrote on several packages, so be mindful that asking in multiple places may not be ideal, as you may end up with more workload than intended. The review timeframe for this is 3 weeks, so if that's okay with you, I'll assign you to this package (and you'll have to complete this review first before accepting any others).

Athene-ai commented 2 years ago

@melvidoni I accept the invitation to review this package within three weeks

melvidoni commented 2 years ago

@ropensci-review-bot assign @Athene-ai as reviewer

ropensci-review-bot commented 2 years ago

@Athene-ai added to the reviewers list. Review due date is 2022-05-26. Thanks @Athene-ai for accepting to review! Please refer to our reviewer guide.

ropensci-review-bot commented 2 years ago

@Athene-ai: If you haven't done so, please fill this form for us to update our reviewers records.

Athene-ai commented 2 years ago

@melvidoni thanks for adding me as reviewer and I filled the volunteer form for being an rOpenSci Reviewer :-)

Athene-ai commented 2 years ago

@melvidoni do we have a slack channel?

melvidoni commented 2 years ago

@ropensci-review-bot assign @s3alfisc as reviewer

ropensci-review-bot commented 2 years ago

@s3alfisc added to the reviewers list. Review due date is 2022-05-29. Thanks @s3alfisc for accepting to review! Please refer to our reviewer guide.

ropensci-review-bot commented 2 years ago

@s3alfisc: If you haven't done so, please fill this form for us to update our reviewers records.

melvidoni commented 2 years ago

> @melvidoni do we have a slack channel?

Hello @Athene-ai. Please, be mindful that responses are not immediate, especially over the weekend; kindly do not hasten people, and wait for responses/actions. There is much going on "behind the scenes" that you may not be aware of.

That said, you'll get an invitation to the Slack later in the process.

Athene-ai commented 2 years ago

> > @melvidoni do we have a slack channel?
>
> Hello @Athene-ai. Please, be mindful that responses are not immediate, especially over the weekend; kindly do not hasten people, and wait for responses/actions. There is much going on "behind the scenes" that you may not be aware of.
>
> That said, you'll get an invitation to the Slack later in the process.

Thanks for the information 😊

mpadge commented 2 years ago

@Athene-ai Could you please paste a completed review here? Rather than adding more comments to this issue, you may leave that template there for now, and update it with an actual review when you've got that far. It's best to complete the template offline, edit the issue to delete all current content, and then simply paste the completed review back in place of the above comment. Thanks.

melvidoni commented 2 years ago

@ropensci-review-bot remove @Athene-ai from reviewers

ropensci-review-bot commented 2 years ago

@Athene-ai removed from the reviewers list!

melvidoni commented 2 years ago

@msperlin we apologise for the issues caused with the prior reviewer. They have now been removed from the list of reviewers, and I will proceed to search for another reviewer. Please understand that although we try to give everyone an opportunity, sometimes it is not possible to foresee how they will take the opportunity.

I will strive to get a new reviewer, but the person will be given 3 weeks from the acceptance date, hence some delays are bound to happen.

Edit: wrong punctuation, apologies.

msperlin commented 2 years ago

Good morning @melvidoni.

No problem at all. I can wait.

Best,

melvidoni commented 2 years ago

@ropensci-review-bot assign @thisisnic as reviewer

ropensci-review-bot commented 2 years ago

@thisisnic added to the reviewers list. Review due date is 2022-06-13. Thanks @thisisnic for accepting to review! Please refer to our reviewer guide.

ropensci-review-bot commented 2 years ago

@thisisnic: If you haven't done so, please fill this form for us to update our reviewers records.

s3alfisc commented 2 years ago

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

Briefly describe any working relationship you have (had) with the package authors. None
☒ As the reviewer I confirm that there are no conflicts of interest for me to review this work (if you are unsure whether you are in conflict, please speak to your editor before starting your review).

Documentation

The package includes all the following forms of documentation:

☐ A statement of need: clearly stating problems the software is designed to solve and its target audience in README. While a readme and pkgdown website exists, I believe that the documentation could be greatly improved - see my comments below.
☒ Installation instructions: for the development version of package and any non-standard dependencies in README
☒ Vignette(s): demonstrating major functionality that runs successfully locally Vignette runs locally.
☒ Function Documentation: for all exported functions
☒ Examples: (that run successfully locally) for all exported functions
☒ Community guidelines: including contribution guidelines in the README or CONTRIBUTING, and DESCRIPTION with URL, BugReports and Maintainer (which may be autogenerated via Authors@R). All available.

You can find more comments on documentation below.

Functionality

☐ Installation: Installation succeeds as documented on my Windows machine. On GitHub Actions, the CMD check currently fails for macOS.
☒ Functionality: Any functional claims of the software been confirmed.
☒ Performance: Any performance claims of the software been confirmed. There are no performance claims made in the package.
☒ Automated tests: Unit tests cover essential functions of the package and a reasonable range of inputs and conditions. All tests pass on the local machine. But coverage is below 80% - I would love to see this go above 95% :)
☐ Packaging guidelines: The package conforms to the rOpenSci packaging guidelines.
- ☐ package name: maybe a search engine optimized name for the package would be yfinanceR - the name of the equivalent Python package is yfinance
- ☐ create package metadata with codemeta
- ☐ As per the CMD check failure for Mac, the package currently does not run for Mac.
- ☒ functions have descriptive names; snake case; function argument order consistent across functions; no conflicts with base packages
- ☒ console messages via error() and warning()
- ☐ no citation file is included
- ☐ (consider creating a r-universe profile & adding a r-universe badge, r-universe is great :) )
- ☐ add installation instructions using remotes, pak, etc
- ☒ code of conduct, contribution guidelines available
- ☐ the package does not contain top-level documentation - i.e. ??yfR does not return any documentation of the package
- ☐ add a link in the readme to further extensive documentation on Yahoo Finance
- ☐ "If your package provides access to a data source, we require that DESCRIPTION contains both (1) A brief identification and/or description of the organisation responsible for issuing data; and (2) The URL linking to public-facing page providing, describing, or enabling data access (which may often differ from URL leading directly to data source)."
- ☒ roxygen2 is used
- ☐ in general, @return statements specify the returned data object. But you could be more specific - usually, tibbles are returned, not base `data.frame`s
- ☒ @noRd used for non-exported functions
- ☒ pkgdown website exists
- ☒ license: MIT
- ☒ all user facing functions have examples
- ☐ package dependencies: by running pkgstats, it looks like there are multiple package dependencies that you could easily replace by using base functions, e.g. dplyr, magrittr, tibble. What is the advantage of using readr::read_rds() over base::readRDS()? The packages curl and cli are not detected as being in use by pkgstats. Nevertheless, all imported packages are of high quality, so I have no concerns here.

Estimated hours spent reviewing: 8

☒ Should the author(s) deem it appropriate, I agree to be acknowledged as a package reviewer (“rev” role) in the package DESCRIPTION file.

Additional Comments

I think that yfR is a very promising package with useful features, and I believe that it will be widely used. I very much enjoyed using it! To improve the package, I mostly suggest investing more time in refining the documentation.

Documentation

Statement of need: I would like to see a more refined statement of need at the beginning of the readme: what is yfR’s main innovation? E.g. start with something like “yfR is an API to yahoo finance. It speeds up the data downloading process by parallel computing and local caching.” Then explain what type of data yahoo finance includes.
I would move the discussion of data quality / limitations of yahoo finance and comparison to BatchGetSymbols to separate articles - I don’t think they are required in the readme.
If you want to keep the reference to quantmod, maybe include a dedicated ‘Acknowledgements’ section at the end of the readme?
Occasionally, you use jargon: e.g., not all users might know what a ticker is.
I would move all examples from the readme to the ‘get started’ vignette. Alternatively, I would keep only one example in the readme.
In the ‘get started’ vignette, I would hide the message output generated e.g. by yf_get() and explain in words what the function does: e.g. it checks the cache, downloads data if the cache is empty, else finishes etc.
The vignette states that multiple ‘collections’ are organized in the package. It would be great to include a full list of collections to the docs, e.g. as a separate article? The yf_get_available_collections() helps here, but what do the individual collections stand for? E.g. does IBOV stand for the Bovespa-Index?
I would like to see some documentation on how the caching works: e.g., where are files saved? For how long are they saved? Is the cache ever cleaned, e.g. are cached files lost by re-starting the R session?
In the docs for yf_convert_to_wide, it would be good to print the initial long dataframe.
The documentation of yf_get() does not really, as a stand-alone, explain what the function does: download ticker data from yahoo finance, caching, parallelism etc. I would delete the reference to getSymbols. Note that as yf_get_default_cache_folder() is not exported, users will run into an error when trying yfR::yf_get_default_cache_folder(). Also, mention that the ticker function argument is vectorized.
You could improve the documentation for parallelism: I myself have never used furrr, so your hint to furrr::plan() is not too helpful. How about a dedicated article with a small example that illustrates how to run get_plan() in parallel? Also, I only learned from browsing the code that by default, half of all available cores are used.
What is the difference between a collection and an index?
Consider adding documentation of the data returned via yf_get(). Not being a financial economist, I for example have no idea what the price_adjusted column stands for. Beyond, what is the unit of measurement of the price variables? I suppose it is US Dollars? Further, what is the relationship between daily data and monthly data? Also, potentially add a note that when markets are closed, no data row will be created.
examples could be more 'verbose', i.e. add documentation
also, examples could be more 'exhaustive' - they are quite minimal at the moment
the example for yf_convert_to_wide currently calls internal data - could you not simply attach the data set or load it?
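
On the parallelism point, a rough sketch of the general future/furrr pattern might serve as a starting point for such an article. Assumptions: `plan()`, `multisession` and `availableCores()` come from the future package and `future_map()` from furrr; `fake_download()` and the tickers are made-up stand-ins for a per-ticker download, and none of this is yfR's actual code.

```r
library(future)
library(furrr)

# Made-up stand-in for a per-ticker download (no network access)
fake_download <- function(ticker) {
  data.frame(ticker = ticker, n_char = nchar(ticker))
}

tickers <- c("AAA", "BBBB", "CC")  # hypothetical tickers

# Use half of the available cores, but at least one worker
n_workers <- max(1, floor(availableCores() / 2))
plan(multisession, workers = n_workers)

# One task per ticker, spread over the background R sessions
results  <- future_map(tickers, fake_download)
combined <- do.call(rbind, results)

# Return to sequential processing when done
plan(sequential)
```

The point for the documentation is that the user chooses the plan once, and everything mapped afterwards runs on those workers until the plan is reset.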

Installation, Local CMD Check & pkgcheck

Installation and CMD check pass without problems
I tried to run pkgcheck, but failed to get it to run. I suggest to run the pkgcheck action on github actions, at least for the time of the review.

Testing

Code Coverage is currently only at around 80% - I would love to see this up at 95%, if not 100 :)

Functionality

All examples work very nicely. Overall, it was a lot of fun using the package!
In general, the console output is very helpful and very pretty!
I am not sure if I would have default function arguments for first_date() and last_date(). If you want to keep it, I would change it from 15 days to one month.
yf_convert_to_wide() is super helpful - great idea to directly include it in the package!
Could the API be more permissive, e.g. accept dates with format dd-mm-yyyy?
When trying the “SP500” collection example, I ran into several ‘error in download’ errors. Still, the function finished eventually with ‘binding price data’. What exactly is going on here? Did the function eventually manage to fetch all tickers? If no, could there be a final message, e.g. ‘300/500 tickers successfully fetched. To fetch all others, do this …’.
I have seen that there is already a PR opened to alert users when they have reached the Yahoo Finance limit. This would indeed be a great feature!

Additional Functionality

It would be great to add further collections, e.g. NASDAQ, DAX, SP30, FAANG etc
The equivalent Python package, yfinance, offers a range of additional functionality, e.g. data on dividends, stock splits, and institutional investors. Do you plan to incorporate any of these into the package in the future?
Currently, the cached files are saved in the rds file format via readr::read_rds(). There might be faster and/or more memory-friendly alternatives available. Have you considered adding a function argument that would allow users to store files e.g. in the parquet file format?
Have you considered integrating an autoplot function to plot stock prices? autoplot would e.g. generate plots similar to those created in the readme / vignette.
Would it be possible to give an estimate of consumed memory of all cached files prior to a download? I would also consider to export yf_get_default_cache_folder() so that users are aware of the function and can easily check where yfR creates the cache.

Misc

Do you need to export the magrittr pipe when using it internally?
I took a brief glance at the error messages, and most of them are clear and easy to understand. Maybe you could rephrase

  # check for NA
  if (any(is.na(tickers))) {
    my_msg <- paste0(
      "Found NA value in ticker vector.",
      "You need to remove it before running BatchGetSymbols."
    )
    stop(my_msg)
  }

  if (class(first_date) != "Date") {
    stop("ERROR: cant change class of first_date to 'Date'")
  }
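
One possible rewording of the two checks above, as a minimal sketch and not the package's actual code. The helper name check_inputs is hypothetical. The messages refer to yf_get() instead of BatchGetSymbols, and inherits() replaces the class() comparison, which can misbehave when an object carries more than one class.

```r
# Hypothetical helper, not part of yfR: validates the two inputs discussed above.
check_inputs <- function(tickers, first_date) {
  # anyNA() is a shorter equivalent of any(is.na(...))
  if (anyNA(tickers)) {
    stop("Found NA value(s) in the ticker vector. ",
         "Please remove them before calling yf_get().",
         call. = FALSE)
  }
  # inherits() also works for objects that have several classes
  if (!inherits(first_date, "Date")) {
    stop("Argument first_date must be of class 'Date', got class '",
         class(first_date)[1], "'.",
         call. = FALSE)
  }
  invisible(TRUE)
}

# Valid input passes silently
ok <- check_inputs(c("AAPL", "MSFT"), as.Date("2022-01-01"))

# Invalid input yields a message that names the problem and the function
err_na <- tryCatch(
  check_inputs(c("AAPL", NA), as.Date("2022-01-01")),
  error = function(e) conditionMessage(e)
)
```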

In general, I really like the dreamerr package for function input type checks. checkmate seems to be very popular, too.

With dreamerr, you could e.g. write

  # check threshold
  if ((thresh_bad_data < 0) | (thresh_bad_data > 1)) {
    stop("Input thresh_bad_data should be a proportion between 0 and 1")
  }

as

dreamerr::check_arg(thresh_bad_data, "scalar numeric GT{0} LT{1}")
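
One detail worth checking when translating the condition: the original code accepts the endpoints 0 and 1, so if GT and LT are read as strict inequalities, an exact translation would need inclusive bounds. For comparison, here is a dependency-free sketch of the same check in base R. The helper name check_proportion is hypothetical and not part of yfR or dreamerr.

```r
# Hypothetical helper: accepts a single number in the closed interval [0, 1].
check_proportion <- function(x, arg_name = "x") {
  is_valid <- is.numeric(x) && length(x) == 1L && !is.na(x) && x >= 0 && x <= 1
  if (!is_valid) {
    stop(sprintf("Input %s should be a single proportion between 0 and 1",
                 arg_name),
         call. = FALSE)
  }
  invisible(x)
}

# Endpoints are accepted, as in the original condition
check_proportion(0, "thresh_bad_data")
check_proportion(1, "thresh_bad_data")

# Out-of-range values raise an error
res <- tryCatch(check_proportion(1.5, "thresh_bad_data"),
                error = function(e) conditionMessage(e))
```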

I can’t really follow this error message:

  if (!flag) {
    warning(stringr::str_glue(
      "\nIt seems you are using a non-default cache folder at {cache_folder}. ",
      "Be aware that if any stock event -- split or dividend -- happens ",
      "in between cache files, the resulting aggregate cache data will not ",
      "correspond to reality as some part of the price data will not be ",
      "adjusted to the event. For safety and reproducibility, my suggestion ",
      "is to use cache system only for the current session with tempdir(), ",
      "which is the default option."
    ))
  }

The collections are created via hard coded (wikipedia) URLs. This is likely prone to errors - what if e.g. the URLs change? I understand the attractiveness of this ‘dynamic’ lookup, as e.g. the composition of stock indices might change over time. Maybe you could add a second look-up link (in case the main URL breaks), or you could add a ‘fallback’ data.frame containing the names of all firms included in an index at a fixed date to fall back to? See also this link on potential error handling of URLs via tryCatch.
My last comment (repeating something I mentioned above): the equivalent python package is called yfinance. Maybe a better / SEO optimized name for the package would be yfinanceR?

msperlin commented 2 years ago

Thanks @s3alfisc for the review! Appreciate it. Good ideas there.

I'll reply to all your comments in the next couple of days.

melvidoni commented 2 years ago

@ropensci-review-bot submit review https://github.com/ropensci/software-review/issues/523#issuecomment-1140410709 time 8

ropensci-review-bot commented 2 years ago

Logged review for s3alfisc (hours: 8)

msperlin commented 2 years ago

Dear @s3alfisc , please find my replies below:

I think that yfR is a very promising package with useful features, and I believe that it will be widely used. I very much enjoyed using it! To improve the package, I mostly suggest to invest more time into refining the documentation.

Thanks, appreciate the feedback and the detailed review. Given your feedback and ideas, I've made many changes in the code and documentation.

Documentation

Statement of need: I would like to see a more refined statement of need at the beginning of the readme: what is yfR’s main innovation? E.g. start with something like “yfR is an API to yahoo finance. It speeds up the data downloading process by parallel computing and local caching.” Then explain what type of data yahoo finance includes.

Also thanks. I changed the readme.rmd file so that the reader can quickly grasp how to use the package.

I would move the discussion of data quality / limitations of yahoo finance and comparison to BatchGetSymbols to separate articles - I don’t think they are required in the readme. If you want to keep the reference to quantmod, maybe include a dedicated ‘Acknowledgements’ section at the end of the readme? Occasionally, you use jargon: e.g., not all users might know what a ticker is. I would move all examples from the readme to the ‘get started’ vignette. Alternatively, I would keep only one example in the readme.

I reorganized the topics in the readme.rmd and moved some as vignettes.

In the ‘get started’ vignette, I would hide the message output generated e.g. by yf_get() and explain in words what the function does: e.g. it checks the cache, downloads data if the cache is empty, else finishes etc.

I would rather keep the yfR messages in the vignettes, as they mimic the actual call to the function. I also improved the text in the main vignette ("get started").

The vignette states that multiple ‘collections’ are organized in the package. It would be great to include a full list of collections to the docs, e.g. as a separate article? The yf_get_available_collections() helps here, but what do the individual collections stand for? E.g. does IBOV stand for the Bovespa-Index?

Great idea. I added argument print_description for yf_get_available_collections() for printing a text description of available collections:

I would like to see some documentation on how the caching works: e.g., where are files saved? For how long are they saved? Is the cache ever cleaned, e.g. are cached files lost by re-starting the R session?

I added a section at the help file of yf_get(), explaining how the cache system works.

In the docs for yf_convert_to_wide, it would be good to print the initial long dataframe.

Done.

The documentation of yf_get() does not really, as a stand-alone, explain what the function does: download ticker data from Yahoo Finance, caching, parallelism etc. I would delete the reference to getSymbols. Note that as yf_get_default_cache_folder() is not exported, users will run into an error when trying yfR::yf_get_default_cache_folder().

Documentation was improved.

Also, mention that the ticker function argument is vectorized.

Done.

You could improve the documentation for parallelism: I myself have never used furrr, so your hint to furrr::plan() is not too helpful. How about a dedicated article with a small example that illustrates how to run get_plan() in parallel? Also, I only learned from browsing the code that by default, half of all available cores are used.

I think that going into parallelism and furrr::plan() would be off topic. However, I added a link to furrr https://furrr.futureverse.org/ in argument do_parallel, so that the user can learn more about it, if desired.
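
For readers who have not used these packages, a minimal sketch of the general pattern follows. It is not yfR's internal code. Note that plan() is exported by the future package, while furrr supplies future_map(). The function fetch_one is a hypothetical stand-in for a per-ticker download and does not touch the network. The worker count follows the "half of the available cores" rule mentioned above, capped at 2 here to keep the example light.

```r
# Hypothetical stand-in for a per-ticker download; no network access involved.
fetch_one <- function(ticker) {
  data.frame(ticker = ticker, n_chars = nchar(ticker))
}

tickers <- c("AAPL", "MSFT", "GOOG", "AMZN")

# Half of the detected cores, at least 1 (detectCores() may return NA),
# and at most 2 for this small example.
n_cores <- parallel::detectCores()
n_workers <- if (is.na(n_cores)) 1L else max(1L, n_cores %/% 2L)
n_workers <- min(2L, n_workers)

if (requireNamespace("future", quietly = TRUE) &&
    requireNamespace("furrr", quietly = TRUE)) {
  # plan() sets how futures are resolved; multisession uses background R sessions.
  future::plan(future::multisession, workers = n_workers)
  res_list <- furrr::future_map(tickers, fetch_one)
  # Restore the default sequential plan afterwards.
  future::plan(future::sequential)
} else {
  # Sequential fallback so the sketch still runs without the optional packages.
  res_list <- lapply(tickers, fetch_one)
}

# Bind the per-ticker results into one long data frame.
result <- do.call(rbind, res_list)
```

The parallel and sequential branches return the same result, so the choice of plan only affects how the work is scheduled.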

What is the difference between a collection and an index?

A collection is just a bunch of tickers put together. An index can be a collection, but not all collections are indices.

Consider adding documentation of the data returned via yf_get(). Not being a financial economist, I for example have no idea what the price_adjusted column stands for. Beyond, what is the unit of measurement of the price variables? I suppose it is US Dollars? Further, what is the relationship between daily data and monthly data? Also, potentially add a note that when markets are closed, no data row will be created.

Done. New documentation is available at readme.rmd and also in help for yf_get().

Examples could be more 'verbose', i.e. add documentation. Also, examples could be more 'exhaustive' - they are quite minimal at the moment. The example for yf_convert_to_wide currently calls internal data - could you not simply attach the data set or load it?

I revised all examples, especially for the main function. I've made a few changes, but they look alright to me. Users can always check the vignettes for more details.

Installation, Local CMD Check & pkgcheck

Installation and CMD check pass without problems. I tried to run pkgcheck, but failed to get it to run. I suggest to run the pkgcheck action on github actions, at least for the time of the review.

I also failed to use pkgcheck on Linux (Ubuntu/Mint). I couldn't install its dependencies, despite spending some time trying hard.

Testing

Code Coverage is currently only at around 80% - I would love to see this up at 95%, if not 100 :)

I tried my best to cover as much as possible, reaching 82.99%. One big miss is the parallel computing part, which is not active in the current version (I removed it due to YF limits on API calls). A fix is in progress, but it depends on quantmod being on CRAN. I'll add the parallel tests once it is fixed.

The rest is just input error checking, which, to me, feels fine to leave uncovered (covering it would just be a gimmick). So, I'll not reach 100%, but will be close.

Functionality

All examples work very nicely. Overall, it was a lot of fun using the package! In general, the console output is very helpful and very pretty!

Great, thanks!

I am not sure if I would have default function arguments for first_date() and last_date(). If you want to keep it, I would change it from 15 days to one month.

Done.

yf_convert_to_wide() is super helpful - great idea to directly include it in the package!

Thanks. I know some people use the data that way, even though I don't like it.

Could the API be more permissive, e.g. accept dates with format dd-mm-yyyy?

I feel that ISO format is fine. This is the standard in R and users should probably adapt to it.
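
For users whose dates arrive in day-first form, the conversion to ISO format takes one line in base R. This is a small sketch with a made-up input string. Passing the format explicitly matters, since a day-first string given to as.Date() without a format can be parsed incorrectly rather than rejected.

```r
# A day-first string supplied by a user (hypothetical input).
user_date <- "21-06-2022"

# Parse with an explicit format; the result is a Date object.
iso_date <- as.Date(user_date, format = "%d-%m-%Y")

# Date objects print in ISO 8601 form (yyyy-mm-dd).
format(iso_date)
```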

When trying the “SP500” collection example, I ran into several ‘error in download’ errors. Still, the function finished eventually with ‘binding price data’. What exactly is going on here? Did the function eventually manage to fetch all tickers? If no, could there be a final message, e.g. ‘300/500 tickers successfully fetched. To fetch all others, do this …’.

Good idea. I implemented the message. The user will now be aware of the relative percentage of tickers in the output data, when comparing to the requested vector of tickers. Whenever that is lower than 50%, a message tells the user to wait for 15 minutes before running it again.

[Figure: console output for the "Good" case]

[Figure: console output for the "Bad" case]
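
The logic behind such a message can be sketched as follows. This is an illustration under stated assumptions and not the code used in yfR: the helper name report_fetch_rate and the exact wording are hypothetical, while the 50% threshold and the 15-minute hint come from the description above.

```r
# Hypothetical helper: compares requested tickers with those present in the output.
report_fetch_rate <- function(requested, fetched, threshold = 0.5) {
  requested <- unique(requested)
  found <- intersect(requested, fetched)
  rate <- length(found) / length(requested)

  msg <- sprintf("%d/%d requested tickers fetched (%.0f%%).",
                 length(found), length(requested), 100 * rate)
  if (rate < threshold) {
    # Low coverage suggests the data source is throttling requests.
    msg <- paste(msg, "Consider waiting about 15 minutes before trying again.")
  }
  message(msg)
  invisible(rate)
}

# Only one of four requested tickers came back, so the hint is shown.
rate <- report_fetch_rate(c("AAPL", "MSFT", "GOOG", "AMZN"), fetched = "AAPL")
```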

I have seen that there is already a PR opened to alert users when they have reached the Yahoo Finance limit. This would indeed be a great feature!

We are working on this issue and already have a viable solution that should become official soon. Nonetheless, the package works fine in a single session in all my tests.

Additional Functionality

It would be great to add further collections, e.g. NASDAQ, DAX, SP30, FAANG etc

Yes! Definitely. The idea is to have something for everyone.

The equivalent Python package, yfinance, offers a range of additional functionality, e.g. data on dividends, stock splits, and institutional investors. Do you plan to incorporate any of these into the package in the future?

No. My proposal focuses only on importing and organizing stock data.

Currently, the cached files are saved in the rds file format via readr::read_rds(). There might be faster and/or more memory-friendly alternatives available. Have you considered adding a function argument that would allow users to store files e.g. in the parquet file format?

I believe that .rds files work fine for yfR (I never saw a performance issue). But I'll keep that in mind. Also, this is very easy to change in the future.

Have you considered integrating an autoplot function to plot stock prices? autoplot would e.g. generate plots similar to those created in the readme / vignette.

No, but I'll also keep it in mind.

Would it be possible to give an estimate of consumed memory of all cached files prior to a download? I would also consider to export yf_get_default_cache_folder() so that users are aware of the function and can easily check where yfR creates the cache.

Probably, but I feel that file size is not really an issue. The cache files are really small.

Nonetheless, I added a "Diagnostics" text at the end of the execution of yf_get. It includes the current size of cache files (see previous figure with output "Diagnostics").

Also, function yf_get_default_cache_folder() is now exported and available to users.

Misc

Do you need to export the magrittr pipe when using it internally?

This was implemented so yfR is compatible with R >= 4.0.0 (personally I prefer the new pipe).

I was not aware that exporting it is unnecessary (I simply used usethis::use_pipe() when creating the package). I also feel that no harm is done in giving the user access to the pipe when loading yfR (I'm not aware of any conflicts).

I took a brief glance at the error messages, and most of them are clear and easy to understand. Maybe you could rephrase

Thanks, I fixed that.

In general, I really like the dreamerr package for function input type checks. checkmate seems to be very popular, too.

Thanks for the suggestion. I was not aware of this package. I'll have a look but, for the time being, I'll stay with the current code.

I can’t really follow this error message: "\nIt seems you are using a non-default cache folder at {cache_folder}. ",

I tried my best, but the explanation is more technical than what I can put in a message. What the user should know is that, for stocks, there is no guarantee that cache files can be merged without problems. This happens because external events, such as dividends, can alter the adjusted prices retroactively. So, you can get a different adjusted price for the same ticker/day if the query is made on different days.

I changed the text so that the explanation is clearer.
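
A small numeric illustration of the problem, using made-up prices and a 2-for-1 split (the numbers are assumptions chosen for clarity, not real data). Only the column name price_adjusted is taken from the package output discussed in this thread.

```r
# Adjusted prices cached before the split (days 1 to 3); hypothetical numbers.
cached <- data.frame(day = 1:3, price_adjusted = c(100, 102, 104))

# A 2-for-1 split happens before day 4. A fresh query made on day 5 returns
# the whole history rescaled to the post-split basis.
fresh <- data.frame(day = 1:5, price_adjusted = c(50, 51, 52, 53, 54))

# Naive merge: keep the cached rows and append only the new days.
merged <- rbind(cached, fresh[fresh$day > max(cached$day), ])

# Daily returns computed from each series.
ret_merged <- diff(merged$price_adjusted) / head(merged$price_adjusted, -1)
ret_fresh  <- diff(fresh$price_adjusted)  / head(fresh$price_adjusted, -1)

# Return on day 4: the merged series shows a spurious loss of roughly 49%,
# while the fresh series shows a gain of roughly 1.9%.
round(c(merged = ret_merged[3], fresh = ret_fresh[3]), 3)
```

The large negative return in the merged series is an artifact of mixing two adjustment bases. No such price move took place.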

The collections are created via hard coded (wikipedia) URLs. This is likely prone to errors - what if e.g. the URLs change? I understand the attractiveness of this ‘dynamic’ lookup, as e.g. the composition of stock indices might change over time. Maybe you could add a second look-up link (in case the main URL breaks), or you could add a ‘fallback’ data.frame containing the names of all firms included in an index at a fixed date to fall back to? See also this link on potential error handling of URLs via tryCatch.

The fallback data frame is a great idea and I implemented it. I don't like the first option, a "backup" URL, as it requires more web scraping code, which can be very unstable and hard to maintain.

I also implemented argument force_fallback in yf_get_index_comp, which allows the user to read the offline files directly.
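
The general pattern can be sketched with tryCatch(), as below. This is an illustration only and not the package's implementation: the function names get_composition and scrape_composition, the tickers, and the URL are all hypothetical. Only the argument name force_fallback is taken from the reply above.

```r
# Hypothetical fallback table: index composition stored at a fixed date.
fallback_comp <- data.frame(ticker = c("AAA", "BBB"), stringsAsFactors = FALSE)

# Hypothetical scraper that always fails, standing in for a broken or changed URL.
scrape_composition <- function(url) {
  stop("could not read ", url)
}

get_composition <- function(url, force_fallback = FALSE) {
  # Skip the online lookup entirely when the user asks for the stored data.
  if (force_fallback) {
    return(fallback_comp)
  }
  tryCatch(
    scrape_composition(url),
    error = function(e) {
      # Tell the user what happened, then return the stored composition.
      warning("Online lookup failed (", conditionMessage(e),
              "); using the stored fallback composition.",
              call. = FALSE)
      fallback_comp
    }
  )
}

# The scraper fails, so the stored composition is returned (with a warning).
comp <- suppressWarnings(get_composition("https://example.org/index"))
```

Emitting a warning rather than failing silently lets users know that the composition may be out of date.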

My last comment (repeating something I mentioned above): the equivalent python package is called yfinance. Maybe a better / SEO optimized name for the package would be yfinanceR?

I really like the name yfR. It's short and easy to remember. But thanks for the suggestion.

msperlin commented 2 years ago

All changes are in the main branch.

thisisnic commented 2 years ago

I am currently working on my review of this package, and hope to finish it in the next few days if nothing unexpected comes up! I had an issue when I was running the examples in the vignette though, and so to deliver partial feedback which might be useful in the meantime, I've opened this issue relating to it on the project repo: https://github.com/msperlin/yfR/issues/11

melvidoni commented 2 years ago

I am currently working on my review of this package, and hope to finish it in the next few days if nothing unexpected comes up! I had an issue when I was running the examples in the vignette though, and so to deliver partial feedback which might be useful in the meantime, I've opened this issue relating to it on the project repo: msperlin/yfR#11

Hello Nicola, that's great, thank you!
