ropensci / software-review

rOpenSci Software Peer Review.

yfR: Downloads and Organizes Financial Data from Yahoo Finance #523

Closed msperlin closed 2 years ago

msperlin commented 2 years ago

Date accepted: 2022-06-21

Submitting Author Name: Marcelo Perlin
Submitting Author Github Handle: @msperlin
Repository: https://github.com/msperlin/yfR
Version submitted: 0.0.1
Submission type: Standard
Editor: @melvidoni
Reviewers: @s3alfisc, @thisisnic

Due date for @s3alfisc: 2022-05-29
Due date for @thisisnic: 2022-06-13

Archive: TBD
Version accepted: TBD
Language: en

Package: yfR
Title: Downloads and Organizes Financial Data from Yahoo Finance
Version: 0.0.1
Authors@R: person("Marcelo", "Perlin", email = "marceloperlin@gmail.com", role = c("aut", "cre"))
Description: Facilitates download of financial data from Yahoo Finance <https://finance.yahoo.com/>, 
 a vast repository of stock price data across multiple financial exchanges. The package offers a local caching system
 and support for parallel computation.
URL: https://github.com/msperlin/yfR
BugReports: https://github.com/msperlin/yfR/issues
Depends:
    R (>= 4.1)
Imports: stringr, curl, tidyr, 
    lubridate, furrr, purrr, future, tibble, zoo,
    cli, readr, rvest, dplyr, quantmod
License: MIT + file LICENSE
LazyData: true
RoxygenNote: 7.1.2
Suggests: 
    knitr,
    rmarkdown,
    testthat (>= 3.0.0),
    ggplot2,
    covr
VignetteBuilder: knitr
Config/testthat/edition: 3

Scope

Package yfR retrieves and organizes data from Yahoo Finance, a large repository for stock price data.

The target audience is students, researchers and industry practitioners in the fields of Finance and Economics.

Package yfR is the second and backwards-incompatible version of BatchGetSymbols, also developed by me. My plan is to first deprecate BatchGetSymbols and later remove it from CRAN and archive it on GitHub.

Moreover, there are other packages, such as quantmod, that download data from Yahoo Finance, but none with similar features to yfR and BatchGetSymbols.

Yes.

Unfortunately, I was not able to run pkgcheck locally as I was unable to install (or make) dependency ctags on my Linux Mint 20.3 machine. Nonetheless, I read through and followed all guidelines available in the manual.

Technical checks

Confirm each of the following by checking the box.

This package:

Publication options

MEE Options

- [ ] The package is novel and will be of interest to the broad readership of the journal.
- [ ] The manuscript describing the package is no longer than 3000 words.
- [ ] You intend to archive the code for the package in a long-term repository which meets the requirements of the journal (see [MEE's Policy on Publishing Code](http://besjournals.onlinelibrary.wiley.com/hub/journal/10.1111/(ISSN)2041-210X/journal-resources/policy-on-publishing-code.html))
- (*Scope: Do consider MEE's [Aims and Scope](http://besjournals.onlinelibrary.wiley.com/hub/journal/10.1111/(ISSN)2041-210X/aims-and-scope/read-full-aims-and-scope.html) for your manuscript. We make no guarantee that your manuscript will be within MEE scope.*)
- (*Although not required, we strongly recommend having a full manuscript prepared when you submit here.*)
- (*Please do not submit your package separately to Methods in Ecology and Evolution*)

Code of conduct

ropensci-review-bot commented 2 years ago

Thanks for submitting to rOpenSci, our editors and @ropensci-review-bot will reply soon. Type @ropensci-review-bot help for help.

ropensci-review-bot commented 2 years ago

:rocket:

The following problem was found in your submission template:

:wave:

ropensci-review-bot commented 2 years ago

Oops, something went wrong with our automatic package checks. Our developers have been notified and package checks will appear here as soon as we've resolved the issue. Sorry for any inconvenience.

ropensci-review-bot commented 2 years ago

Checks for yfR (v0.0.1)

git hash: c345549c

Important: All failing checks above must be addressed prior to proceeding

Package License: MIT + file LICENSE


1. Statistical Properties

This package features some noteworthy statistical properties which may need to be clarified by a handling editor prior to progressing.

Details of statistical properties (click to open)

The package has:

- code in R (100% in 8 files) and
- 1 authors
- 1 vignette
- no internal data file
- 14 imported packages
- 6 exported functions (median 16 lines of code)
- 34 non-exported functions in R (median 12 lines of code)

---

Statistical properties of package structure as distributional percentiles in relation to all current CRAN packages

The following terminology is used:

- `loc` = "Lines of Code"
- `fn` = "function"
- `exp`/`not_exp` = exported / not exported

All parameters are explained as tooltips in the locally-rendered HTML version of this report generated by [the `checks_to_markdown()` function](https://docs.ropensci.org/pkgcheck/reference/checks_to_markdown.html)

The final measure (`fn_call_network_size`) is the total number of calls between functions (in R), or more abstract relationships between code objects in other languages. Values are flagged as "noteworthy" when they lie in the upper or lower 5th percentile.

|measure                  | value| percentile|noteworthy |
|:------------------------|-----:|----------:|:----------|
|files_R                  |     8|       50.7|           |
|files_vignettes          |     3|       92.4|           |
|files_tests              |     5|       81.7|           |
|loc_R                    |   779|       61.1|           |
|loc_vignettes            |   160|       41.2|           |
|loc_tests                |   184|       53.0|           |
|num_vignettes            |     1|       64.8|           |
|n_fns_r                  |    40|       49.3|           |
|n_fns_r_exported         |     6|       29.1|           |
|n_fns_r_not_exported     |    34|       56.6|           |
|n_fns_per_file_r         |     3|       45.9|           |
|num_params_per_fn        |     2|       11.9|           |
|loc_per_fn_r             |    14|       45.4|           |
|loc_per_fn_r_exp         |    16|       38.0|           |
|loc_per_fn_r_not_exp     |    12|       42.0|           |
|rel_whitespace_R         |    29|       73.7|           |
|rel_whitespace_vignettes |    65|       65.5|           |
|rel_whitespace_tests     |    56|       72.7|           |
|doclines_per_fn_exp      |    20|       13.8|           |
|doclines_per_fn_not_exp  |     0|        0.0|TRUE       |
|fn_call_network_size     |    37|       59.9|           |

---

1a. Network visualisation

Click to see the interactive network visualisation of calls between objects in package


2. goodpractice and other checks

Details of goodpractice and other checks (click to open)

---

#### 3b. `goodpractice` results

#### `R CMD check` with [rcmdcheck](https://r-lib.github.io/rcmdcheck/)

rcmdcheck found no errors, warnings, or notes

#### Test coverage with [covr](https://covr.r-lib.org/)

Package coverage: 87.78

#### Cyclocomplexity with [cyclocomp](https://github.com/MangoTheCat/cyclocomp)

The following functions have cyclocomplexity >= 15:

function | cyclocomplexity
--- | ---
yf_get | 23
yf_get_single_ticker | 22

#### Static code analyses with [lintr](https://github.com/jimhester/lintr)

[lintr](https://github.com/jimhester/lintr) found the following 2 potential issues:

message | number of times
--- | ---
Avoid library() and require() calls in packages | 2


Package Versions

|package  |version   |
|:--------|:---------|
|pkgstats |0.0.3.96  |
|pkgcheck |0.0.2.276 |


Editor-in-Chief Instructions:

Processing may not proceed until the items marked with :heavy_multiplication_x: have been resolved.

mpadge commented 2 years ago

@jooolia The failing check is just because the README does not have a CI badge. @msperlin Could you please add an R CMD check badge to your readme? (We check for CI via badges rather than workflow results, because we do accept submissions from arbitrary code-hosting platforms, not just GitHub.) Thanks!

msperlin commented 2 years ago

Good morning.

Sure, I just added the R-CMD badge.

jooolia commented 2 years ago

@ropensci-review-bot check package

ropensci-review-bot commented 2 years ago

Thanks, about to send the query.

ropensci-review-bot commented 2 years ago

:rocket:

Editor check started

:wave:

ropensci-review-bot commented 2 years ago

Oops, something went wrong with our automatic package checks. Our developers have been notified and package checks will appear here as soon as we've resolved the issue. Sorry for any inconvenience.

ropensci-review-bot commented 2 years ago

Checks for yfR (v0.0.1)

git hash: 1ee2f6f5

Package License: MIT + file LICENSE


1. Package Dependencies

Details of Package Dependency Usage (click to open)

The table below tallies all function calls to all packages ('ncalls'), both internal (r-base + recommended, along with the package itself), and external (imported and suggested packages). 'NA' values indicate packages to which no identified calls to R functions could be found. Note that these results are generated by an automated code-tagging system which may not be entirely accurate.

|type       |package   | ncalls|
|:----------|:---------|------:|
|internal   |base      |     69|
|internal   |yfR       |     17|
|internal   |utils     |      3|
|imports    |dplyr     |     11|
|imports    |purrr     |      5|
|imports    |readr     |      5|
|imports    |stringr   |      4|
|imports    |rvest     |      3|
|imports    |tidyr     |      2|
|imports    |lubridate |      2|
|imports    |furrr     |      2|
|imports    |future    |      2|
|imports    |tibble    |      1|
|imports    |zoo       |      1|
|imports    |quantmod  |      1|
|imports    |curl      |     NA|
|imports    |cli       |     NA|
|suggests   |knitr     |     NA|
|suggests   |rmarkdown |     NA|
|suggests   |testthat  |     NA|
|suggests   |ggplot2   |     NA|
|suggests   |covr      |     NA|
|linking_to |NA        |     NA|

Click below for tallies of functions used in each package. Locations of each call within this package may be generated locally by running 's <- pkgstats::pkgstats()', and examining the 'external_calls' table.

base

c (6), file.path (5), as.Date (4), min (4), paste0 (4), data.frame (3), file.exists (3), length (3), list (3), seq (3), as.character (2), as.numeric (2), for (2), max (2), names (2), options (2), rep (2), switch (2), tempdir (2), as.POSIXct (1), class (1), file (1), is.na (1), lapply (1), list.files (1), order (1), seq_along (1), setdiff (1), sum (1), Sys.Date (1), Sys.getenv (1), which (1)

yfR

fix_ticker_name (2), get_morale_boost (2), set_cli_msg (2), yf_get_available_indices (2), calc_ret (1), date_to_unix (1), fct_format_wide (1), unix_to_date (1), yf_get (1), yf_get_available_collections (1), yf_get_ibov_stocks (1), yf_get_index_comp (1), yf_get_single_ticker (1)

dplyr

first (3), bind_rows (2), tibble (2), filter (1), lag (1), mutate (1), rename (1)

purrr

map (2), map_chr (2), pmap (1)

readr

read_rds (4), write_rds (1)

stringr

fixed (1), str_c (1), str_detect (1), str_split (1)

rvest

html_nodes (2), html_table (1)

utils

data (2), capture.output (1)

furrr

furrr_options (2)

future

availableCores (1), plan (1)

lubridate

wday (2)

tidyr

all_of (1), pivot_wider (1)

quantmod

getSymbols (1)

tibble

tibble (1)

zoo

index (1)


2. Statistical Properties

This package features some noteworthy statistical properties which may need to be clarified by a handling editor prior to progressing.

Details of statistical properties (click to open)

The package has:

- code in R (100% in 8 files) and
- 1 authors
- 1 vignette
- no internal data file
- 14 imported packages
- 6 exported functions (median 16 lines of code)
- 34 non-exported functions in R (median 12 lines of code)

---

Statistical properties of package structure as distributional percentiles in relation to all current CRAN packages

The following terminology is used:

- `loc` = "Lines of Code"
- `fn` = "function"
- `exp`/`not_exp` = exported / not exported

All parameters are explained as tooltips in the locally-rendered HTML version of this report generated by [the `checks_to_markdown()` function](https://docs.ropensci.org/pkgcheck/reference/checks_to_markdown.html)

The final measure (`fn_call_network_size`) is the total number of calls between functions (in R), or more abstract relationships between code objects in other languages. Values are flagged as "noteworthy" when they lie in the upper or lower 5th percentile.

|measure                  | value| percentile|noteworthy |
|:------------------------|-----:|----------:|:----------|
|files_R                  |     8|       50.7|           |
|files_vignettes          |     3|       92.4|           |
|files_tests              |     5|       81.7|           |
|loc_R                    |   779|       61.1|           |
|loc_vignettes            |   160|       41.2|           |
|loc_tests                |   184|       53.0|           |
|num_vignettes            |     1|       64.8|           |
|n_fns_r                  |    40|       49.3|           |
|n_fns_r_exported         |     6|       29.1|           |
|n_fns_r_not_exported     |    34|       56.6|           |
|n_fns_per_file_r         |     3|       45.9|           |
|num_params_per_fn        |     2|       11.9|           |
|loc_per_fn_r             |    14|       45.4|           |
|loc_per_fn_r_exp         |    16|       38.0|           |
|loc_per_fn_r_not_exp     |    12|       42.0|           |
|rel_whitespace_R         |    29|       73.7|           |
|rel_whitespace_vignettes |    65|       65.5|           |
|rel_whitespace_tests     |    56|       72.7|           |
|doclines_per_fn_exp      |    20|       13.8|           |
|doclines_per_fn_not_exp  |     0|        0.0|TRUE       |
|fn_call_network_size     |    37|       59.9|           |

---

2a. Network visualisation

Click to see the interactive network visualisation of calls between objects in package


3. goodpractice and other checks

Details of goodpractice and other checks (click to open)

#### 3a. Continuous Integration Badges

[![R-CMD-check](https://github.com/msperlin/yfR/workflows/R-CMD-check/badge.svg)](https://github.com/msperlin/yfR/actions)

**GitHub Workflow Results**

|name                       |conclusion |sha    |date       |
|:--------------------------|:----------|:------|:----------|
|pages build and deployment |success    |1ee2f6 |2022-03-31 |
|pkgdown                    |success    |51af0f |2022-03-30 |
|R-CMD-check                |success    |1ee2f6 |2022-03-31 |
|render-rmarkdown           |failure    |f3dbe5 |2022-03-30 |
|test-coverage              |success    |1ee2f6 |2022-03-31 |

---

#### 3b. `goodpractice` results

#### `R CMD check` with [rcmdcheck](https://r-lib.github.io/rcmdcheck/)

rcmdcheck found no errors, warnings, or notes

#### Test coverage with [covr](https://covr.r-lib.org/)

Package coverage: 87.78

#### Cyclocomplexity with [cyclocomp](https://github.com/MangoTheCat/cyclocomp)

The following functions have cyclocomplexity >= 15:

function | cyclocomplexity
--- | ---
yf_get | 23
yf_get_single_ticker | 22

#### Static code analyses with [lintr](https://github.com/jimhester/lintr)

[lintr](https://github.com/jimhester/lintr) found the following 2 potential issues:

message | number of times
--- | ---
Avoid library() and require() calls in packages | 2


Package Versions

|package  |version |
|:--------|:-------|
|pkgstats |0.0.4.4 |
|pkgcheck |0.0.3.6 |


Editor-in-Chief Instructions:

This package is in top shape and may be passed on to a handling editor

jooolia commented 2 years ago

Dear @msperlin, Thank you for your submission. The package has passed all of the automated package checks and the test coverage is good. Could you expand a bit more on how this package differs from quantmod and tidyquant? Thanks, Julia

msperlin commented 2 years ago

Good morning Julia,

The main goal of yfR is to help users download large amounts of data from Yahoo Finance (YF).

Packages quantmod and tidyquant also offer a function for downloading price data from YF, but only that. Besides importing data, yfR offers the following functionalities:

jooolia commented 2 years ago

Thank you @msperlin, I am discussing with the other editors and will get back to you. Thanks, Julia

jooolia commented 2 years ago

Thanks for your patience @msperlin. The fit seems to be good for us and I am now looking for a handling editor. Thanks, Julia

msperlin commented 2 years ago

Great, thanks @jooolia.

jooolia commented 2 years ago

@ropensci-review-bot assign @melvidoni as editor

ropensci-review-bot commented 2 years ago

Assigned! @melvidoni is now the editor

melvidoni commented 2 years ago

@ropensci-review-bot seeking reviewers

ropensci-review-bot commented 2 years ago

Please add this badge to the README of your package repository:

[![Status at rOpenSci Software Peer Review](https://badges.ropensci.org/523_status.svg)](https://github.com/ropensci/software-review/issues/523)

Furthermore, if your package does not have a NEWS.md file yet, please create one to capture the changes made during the review process. See https://devguide.ropensci.org/releasing.html#news

msperlin commented 2 years ago

Thanks. The badge is added in dc712f4abac246604721ed7f2926f9794e4e7f99 and the news file already exists.

Athene-ai commented 2 years ago

Hi @melvidoni ! I would like to review this package

melvidoni commented 2 years ago

Hi @melvidoni ! I would like to review this package

Hello @Athene-ai, of course, this package still needs reviewers. I saw you wrote on several packages, so be mindful that asking in multiple places may not be ideal, as you may end up with more workload than intended. The review timeframe for this is 3 weeks, so if that's okay with you, I'll assign you to this package (and you'll have to complete this review first before accepting any others).

Athene-ai commented 2 years ago

@melvidoni I accept the invitation to review this package within three weeks

melvidoni commented 2 years ago

@ropensci-review-bot assign @Athene-ai as reviewer

ropensci-review-bot commented 2 years ago

@Athene-ai added to the reviewers list. Review due date is 2022-05-26. Thanks @Athene-ai for accepting to review! Please refer to our reviewer guide.

ropensci-review-bot commented 2 years ago

@Athene-ai: If you haven't done so, please fill this form for us to update our reviewers records.

Athene-ai commented 2 years ago

@melvidoni thanks for adding me as reviewer and I filled the volunteer form for being an rOpenSci Reviewer :-)

Athene-ai commented 2 years ago

@melvidoni do we have a slack channel?

melvidoni commented 2 years ago

@ropensci-review-bot assign @s3alfisc as reviewer

ropensci-review-bot commented 2 years ago

@s3alfisc added to the reviewers list. Review due date is 2022-05-29. Thanks @s3alfisc for accepting to review! Please refer to our reviewer guide.

ropensci-review-bot commented 2 years ago

@s3alfisc: If you haven't done so, please fill this form for us to update our reviewers records.

melvidoni commented 2 years ago

@melvidoni do we have a slack channel?

Hello @Athene-ai. Please, be mindful that responses are not immediate, especially over the weekend; kindly do not hasten people, and wait for responses/actions. There is much going on "behind the scenes" that you may not be aware of.

That said, you'll get an invitation to the Slack later in the process.

Athene-ai commented 2 years ago

@melvidoni do we have a slack channel?

Hello @Athene-ai. Please, be mindful that responses are not immediate, especially over the weekend; kindly do not hasten people, and wait for responses/actions. There is much going on "behind the scenes" that you may not be aware of.

That said, you'll get an invitation to the Slack later in the process.

Thanks for the information 😊

mpadge commented 2 years ago

@Athene-ai Could you please paste a completed review here? Rather than adding more comments to this issue, you may leave that template there for now, and update it with an actual review when you've got that far. It's best to complete the template offline, edit the issue to delete all current content, and then simply paste the completed review back in place of the above comment. Thanks.

melvidoni commented 2 years ago

@ropensci-review-bot remove @Athene-ai from reviewers

ropensci-review-bot commented 2 years ago

@Athene-ai removed from the reviewers list!

melvidoni commented 2 years ago

@msperlin we apologise for the issues caused with the prior reviewer. They have now been removed from the list of reviewers, and I will proceed to search for another reviewer. Please understand that although we try to give everyone an opportunity, sometimes it is not possible to foresee how they will take the opportunity.

I will strive to get a new reviewer, but the person will be given 3 weeks from the acceptance date, hence some delays are bound to happen.

Edit: wrong punctuation, apologies.

msperlin commented 2 years ago

Good morning @melvidoni.

No problem at all. I can wait.

Best,

melvidoni commented 2 years ago

@ropensci-review-bot assign @thisisnic as reviewer

ropensci-review-bot commented 2 years ago

@thisisnic added to the reviewers list. Review due date is 2022-06-13. Thanks @thisisnic for accepting to review! Please refer to our reviewer guide.

ropensci-review-bot commented 2 years ago

@thisisnic: If you haven't done so, please fill this form for us to update our reviewers records.

s3alfisc commented 2 years ago

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

Documentation

The package includes all the following forms of documentation:

You can find more comments on documentation below.

Functionality

Estimated hours spent reviewing: 8


Additional Comments

I think that yfR is a very promising package with useful features, and I believe that it will be widely used. I very much enjoyed using it! To improve the package, I mostly suggest to invest more time into refining the documentation.

Documentation

Installation, Local CMD Check & pkgcheck

Testing

Functionality

Additional Functionality

Misc

  # check for NA
  if (any(is.na(tickers))) {
    my_msg <- paste0(
      "Found NA value in ticker vector.",
      "You need to remove it before running BatchGetSymbols."
    )
    stop(my_msg)
  }

  if (class(first_date) != "Date") {
    stop("ERROR: cant change class of first_date to 'Date'")
  }

In general, I really like the dreamerr package for function input type checks. checkmate seems to be very popular, too.

With dreamerr, you could e.g. write

  # check threshold
  if ((thresh_bad_data < 0) | (thresh_bad_data > 1)) {
    stop("Input thresh_bad_data should be a proportion between 0 and 1")
  }

as

dreamerr::check_arg(thresh_bad_data, "scalar numeric GT{0} LT{1}")
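For comparison, the same kind of check can be sketched in base R without an extra dependency. This is only an illustration, not yfR's code: the helper name `check_proportion` is hypothetical. It keeps the inclusive bounds of the original `if` statement (0 and 1 are accepted), so when porting the check to a validation package it is worth confirming whether that package's bounds are strict or inclusive.

```r
# Hypothetical helper (not part of yfR): validate a single proportion in [0, 1].
check_proportion <- function(x, arg_name = "thresh_bad_data") {
  # && short-circuits, so later conditions only run on a numeric scalar
  ok <- is.numeric(x) && length(x) == 1L && !is.na(x) && x >= 0 && x <= 1
  if (!ok) {
    stop("Input ", arg_name, " should be a single number between 0 and 1",
         call. = FALSE)
  }
  invisible(x)
}

check_proportion(0.75)  # passes silently
```

Unlike the original `if`, this version also rejects non-numeric input, vectors of length greater than one, and `NA`.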

I can’t really follow this error message:

  if (!flag) {
    warning(stringr::str_glue(
      "\nIt seems you are using a non-default cache folder at {cache_folder}. ",
      "Be aware that if any stock event -- split or dividend -- happens ",
      "in between cache files, the resulting aggregate cache data will not ",
      "correspond to reality as some part of the price data will not be ",
      "adjusted to the event. For safety and reproducibility, my suggestion ",
      "is to use cache system only for the current session with tempdir(), ",
      "which is the default option."
    ))
  }

msperlin commented 2 years ago

Thanks @s3alfisc for the review! Appreciate it. Good ideas there.

I'll reply to all your comments in the next couple of days.

melvidoni commented 2 years ago

@ropensci-review-bot submit review https://github.com/ropensci/software-review/issues/523#issuecomment-1140410709 time 8

ropensci-review-bot commented 2 years ago

Logged review for s3alfisc (hours: 8)

msperlin commented 2 years ago

Dear @s3alfisc , please find my replies below:


I think that yfR is a very promising package with useful features, and I believe that it will be widely used. I very much enjoyed using it! To improve the package, I mostly suggest to invest more time into refining the documentation.

Thanks, appreciate the feedback and the detailed review. Given your feedback and ideas, I've made many changes in the code and documentation.

Documentation

Statement of need: I would like to see a more refined statement of need at the beginning of the readme: what is yfR’s main innovation? E.g. start with something like “yfR is an API to yahoo finance. It speeds up the data downloading process by parallel computing and local caching.” Then explain what type of data yahoo finance includes.

Also thanks. I changed the readme.rmd file so that the reader can quickly grasp how to use the package.

I would move the discussion of data quality / limitations of yahoo finance and comparison to BatchGetSymbols to separate articles - I don’t think they are required in the readme. If you want to keep the reference to quantmod, maybe include a dedicated ‘Acknowledgements’ section at the end of the readme? Occasionally, you use jargon: e.g., not all users might know what a ticker is. I would move all examples from the readme to the ‘get started’ vignette. Alternatively, I would keep only one example in the readme.

I reorganized the topics in the readme.rmd and moved some as vignettes.

In the ‘get started’ vignette, I would hide the message output generated e.g. by yf_get() and explain in words what the function does: e.g. it checks the cache, downloads data if the cache is empty, else finishes etc.

I'd rather keep the yfR messages in the vignettes as they mimic the actual call to the function. I also improved the text in the main vignette ("get started").

The vignette states that multiple ‘collections’ are organized in the package. It would be great to include a full list of collections to the docs, e.g. as a separate article? The yf_get_available_collections() helps here, but what do the individual collections stand for? E.g. does IBOV stand for the Bovespa-Index?

Great idea. I added argument print_description for yf_get_available_collections() for printing a text description of available collections:

image

I would like to see some documentation on how the caching works: e.g., where are files saved? For how long are they saved? Is the cache ever cleaned, e.g. are cached files lost by re-starting the R session?

I added a section at the help file of yf_get(), explaining how the cache system works.

image

In the docs for yf_convert_to_wide, it would be good to print the initial long dataframe.

Done.
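To make the long versus wide distinction concrete, here is a small sketch using base R's `reshape()`. It is not how yfR does the conversion (the dependency report above shows the package calling `tidyr::pivot_wider`), and the toy data, tickers and the `ref_date` column name are assumptions made up for illustration.

```r
# Toy long-format data: one row per (date, ticker) pair. Values are invented.
long <- data.frame(
  ref_date       = as.Date(c("2022-01-03", "2022-01-04",
                             "2022-01-03", "2022-01-04")),
  ticker         = c("AAA", "AAA", "BBB", "BBB"),
  price_adjusted = c(10, 11, 20, 19),
  stringsAsFactors = FALSE
)

# Wide format: one row per date, one price column per ticker.
wide <- reshape(long, idvar = "ref_date", timevar = "ticker",
                direction = "wide")

print(long)
print(wide)
```

Printing both objects side by side is the kind of before/after view the comment above asks for.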

The documentation of yf_get() does not really, as a stand-alone, explain what the function does: download ticker data from yahoo finance, caching, parallelism etc. I would delete the reference to getSymbols. Note that as yf_get_default_cache_folder() is not exported, users will run into an error when trying yfR::yf_get_default_cache_folder().

Documentation was improved.

Also, mention that the ticker function argument is vectorized

Done.

You could improve the documentation for parallelism: I myself have never used furrr, so your hint to future::plan() is not too helpful. How about a dedicated article with a small example that illustrates how to run get_plan() in parallel? Also, I only learned from browsing the code that by default, half of all available cores are used.

I think that going into parallelism and future::plan() would be off topic. However, I added a link to furrr https://furrr.futureverse.org/ in argument do_parallel, so that the user can learn more about it, if desired.

What is the difference between a collection and an index?

A collection is just a bunch of tickers put together. An index can be a collection, but not all collections are indices.

Consider adding documentation of the data returned via yf_get(). Not being a financial economist, I for example have no idea what the price_adjusted column stands for. Beyond, what is the unit of measurement of the price variables? I suppose it is US Dollars? Further, what is the relationship between daily data and monthly data? Also, potentially add a note that when markets are closed, no data row will be created.

Done. New documentation is available at readme.rmd and also in help for yf_get().

image

Examples could be more 'verbose', i.e. add documentation. Also, examples could be more 'exhaustive' - they are quite minimal at the moment. The example for yf_convert_to_wide currently calls internal data - could you not simply attach the data set or load it?

I revised all examples, especially for the main function. I've made a few changes, but they look alright to me. Users can always check the vignettes for more details.

Installation, Local CMD Check & pkgcheck

Installation and CMD check pass without problems. I tried to run pkgcheck, but failed to get it to run. I suggest to run the pkgcheck action on github actions, at least for the time of the review.

I also failed to use pkgcheck on linux ubuntu/mint. I can't install its dependencies, despite spending some time trying hard.

Testing

Code Coverage is currently only at around 80% - I would love to see this up at 95%, if not 100 :)

I tried my best to cover as much as possible, reaching 82.99%. One big miss is the parallel computing part which, in the current version, is not active (I removed it due to YF limits on API calls). A fix is in progress, but it depends on quantmod being on CRAN. I'll add the parallel tests once it is fixed.

The rest is just input error checking which, to me, feels fine to leave uncovered (covering it would just be a gimmick). So, I'll not reach 100%, but will be close.

Functionality

All examples work very nicely. Overall, it was a lot of fun using the package! In general, the console output is very helpful and very pretty!

Great, thanks!

I am not sure if I would have default function arguments for first_date() and last_date(). If you want to keep it, I would change it from 15 days to one month.

Done.

yf_convert_to_wide() is super helpful - great idea to directly include it in the package!

Thanks. I know some people use the data that way, even though I don't like it.

Could the API be more permissive, e.g. accept dates with format dd-mm-yyyy?

I feel that ISO format is fine. This is the standard in R and users should probably adapt to it.
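As a sketch of what "ISO only" input handling could look like, the helper below accepts either a `Date` object or a single `YYYY-MM-DD` string and rejects everything else. The function name `parse_iso_date` is hypothetical and this is not yfR's actual implementation. The explicit pattern check matters because `as.Date()` with a format string ignores trailing characters, so a string such as `"31-12-2021"` could otherwise be read as a date in year 31 instead of being rejected.

```r
# Hypothetical helper (not part of yfR): accept a Date or one 'YYYY-MM-DD' string.
parse_iso_date <- function(x) {
  if (inherits(x, "Date")) {
    return(x)
  }
  # Check the shape first; as.Date() alone would accept some malformed strings
  is_iso <- is.character(x) && length(x) == 1L &&
    grepl("^[0-9]{4}-[0-9]{2}-[0-9]{2}$", x)
  if (!is_iso) {
    stop("Dates must be a Date object or a 'YYYY-MM-DD' string", call. = FALSE)
  }
  out <- as.Date(x, format = "%Y-%m-%d")
  # Well-formed but impossible dates (e.g. February 30) parse to NA
  if (is.na(out)) {
    stop("'", x, "' is not a valid calendar date", call. = FALSE)
  }
  out
}

first_date <- parse_iso_date("2022-06-21")
```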

When trying the “SP500” collection example, I ran into several ‘error in download’ errors. Still, the function finished eventually with ‘binding price data’. What exactly is going on here? Did the function eventually manage to fetch all tickers? If no, could there be a final message, e.g. ‘300/500 tickers successfully fetched. To fetch all others, do this …’.

Good idea. I implemented the message. The user will now be aware of the relative percentage of tickers in the output data, when comparing to the requested vector of tickers. Whenever that is lower than 50%, a message tells the user to wait for 15 minutes before running it again.

Good image

Bad image
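The logic behind such a final message can be sketched as follows. This is an illustration under assumptions, not the code in yfR: the function name `summarise_fetch`, the message wording and the returned list are made up; only the 50% threshold and the 15 minute advice come from the reply above.

```r
# Hypothetical helper: compare requested tickers with those actually returned.
summarise_fetch <- function(requested, fetched, min_rate = 0.5) {
  requested <- unique(requested)
  got <- intersect(requested, fetched)
  rate <- length(got) / length(requested)

  msg <- sprintf("Got %d out of %d requested tickers (%.0f%%).",
                 length(got), length(requested), 100 * rate)
  if (rate < min_rate) {
    # Low success rates are assumed to signal a request limit on the server side
    msg <- paste(msg, "Consider waiting 15 minutes before trying again.")
  }

  list(rate = rate, missing = setdiff(requested, got), message = msg)
}

res <- summarise_fetch(c("AAA", "BBB", "CCC", "DDD"), fetched = "AAA")
message(res$message)
```

Returning the missing tickers alongside the message would let a user retry only the failed downloads.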

I have seen that there is already a PR opened to alert users when they have reached the yahoo finance limit. This would indeed be a great feature!

We are working on this issue, already with a viable solution that should become official soon. Nonetheless, the package works fine in a single session in all my tests.

Additional Functionality

It would be great to add further collections, e.g. NASDAQ, DAX, SP30, FAANG etc

Yes! Definitely. The idea is having something for everyone.

The equivalent Python package, yfinance, offers a range of additional functionality, e.g. data on dividends, stock splits, and institutional investors. Do you plan to incorporate any of these into the package in the future?

No. My proposal is focusing on stock data importing and organization, only.

Currently, the cached files are saved in the rds file format via readr::write_rds(). There might be faster and/or more memory-friendly alternatives available. Have you considered adding a function argument that would allow users to store files e.g. in the parquet file format?

I believe that .rds files work fine for yfR (I never saw a performance issue). But I'll keep that in mind. Also, this is very easy to change in the future.

Have you considered integrating an autoplot function to plot stock prices? autoplot would, e.g., generate plots similar to those created in the readme / vignette.

No, but I'll also keep it in mind.

Would it be possible to give an estimate of consumed memory of all cached files prior to a download? I would also consider to export yf_get_default_cache_folder() so that users are aware of the function and can easily check where yfR creates the cache.

Probably, but I feel that file size is not really an issue. The cache files are really small.

Nonetheless, I added a "Diagnostics" text at the end of the execution of yf_get. It includes the current size of cache files (see previous figure with output "Diagnostics").

Also, function yf_get_default_cache_folder() is now exported and available to users.
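A rough idea of how the size of a cache folder can be measured in base R is sketched below. The helper name `cache_size_bytes` and the demo folder are hypothetical, and this is not necessarily how the "Diagnostics" output is computed in yfR; the demo writes a small file into a sub-folder of `tempdir()` instead of touching a real cache.

```r
# Hypothetical helper: total size (in bytes) of all files under a folder.
cache_size_bytes <- function(cache_folder) {
  files <- list.files(cache_folder, full.names = TRUE, recursive = TRUE)
  sum(file.size(files))  # sum() of an empty vector is 0
}

# Demo with a throwaway folder standing in for the cache
demo_folder <- file.path(tempdir(), "yf_cache_demo")
dir.create(demo_folder, showWarnings = FALSE)
saveRDS(data.frame(price = c(10, 11)), file.path(demo_folder, "AAA.rds"))

demo_size <- cache_size_bytes(demo_folder)
```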

Misc

Do you need to export the magrittr pipe when using it internally?

This was implemented so yfR is compatible with R >= 4.0.0 (personally I prefer the new pipe).

I was not aware that exporting it is unnecessary (I simply used usethis::use_pipe() when creating the package). I also feel that no harm is done in allowing the user access to the pipe when loading yfR (I'm not aware of any conflicts).

I took a brief glance at the error messages, and most of them are clear and easy to understand. Maybe you could rephrase

Thanks, I fixed that.

In general, I really like the dreamerr package for function input type checks. checkmate seems to be very popular, too.

Thanks for the suggestion. I was not aware of this package. I'll have a look but, for the time being, I'll stay with the current code.

I can’t really follow this error message: "\nIt seems you are using a non-default cache folder at {cache_folder}. ",

I tried my best, but the explanation is more technical than what I can put in a message. What the user should know is that, for stocks, there is no guarantee that cache files can be merged without problems. This happens because external events, such as dividends, can alter the adjusted prices retroactively. So, you can get a different adjusted price for the same ticker/day if the query is made on different days.

I changed the text so that the explanation is clearer.
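A tiny numeric illustration of the problem, with invented numbers and a hypothetical 2-for-1 split, may help. The cached rows were downloaded before the split; the fresh download afterwards rescales the whole adjusted history. Appending only the new day to the old cache mixes the two scales and produces a large return that never happened.

```r
# Adjusted prices as cached on day 2, before a hypothetical 2-for-1 split
cached <- data.frame(day = 1:2, price_adjusted = c(100, 102))

# Full history as downloaded on day 3, after the split:
# earlier adjusted prices are rescaled by 1/2
fresh <- data.frame(day = 1:3, price_adjusted = c(50, 51, 52))

# Naive merge: keep the cached rows and append only the days not yet cached
merged <- rbind(cached, fresh[!fresh$day %in% cached$day, ])

simple_returns <- function(p) p[-1] / p[-length(p)] - 1

merged_ret <- simple_returns(merged$price_adjusted)  # second return is about -49%
fresh_ret  <- simple_returns(fresh$price_adjusted)   # second return is about +2%
```

Dividends cause the same effect on a smaller scale, which is why restricting the cache to a single session is the safer default.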

The collections are created via hard coded (wikipedia) URLs. This is likely prone to errors - what if e.g. the URLs change? I understand the attractiveness of this ‘dynamic’ lookup, as e.g. the composition of stock indices might change over time. Maybe you could add a second look-up link (in case the main URL breaks), or you could add a ‘fallback’ data.frame containing the names of all firms included in an index at a fixed date to fall back to? See also this link on potential error handling of URLs via tryCatch.

The fallback dataframe is a great idea and I implemented it. I don't like the first idea of a "backup" URL, as it requires more web-scraping code, which can be very unstable and hard to maintain.

I also implemented argument force_fallback in yf_get_index_comp, which allows the user to read the offline files directly.
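The general pattern of an online lookup with a stored fallback can be sketched with `tryCatch()`. This is a simplified illustration, not the code of `yf_get_index_comp`: the function `get_composition`, its `fetch_fun` argument and the toy fallback table are hypothetical; only the idea of a `force_fallback` switch comes from the reply above.

```r
# Hypothetical sketch: try an online lookup, fall back to a stored table on error.
get_composition <- function(fetch_fun, fallback, force_fallback = FALSE) {
  if (force_fallback) {
    return(fallback)  # skip the online lookup entirely
  }
  tryCatch(
    fetch_fun(),
    error = function(e) {
      warning("Online lookup failed (", conditionMessage(e),
              "); using the stored fallback table.", call. = FALSE)
      fallback
    }
  )
}

# Toy fallback table with made-up tickers
fallback_tbl <- data.frame(ticker = c("AAA", "BBB"), stringsAsFactors = FALSE)

# Simulate a broken URL: the fetch function always fails
broken_fetch <- function() stop("page layout changed")
comp <- suppressWarnings(get_composition(broken_fetch, fallback_tbl))
```

Passing the fetch step in as a function keeps the sketch testable without network access.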

My last comment (repeating something I mentioned above): the equivalent python package is called yfinance. Maybe a better / SEO optimized name for the package would be yfinanceR?

I really liked the name yfR. It's short and easy to remember. But thanks for the suggestion.

msperlin commented 2 years ago

All changes are in the main branch.

thisisnic commented 2 years ago

I am currently working on my review of this package, and hope to finish it in the next few days if nothing unexpected comes up! I had an issue when I was running the examples in the vignette though, and so to deliver partial feedback which might be useful in the meantime, I've opened this issue relating to it on the project repo: https://github.com/msperlin/yfR/issues/11

melvidoni commented 2 years ago

I am currently working on my review of this package, and hope to finish it in the next few days if nothing unexpected comes up! I had an issue when I was running the examples in the vignette though, and so to deliver partial feedback which might be useful in the meantime, I've opened this issue relating to it on the project repo: msperlin/yfR#11

Hello Nicola, that's great, thank you!