Closed msperlin closed 2 years ago
Thanks for submitting to rOpenSci, our editors and @ropensci-review-bot will reply soon. Type @ropensci-review-bot help
for help.
:rocket:
The following problem was found in your submission template:
:wave:
Oops, something went wrong with our automatic package checks. Our developers [have been notified]() and package checks will appear here as soon as we've resolved the issue. Sorry for any inconvenience.
git hash: c345549c
Important: All failing checks above must be addressed prior to proceeding
Package License: MIT + file LICENSE
This package features some noteworthy statistical properties which may need to be clarified by a handling editor prior to progressing.
The package has:

- code in R (100% in 8 files)
- 1 author
- 1 vignette
- no internal data file
- 14 imported packages
- 6 exported functions (median 16 lines of code)
- 34 non-exported functions in R (median 12 lines of code)

---

Statistical properties of package structure as distributional percentiles in relation to all current CRAN packages. The following terminology is used:

- `loc` = "Lines of Code"
- `fn` = "function"
- `exp`/`not_exp` = exported / not exported

All parameters are explained as tooltips in the locally-rendered HTML version of this report generated by [the `checks_to_markdown()` function](https://docs.ropensci.org/pkgcheck/reference/checks_to_markdown.html). The final measure (`fn_call_network_size`) is the total number of calls between functions (in R), or more abstract relationships between code objects in other languages. Values are flagged as "noteworthy" when they lie in the upper or lower 5th percentile.

|measure                  | value| percentile|noteworthy |
|:------------------------|-----:|----------:|:----------|
|files_R                  |     8|       50.7|           |
|files_vignettes          |     3|       92.4|           |
|files_tests              |     5|       81.7|           |
|loc_R                    |   779|       61.1|           |
|loc_vignettes            |   160|       41.2|           |
|loc_tests                |   184|       53.0|           |
|num_vignettes            |     1|       64.8|           |
|n_fns_r                  |    40|       49.3|           |
|n_fns_r_exported         |     6|       29.1|           |
|n_fns_r_not_exported     |    34|       56.6|           |
|n_fns_per_file_r         |     3|       45.9|           |
|num_params_per_fn        |     2|       11.9|           |
|loc_per_fn_r             |    14|       45.4|           |
|loc_per_fn_r_exp         |    16|       38.0|           |
|loc_per_fn_r_not_exp     |    12|       42.0|           |
|rel_whitespace_R         |    29|       73.7|           |
|rel_whitespace_vignettes |    65|       65.5|           |
|rel_whitespace_tests     |    56|       72.7|           |
|doclines_per_fn_exp      |    20|       13.8|           |
|doclines_per_fn_not_exp  |     0|        0.0|TRUE       |
|fn_call_network_size     |    37|       59.9|           |

---
Click to see the interactive network visualisation of calls between objects in package
`goodpractice` and other checks

---

#### 3b. `goodpractice` results

#### `R CMD check` with [rcmdcheck](https://r-lib.github.io/rcmdcheck/)

rcmdcheck found no errors, warnings, or notes

#### Test coverage with [covr](https://covr.r-lib.org/)

Package coverage: 87.78

#### Cyclocomplexity with [cyclocomp](https://github.com/MangoTheCat/cyclocomp)

The following functions have cyclocomplexity >= 15:

function | cyclocomplexity
--- | ---
yf_get | 23
yf_get_single_ticker | 22

#### Static code analyses with [lintr](https://github.com/jimhester/lintr)

[lintr](https://github.com/jimhester/lintr) found the following 2 potential issues:

message | number of times
--- | ---
Avoid library() and require() calls in packages | 2
|package  |version   |
|:--------|:---------|
|pkgstats |0.0.3.96  |
|pkgcheck |0.0.2.276 |
Processing may not proceed until the items marked with :heavy_multiplication_x: have been resolved.
@jooolia The failing check is just because the README does not have a CI badge. @msperlin Could you please add an R CMD check badge to your readme? (We check for CI via badges rather than workflow results, because we do accept submissions from arbitrary code-hosting platforms, not just GitHub.) Thanks!
Good morning.
Sure, I just added the R-CMD badge.
@ropensci-review-bot check package
Thanks, about to send the query.
:rocket:
Editor check started
:wave:
Oops, something went wrong with our automatic package checks. Our developers [have been notified]() and package checks will appear here as soon as we've resolved the issue. Sorry for any inconvenience.
git hash: 1ee2f6f5
Package License: MIT + file LICENSE
The table below tallies all function calls to all packages ('ncalls'), both internal (r-base + recommended, along with the package itself), and external (imported and suggested packages). 'NA' values indicate packages to which no identified calls to R functions could be found. Note that these results are generated by an automated code-tagging system which may not be entirely accurate.
|type |package | ncalls|
|:----------|:---------|------:|
|internal |base | 69|
|internal |yfR | 17|
|internal |utils | 3|
|imports |dplyr | 11|
|imports |purrr | 5|
|imports |readr | 5|
|imports |stringr | 4|
|imports |rvest | 3|
|imports |tidyr | 2|
|imports |lubridate | 2|
|imports |furrr | 2|
|imports |future | 2|
|imports |tibble | 1|
|imports |zoo | 1|
|imports |quantmod | 1|
|imports |curl | NA|
|imports |cli | NA|
|suggests |knitr | NA|
|suggests |rmarkdown | NA|
|suggests |testthat | NA|
|suggests |ggplot2 | NA|
|suggests |covr | NA|
|linking_to |NA | NA|
Click below for tallies of functions used in each package. Locations of each call within this package may be generated locally by running 's <- pkgstats::pkgstats(
base: c (6), file.path (5), as.Date (4), min (4), paste0 (4), data.frame (3), file.exists (3), length (3), list (3), seq (3), as.character (2), as.numeric (2), for (2), max (2), names (2), options (2), rep (2), switch (2), tempdir (2), as.POSIXct (1), class (1), file (1), is.na (1), lapply (1), list.files (1), order (1), seq_along (1), setdiff (1), sum (1), Sys.Date (1), Sys.getenv (1), which (1)

yfR: fix_ticker_name (2), get_morale_boost (2), set_cli_msg (2), yf_get_available_indices (2), calc_ret (1), date_to_unix (1), fct_format_wide (1), unix_to_date (1), yf_get (1), yf_get_available_collections (1), yf_get_ibov_stocks (1), yf_get_index_comp (1), yf_get_single_ticker (1)

dplyr: first (3), bind_rows (2), tibble (2), filter (1), lag (1), mutate (1), rename (1)

purrr: map (2), map_chr (2), pmap (1)

readr: read_rds (4), write_rds (1)

stringr: fixed (1), str_c (1), str_detect (1), str_split (1)

rvest: html_nodes (2), html_table (1)

utils: data (2), capture.output (1)

furrr: furrr_options (2)

future: availableCores (1), plan (1)

lubridate: wday (2)

tidyr: all_of (1), pivot_wider (1)

quantmod: getSymbols (1)

tibble: tibble (1)

zoo: index (1)
This package features some noteworthy statistical properties which may need to be clarified by a handling editor prior to progressing.
The package has:

- code in R (100% in 8 files)
- 1 author
- 1 vignette
- no internal data file
- 14 imported packages
- 6 exported functions (median 16 lines of code)
- 34 non-exported functions in R (median 12 lines of code)

---

Statistical properties of package structure as distributional percentiles in relation to all current CRAN packages. The following terminology is used:

- `loc` = "Lines of Code"
- `fn` = "function"
- `exp`/`not_exp` = exported / not exported

All parameters are explained as tooltips in the locally-rendered HTML version of this report generated by [the `checks_to_markdown()` function](https://docs.ropensci.org/pkgcheck/reference/checks_to_markdown.html). The final measure (`fn_call_network_size`) is the total number of calls between functions (in R), or more abstract relationships between code objects in other languages. Values are flagged as "noteworthy" when they lie in the upper or lower 5th percentile.

|measure                  | value| percentile|noteworthy |
|:------------------------|-----:|----------:|:----------|
|files_R                  |     8|       50.7|           |
|files_vignettes          |     3|       92.4|           |
|files_tests              |     5|       81.7|           |
|loc_R                    |   779|       61.1|           |
|loc_vignettes            |   160|       41.2|           |
|loc_tests                |   184|       53.0|           |
|num_vignettes            |     1|       64.8|           |
|n_fns_r                  |    40|       49.3|           |
|n_fns_r_exported         |     6|       29.1|           |
|n_fns_r_not_exported     |    34|       56.6|           |
|n_fns_per_file_r         |     3|       45.9|           |
|num_params_per_fn        |     2|       11.9|           |
|loc_per_fn_r             |    14|       45.4|           |
|loc_per_fn_r_exp         |    16|       38.0|           |
|loc_per_fn_r_not_exp     |    12|       42.0|           |
|rel_whitespace_R         |    29|       73.7|           |
|rel_whitespace_vignettes |    65|       65.5|           |
|rel_whitespace_tests     |    56|       72.7|           |
|doclines_per_fn_exp      |    20|       13.8|           |
|doclines_per_fn_not_exp  |     0|        0.0|TRUE       |
|fn_call_network_size     |    37|       59.9|           |

---
Click to see the interactive network visualisation of calls between objects in package
`goodpractice` and other checks

#### 3a. Continuous Integration Badges

[![R-CMD-check](https://github.com/msperlin/yfR/workflows/R-CMD-check/badge.svg)](https://github.com/msperlin/yfR/actions)

**GitHub Workflow Results**

|name                       |conclusion |sha    |date       |
|:--------------------------|:----------|:------|:----------|
|pages build and deployment |success    |1ee2f6 |2022-03-31 |
|pkgdown                    |success    |51af0f |2022-03-30 |
|R-CMD-check                |success    |1ee2f6 |2022-03-31 |
|render-rmarkdown           |failure    |f3dbe5 |2022-03-30 |
|test-coverage              |success    |1ee2f6 |2022-03-31 |

---

#### 3b. `goodpractice` results

#### `R CMD check` with [rcmdcheck](https://r-lib.github.io/rcmdcheck/)

rcmdcheck found no errors, warnings, or notes

#### Test coverage with [covr](https://covr.r-lib.org/)

Package coverage: 87.78

#### Cyclocomplexity with [cyclocomp](https://github.com/MangoTheCat/cyclocomp)

The following functions have cyclocomplexity >= 15:

function | cyclocomplexity
--- | ---
yf_get | 23
yf_get_single_ticker | 22

#### Static code analyses with [lintr](https://github.com/jimhester/lintr)

[lintr](https://github.com/jimhester/lintr) found the following 2 potential issues:

message | number of times
--- | ---
Avoid library() and require() calls in packages | 2
|package  |version |
|:--------|:-------|
|pkgstats |0.0.4.4 |
|pkgcheck |0.0.3.6 |
This package is in top shape and may be passed on to a handling editor
Dear @msperlin, Thank you for your submission. The package has passed all of the automated package checks and the test coverage is good. Could you expand a bit more on how this package differs from quantmod and tidyquant? Thanks, Julia
Good morning Julia,
The main goal of yfR is to help users download large amounts of data from Yahoo Finance (YF).

Packages quantmod and tidyquant also offer a function for downloading price data from YF, but only that. Besides importing data, yfR offers the following functionalities:

- Organization and clean-up of data
- Smarter downloads
- Practicality
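As an illustration, a minimal session might look like the following. This is only a sketch: `yf_get()` and its `tickers`/`first_date`/`last_date` arguments are the ones discussed later in this thread, but check the package help for the authoritative signature.

```r
library(yfR)

# Download roughly one month of daily prices for two tickers in a single
# call (the tickers argument is vectorized)
df_prices <- yf_get(
  tickers    = c("AAPL", "MSFT"),
  first_date = Sys.Date() - 30,
  last_date  = Sys.Date()
)

# Prices arrive as a single organized table, one row per ticker/day
head(df_prices)
```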
Thank you @msperlin, I am discussing with the other editors and will get back to you. Thanks, Julia
Thanks for your patience @msperlin. The fit seems to be good for us and I am now looking for a handling editor. Thanks, Julia
Great, thanks @jooolia.
@ropensci-review-bot assign @melvidoni as editor
Assigned! @melvidoni is now the editor
@ropensci-review-bot seeking reviewers
Please add this badge to the README of your package repository:
[![Status at rOpenSci Software Peer Review](https://badges.ropensci.org/523_status.svg)](https://github.com/ropensci/software-review/issues/523)
Furthermore, if your package does not have a NEWS.md file yet, please create one to capture the changes made during the review process. See https://devguide.ropensci.org/releasing.html#news
Thanks. The badge is added in dc712f4abac246604721ed7f2926f9794e4e7f99 and the news file already exists.
Hi @melvidoni ! I would like to review this package
Hello @Athene-ai, of course - this package still needs reviewers. I saw you wrote on several packages, so be mindful that asking in multiple places may not be ideal, as you may end up with more workload than intended. The review timeframe for this is 3 weeks, so if that's okay with you, I'll assign you to this package (and you'll have to complete this review first before accepting any others).
@melvidoni I accept the invitation to review this package within three weeks
@ropensci-review-bot assign @Athene-ai as reviewer
@Athene-ai added to the reviewers list. Review due date is 2022-05-26. Thanks @Athene-ai for accepting to review! Please refer to our reviewer guide.
@Athene-ai: If you haven't done so, please fill this form for us to update our reviewers records.
@melvidoni thanks for adding me as reviewer and I filled the volunteer form for being an rOpenSci Reviewer :-)
@melvidoni do we have a slack channel?
@ropensci-review-bot assign @s3alfisc as reviewer
@s3alfisc added to the reviewers list. Review due date is 2022-05-29. Thanks @s3alfisc for accepting to review! Please refer to our reviewer guide.
@s3alfisc: If you haven't done so, please fill this form for us to update our reviewers records.
@melvidoni do we have a slack channel?
Hello @Athene-ai. Please, be mindful that responses are not immediate, especially over the weekend; kindly do not hasten people, and wait for responses/actions. There is much going on "behind the scenes" that you may not be aware of.
That said, you'll get an invitation to the Slack later in the process.
Thanks for the information 😊
@Athene-ai Could you please paste a completed review here? Rather than adding more comments to this issue, you may leave that template there for now, and update it with an actual review when you've got that far. It's best to complete the template offline, edit the issue to delete all current content, and then simply paste the completed review back in place of the above comment. Thanks.
@ropensci-review-bot remove @Athene-ai from reviewers
@Athene-ai removed from the reviewers list!
@msperlin, we apologise for the issues caused by the prior reviewer. They have now been removed from the list of reviewers, and I will proceed to search for another one. Please understand that although we try to give everyone an opportunity, sometimes it is not possible to foresee how they will take it.
I will strive to get a new reviewer, but the person will be given 3 weeks from the acceptance date, hence some delays are bound to happen.
Edit: wrong punctuation, apologies.
Good morning @melvidoni.
No problem at all. I can wait.
Best,
@ropensci-review-bot assign @thisisnic as reviewer
@thisisnic added to the reviewers list. Review due date is 2022-06-13. Thanks @thisisnic for accepting to review! Please refer to our reviewer guide.
@thisisnic: If you haven't done so, please fill this form for us to update our reviewers records.
```yaml
title: "review"
output:
  rmarkdown::md_document:
    pandoc_args: ["--wrap=none"]
```
Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide
The package includes all the following forms of documentation:
- `URL`, `BugReports` and `Maintainer` (which may be autogenerated via `Authors@R`): all available. You can find more comments on documentation below.
- `yfinanceR` - the name of the equivalent Python package is `yfinance`.
- `codemeta`
- `error()` and `warning()`
- `remotes`, `pak`, etc?
- `yfR` does not return any documentation of the package
- `roxygen2` is used
- `@return` statements specify the returned data object, but you could be more specific - usually `tibble`s are returned, not base `data.frame`s
- `@noRd` used for non-exported functions
- `pkgdown` website exists
- From `pkgstats`, it looks like there are multiple package dependencies that you could easily replace by using `base` functions, e.g. `dplyr`, `magrittr`, `tibble`. What is the advantage of using `readr::read_rds()` over `base::readRDS()`? The packages `curl` and `cli` are not detected in use by `pkgstats`. Nevertheless, all imported packages are of high quality, so I have no concerns here.

Estimated hours spent reviewing: 8
I think that yfR
is a very promising package with useful features, and I believe that it will be widely used. I very much enjoyed using it! To improve the package, I mostly suggest to invest more time into refining the documentation.
- Statement of need: I would like to see a more refined statement of need at the beginning of the readme: what is `yfR`'s main innovation? E.g. start with something like "`yfR` is an API to yahoo finance. It speeds up the data downloading process by parallel computing and local caching." Then explain what type of data yahoo finance includes.
- I would move the discussion of data quality / limitations of yahoo finance and comparison to `BatchGetSymbols` to separate articles - I don't think they are required in the readme. If you want to keep the reference to `quantmod`, maybe include a dedicated 'Acknowledgements' section at the end of the readme? Occasionally, you use jargon: e.g., not all users might know what a ticker is. I would move all examples from the readme to the 'get started' vignette. Alternatively, I would keep only one example in the readme.
- In the 'get started' vignette, I would hide the message output generated e.g. by `yf_get()` and explain in words what the function does: e.g. it checks the cache, downloads data if the cache is empty, else finishes etc.
- The vignette states that multiple 'collections' are organized in the package. It would be great to include a full list of collections in the docs, e.g. as a separate article. The `yf_get_available_collections()` function helps here, but what do the individual collections stand for? E.g. does IBOV stand for the Bovespa-Index?
- I would like to see some documentation on how the caching works: e.g., where are files saved? For how long are they saved? Is the cache ever cleaned, e.g. are cached files lost by re-starting the R session?
- In the docs for `yf_convert_to_wide()`, it would be good to print the initial long dataframe.
- The documentation of `yf_get()` does not really, as a stand-alone, explain what the function does: download ticker data from yahoo finance, caching, parallelism etc. I would delete the reference to `getSymbols`. Note that as `yf_get_default_cache_folder()` is not exported, users will run into an error when trying `yfR::yf_get_default_cache_folder()`. Also, mention that the `ticker` function argument is vectorized.
- You could improve the documentation for parallelism: I myself have never used `furrr`, so your hint to `furrr::plan()` is not too helpful. How about a dedicated article with a small example that illustrates how to run `get_plan()` in parallel? Also, I only learned from browsing the code that by default, half of all available cores are used.
- What is the difference between a collection and an index?
- Consider adding documentation of the data returned via `yf_get()`. Not being a financial economist, I for example have no idea what the `price_adjusted` column stands for. Beyond, what is the unit of measurement of the price variables? I suppose it is US Dollars? Further, what is the relationship between daily data and monthly data? Also, potentially add a note that when markets are closed, no data row will be created.
- Examples could be more 'verbose' (i.e. add documentation) and more 'exhaustive' - they are quite minimal at the moment. The example for `yf_convert_to_wide()` currently calls internal data - could you not simply attach the data set or load it?
- Installation and CMD check pass without problems. I tried to run `pkgcheck`, but failed to get it to run. I suggest to run the `pkgcheck` action on github actions, at least for the time of the review.
- Code coverage is currently only at around 80% - I would love to see this up at 95%, if not 100 :)
- I am not sure if I would have default function arguments for `first_date` and `last_date`. If you want to keep them, I would change the default from 15 days to one month.
- `yf_convert_to_wide()` is super helpful - great idea to directly include it in the package!
- Could the API be more permissive, e.g. accept dates with format `dd-mm-yyyy`?
- Currently, the cached files are saved in the `rds` file format via `readr::read_rds()`. There might be faster and/or more memory-friendly alternatives available. Have you considered adding a function argument that would allow users to store files e.g. in the `parquet` file format?
- Have you considered integrating an `autoplot` function to plot stock prices? `autoplot` would e.g. generate plots similar to those created in the readme / vignette.
- Would it be possible to give an estimate of consumed memory of all cached files prior to a download? I would also consider exporting `yf_get_default_cache_folder()` so that users are aware of the function and can easily check where `yfR` creates the cache.
- Do you need to export the `magrittr` pipe when using it internally?

I took a brief glance at the error messages, and most of them are clear and easy to understand. Maybe you could rephrase:

```r
# check for NA
if (any(is.na(tickers))) {
  my_msg <- paste0(
    "Found NA value in ticker vector.",
    "You need to remove it before running BatchGetSymbols."
  )
  stop(my_msg)
}

if (class(first_date) != "Date") {
  stop("ERROR: cant change class of first_date to 'Date'")
}
```
In general, I really like the dreamerr package for function input type checks. checkmate seems to be very popular, too.
With `dreamerr`, you could e.g. write

```r
# check threshold
if ((thresh_bad_data < 0) | (thresh_bad_data > 1)) {
  stop("Input thresh_bad_data should be a proportion between 0 and 1")
}
```

as

```r
dreamerr::check_arg(thresh_bad_data, "scalar numeric GT{0} LT{1}")
```
I can't really follow this error message:

```r
if (!flag) {
  warning(stringr::str_glue(
    "\nIt seems you are using a non-default cache folder at {cache_folder}. ",
    "Be aware that if any stock event -- split or dividend -- happens ",
    "in between cache files, the resulting aggregate cache data will not ",
    "correspond to reality as some part of the price data will not be ",
    "adjusted to the event. For safety and reproducibility, my suggestion ",
    "is to use cache system only for the current session with tempdir(), ",
    "which is the default option."
  ))
}
```
The collections are created via hard-coded (wikipedia) URLs. This is likely prone to errors - what if e.g. the URLs change? I understand the attractiveness of this 'dynamic' lookup, as e.g. the composition of stock indices might change over time. Maybe you could add a second look-up link (in case the main URL breaks), or you could add a 'fallback' data.frame containing the names of all firms included in an index at a fixed date to fall back to? See also this link on potential error handling of URLs via `tryCatch`.
My last comment (repeating something I mentioned above): the equivalent Python package is called `yfinance`. Maybe a better / SEO-optimized name for the package would be `yfinanceR`?
Thanks @s3alfisc for the review! Appreciate it. Good ideas there.
I'll reply to all your comments in the next couple of days.
@ropensci-review-bot submit review https://github.com/ropensci/software-review/issues/523#issuecomment-1140410709 time 8
Logged review for s3alfisc (hours: 8)
Dear @s3alfisc, please find my replies below:
> I think that yfR is a very promising package with useful features, and I believe that it will be widely used. I very much enjoyed using it! To improve the package, I mostly suggest to invest more time into refining the documentation.
Thanks, appreciate the feedback and the detailed review. Given your feedback and ideas, I've made many changes in the code and documentation.
> Statement of need: I would like to see a more refined statement of need at the beginning of the readme: what is yfR’s main innovation? E.g. start with something like “yfR is an API to yahoo finance. It speeds up the data downloading process by parallel computing and local caching.” Then explain what type of data yahoo finance includes.
Also thanks. I changed the readme.rmd file so that the reader can quickly grasp how to use the package.
> I would move the discussion of data quality / limitations of yahoo finance and comparison to BatchGetSymbols to separate articles - I don’t think they are required in the readme. If you want to keep the reference to quantmod, maybe include a dedicated ‘Acknowledgements’ section at the end of the readme? Occasionally, you use jargon: e.g., not all users might know what a ticker is. I would move all examples from the readme to the ‘get started’ vignette. Alternatively, I would keep only one example in the readme.
I reorganized the topics in the readme.rmd and moved some as vignettes.
> In the ‘get started’ vignette, I would hide the message output generated e.g. by yf_get() and explain in words what the function does: e.g. it checks the cache, downloads data if the cache is empty, else finishes etc.
I'd rather keep the `yfR` messages in the vignettes, as they mimic the actual call to the function. I also improved the text in the main vignette ("get started").
> The vignette states that multiple ‘collections’ are organized in the package. It would be great to include a full list of collections to the docs, e.g. as a separate article? The yf_get_available_collections() helps here, but what do the individual collections stand for? E.g. does IBOV stand for the Bovespa-Index?
Great idea. I added the argument `print_description` to `yf_get_available_collections()` for printing a text description of the available collections.
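A usage sketch of that new argument (output omitted here - see the package help for the authoritative behavior):

```r
library(yfR)

# List available collections, printing a text description of each one
yf_get_available_collections(print_description = TRUE)
```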
> I would like to see some documentation on how the caching works: e.g., where are files saved? For how long are they saved? Is the cache ever cleaned, e.g. are cached files lost by re-starting the R session?
I added a section to the help file of `yf_get()` explaining how the cache system works.
> In the docs for yf_convert_to_wide, it would be good to print the initial long dataframe.
Done.
> The documentation of yf_get() does not really, as a stand-alone, explain what the function does: download ticker data from yahoo finance, caching, parallelism etc. I would delete the reference to getSymbols. Note that as yf_get_default_cache_folder() is not exported, users will run into an error when trying yfR::yf_get_default_cache_folder().
Documentation was improved.
> Also, mention that the ticker function argument is vectorized
Done.
> You could improve the documentation for parallelism: I myself have never used furrr, so your hint to furrr::plan() is not too helpful. How about a dedicated article with a small example that illustrates how to run get_plan() in parallel? Also, I only learned from browsing the code that by default, half of all available cores are used.
I think that going into parallelism and furrr::plan() would be off topic. However, I added a link to furrr https://furrr.futureverse.org/ in argument do_parallel, so that the user can learn more about it, if desired.
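For readers who land here first, a parallel call might be sketched as follows. This is an assumption-laden sketch: it uses the `do_parallel` argument mentioned above and sets the backend via `future::plan()`, as the furrr documentation describes; consult `?yf_get` for the actual interface.

```r
library(yfR)

# Choose a parallel backend before calling yf_get() (user's choice of
# backend and worker count; see https://furrr.futureverse.org/)
future::plan(future::multisession, workers = 2)

# With do_parallel = TRUE, individual tickers are downloaded in parallel
df <- yf_get(
  tickers     = c("AAPL", "MSFT", "GOOG", "AMZN"),
  first_date  = Sys.Date() - 365,
  do_parallel = TRUE
)
```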
> What is the difference between a collection and an index?
A collection is just a bunch of tickers put together. An index can be a collection, but not all collections are indices.
> Consider adding documentation of the data returned via yf_get(). Not being a financial economist, I for example have no idea what the price_adjusted column stands for. Beyond, what is the unit of measurement of the price variables? I suppose it is US Dollars? Further, what is the relationship between daily data and monthly data? Also, potentially add a note that when markets are closed, no data row will be created.
Done. New documentation is available at readme.rmd and also in help for yf_get().
> Examples could be more 'verbose' (i.e. add documentation) and more 'exhaustive' - they are quite minimal at the moment. The example for yf_convert_to_wide currently calls internal data - could you not simply attach the data set or load it?
I revised all examples, especially for the main function. I've made a few changes, but they look alright to me. Users can always check the vignettes for more details.
> Installation and CMD check pass without problems. I tried to run pkgcheck, but failed to get it to run. I suggest to run the pkgcheck action on github actions, at least for the time of the review.
I also failed to use pkgcheck on Linux (Ubuntu/Mint). I can't install its dependencies, despite spending some time trying hard.
> Code Coverage is currently only at around 80% - I would love to see this up at 95%, if not 100 :)
I tried my best to cover as much as possible, reaching 82.99%. One big miss is the parallel-computing part which, in the current version, is not active (I removed it due to YF's API call limits). A fix is in progress, but it depends on quantmod being on CRAN. I'll add the parallel tests once it is fixed.
The rest is just input error checking which, to me, feels fine to leave uncovered (covering it would just be a gimmick). So I'll not reach 100%, but I will be close.
> All examples work very nicely. Overall, it was a lot of fun using the package! In general, the console output is very helpful and very pretty!
Great, thanks!
> I am not sure if I would have default function arguments for first_date() and last_date(). If you want to keep it, I would change it from 15 days to one month.
Done.
> yf_convert_to_wide() is super helpful - great idea to directly include it in the package!
Thanks. I know some people use the data that way, even though I don't like it.
> Could the API be more permissive, e.g. accept dates with format dd-mm-yyyy?
I feel that ISO format is fine. This is the standard in R and users should probably adapt to it.
> When trying the “SP500” collection example, I ran into several ‘error in download’ errors. Still, the function finished eventually with ‘binding price data’. What exactly is going on here? Did the function eventually manage to fetch all tickers? If no, could there be a final message, e.g. ‘300/500 tickers successfully fetched. To fetch all others, do this …’.
Good idea. I implemented the message. The user will now be aware of the percentage of requested tickers present in the output data. Whenever that is lower than 50%, a message tells the user to wait for 15 minutes before running the query again.
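The described check could be sketched roughly like this (illustrative names only - `tickers` and `df_out` here are placeholders, not the package's actual internals):

```r
# tickers: the requested vector; df_out: the data actually downloaded
n_requested <- length(unique(tickers))
n_fetched   <- length(unique(df_out$ticker))
pct_fetched <- n_fetched / n_requested

message(sprintf("Fetched %d/%d tickers (%.0f%%)",
                n_fetched, n_requested, 100 * pct_fetched))

if (pct_fetched < 0.5) {
  message("Less than 50% of tickers returned data. ",
          "Yahoo Finance may be rate-limiting; wait 15 minutes and try again.")
}
```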
Good
Bad
> I have seen that there is already a PR opened to alert users when they have reached the yahoo finance limit. This would indeed be a great feature!
We are working on this issue, already with a viable solution that should become official soon. Nonetheless, the package works fine in a single session in all my tests.
> It would be great to add further collections, e.g. NASDAQ, DAX, SP30, FAANG etc
Yes! Definitely. The idea is having something for everyone..
> The equivalent Python package, yfinance, offers a range of additional functionality, e.g. data on dividends, stock splits, and institutional investors. Do you plan to incorporate any of these into the package in the future?
No. My proposal is to focus on stock data importing and organization only.
> Currently, the cached files are saved in the rds file format via readr::read_rds(). There might be faster and/or more memory-friendly alternatives available. Have you considered adding a function argument that would allow users to store files e.g. in the parquet file format?
I believe that .rds files work fine for yfR (I have never seen a performance issue), but I'll keep that in mind. Also, this is very easy to change in the future.
> Have you considered to integrate an autoplot function to plot stock prices? autoplot would e.g. generate plots similarly to those created in the readme / vignette.
No, but I'll also keep it in mind.
> Would it be possible to give an estimate of consumed memory of all cached files prior to a download? I would also consider to export yf_get_default_cache_folder() so that users are aware of the function and can easily check where yfR creates the cache.
Probably, but I feel that file size is not really an issue. The cache files are really small.
Nonetheless, I added a "Diagnostics" text at the end of the execution of yf_get. It includes the current size of cache files (see previous figure with output "Diagnostics").
Also, function yf_get_default_cache_folder() is now exported and available to users.
> Do you need to export the magrittr pipe when using it internally?
This was implemented so yfR is compatible with R >= 4.0.0 (personally, I prefer the new pipe).
I was not aware that exporting it is unnecessary (I simply used usethis::use_pipe() when creating the package). I also feel that no harm is done in allowing the user access to the pipe when loading yfR (I'm not aware of any conflicts).
> I took a brief glance at the error messages, and most of them are clear and easy to understand. Maybe you could rephrase
Thanks, I fixed that.
> In general, I really like the dreamerr package for function input type checks. checkmate seems to be very popular, too.
Thanks for the suggestion. I was not aware of this package. I'll have a look but, for the time being, I'll stay with the current code.
> I can’t really follow this error message: "\nIt seems you are using a non-default cache folder at {cache_folder}. "
I tried my best, but the explanation is more technical than what I can put in a message. What the user should know is that, for stocks, there is no guarantee that cache files can be merged without problems. This happens because external events, such as dividends, can alter the adjusted prices retroactively, so you can get a different adjusted price for the same ticker/day if the query is made on different days.
I changed the text so that the explanation is clearer.
> The collections are created via hard coded (wikipedia) URLs. This is likely prone to errors - what if e.g. the URLs change? I understand the attractiveness of this ‘dynamic’ lookup, as e.g. the composition of stock indices might change over time. Maybe you could add a second look-up link (in case the main URL breaks), or you could add a ‘fallback’ data.frame containing the names of all firms included in an index at a fixed date to fall back to? See also this link on potential error handling of URLs via tryCatch.
The fallback dataframe is a great idea and I implemented it. I don't like the first idea of a "backup" URL, as it requires more web-scraping code, which can be very unstable and hard to maintain.
I also implemented the argument force_fallback in yf_get_index_comp, which allows the user to read the offline files directly.
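The URL-with-fallback pattern can be sketched roughly as follows. Function names and the data layout are illustrative only; yfR's actual implementation scrapes and parses the index tables rather than reading raw lines:

```r
# Sketch of the URL-with-fallback pattern (names are illustrative,
# not yfR's actual internals).
read_index_comp <- function(url, fallback_df, force_fallback = FALSE) {
  # the user can skip the web lookup entirely
  if (force_fallback) return(fallback_df)

  tryCatch(
    {
      # in the real package this step scrapes and parses the page;
      # here we simulate a read that may fail
      lines <- suppressWarnings(readLines(url, warn = FALSE))
      data.frame(raw = lines, stringsAsFactors = FALSE)
    },
    error = function(e) {
      # on any connection/parsing error, fall back to the offline data
      message("URL failed (", conditionMessage(e),
              "); using offline fallback")
      fallback_df
    }
  )
}
```

With a broken URL, the function returns the offline data.frame instead of erroring, which keeps downstream code running when Wikipedia changes or is unreachable.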
My last comment (repeating something I mentioned above): the equivalent python package is called yfinance. Maybe a better / SEO optimized name for the package would be yfinanceR?
I really like the name yfR. It's short and easy to remember. But thanks for the suggestion.
All changes are in the main branch.
I am currently working on my review of this package, and hope to finish it in the next few days if nothing unexpected comes up! I had an issue when I was running the examples in the vignette though, and so to deliver partial feedback which might be useful in the meantime, I've opened this issue relating to it on the project repo: https://github.com/msperlin/yfR/issues/11
Hello Nicola, that's great, thank you!
Date accepted: 2022-06-21
Submitting Author Name: Marcelo Perlin
Submitting Author Github Handle: @msperlin
Other Package Authors Github handles: (comma separated, delete if none)
Repository: https://github.com/msperlin/yfR
Version submitted: 0.0.1
Submission type: Standard
Editor: @melvidoni
Reviewers: @s3alfisc, @thisisnic
Due date for @s3alfisc: 2022-05-29
Due date for @thisisnic: 2022-06-13
Archive: TBD
Version accepted: TBD
Language: en
Scope
Please indicate which category or categories from our package fit policies this package falls under: (Please check an appropriate box below. If you are unsure, we suggest you make a pre-submission inquiry.):
Explain how and why the package falls under these categories (briefly, 1-2 sentences):
Package yfR retrieves and organizes data from Yahoo Finance, a large repository for stock price data.
The target audience is students, researchers, and industry practitioners in the fields of Finance and Economics.
Package yfR is the second, backwards-incompatible version of BatchGetSymbols, also developed by me. My plan is to first deprecate BatchGetSymbols and later remove it from CRAN and archive it on GitHub.
Moreover, there are other packages, such as quantmod, that download data from Yahoo Finance, but none with features similar to yfR and BatchGetSymbols.
Yes.
If you made a pre-submission inquiry, please paste the link to the corresponding issue, forum post, or other discussion, or @tag the editor you contacted.
Explain reasons for any
pkgcheck
items which your package is unable to pass. Unfortunately, I was not able to run pkgcheck locally, as I could not install (or build) the ctags dependency on my Linux Mint 20.3 machine. Nonetheless, I read through and followed all guidelines available in the manual.
Technical checks
Confirm each of the following by checking the box.
This package:
Publication options
[X] Do you intend for this package to go on CRAN?
[ ] Do you intend for this package to go on Bioconductor?
[ ] Do you wish to submit an Applications Article about your package to Methods in Ecology and Evolution? If so:
MEE Options
- [ ] The package is novel and will be of interest to the broad readership of the journal.
- [ ] The manuscript describing the package is no longer than 3000 words.
- [ ] You intend to archive the code for the package in a long-term repository which meets the requirements of the journal (see [MEE's Policy on Publishing Code](http://besjournals.onlinelibrary.wiley.com/hub/journal/10.1111/(ISSN)2041-210X/journal-resources/policy-on-publishing-code.html))
- (*Scope: Do consider MEE's [Aims and Scope](http://besjournals.onlinelibrary.wiley.com/hub/journal/10.1111/(ISSN)2041-210X/aims-and-scope/read-full-aims-and-scope.html) for your manuscript. We make no guarantee that your manuscript will be within MEE scope.*)
- (*Although not required, we strongly recommend having a full manuscript prepared when you submit here.*)
- (*Please do not submit your package separately to Methods in Ecology and Evolution*)

Code of conduct