ropensci / software-review

rOpenSci Software Peer Review.
291 stars 104 forks source link

yfR: Downloads and Organizes Financial Data from Yahoo Finance #523

Closed msperlin closed 2 years ago

msperlin commented 2 years ago

Date accepted: 2022-06-21

Submitting Author Name: Marcelo Perlin Submitting Author Github Handle: !--author1-->@msperlin<!--end-author1-- Other Package Authors Github handles: (comma separated, delete if none) Repository: https://github.com/msperlin/yfR Version submitted: 0.0.1 Submission type: Standard Editor: !--editor-->@melvidoni<!--end-editor-- Reviewers: @s3alfisc, @thisisnic

Due date for @s3alfisc: 2022-05-29 Due date for @thisisnic: 2022-06-13

Archive: TBD Version accepted: TBD Language: en

Package: yfR
Title: Downloads and Organizes Financial Data from Yahoo Finance
Version: 0.0.1
Authors@R: person("Marcelo", "Perlin", email = "marceloperlin@gmail.com", role = c("aut", "cre"))
Description: Facilitates download of financial data from Yahoo Finance <https://finance.yahoo.com/>, 
 a vast repository of stock price data across multiple financial exchanges. The package offers a local caching system
 and support for parallel computation.
URL: https://github.com/msperlin/yfR
BugReports: https://github.com/msperlin/yfR/issues
Depends:
    R (>= 4.1)
Imports: stringr, curl, tidyr, 
    lubridate, furrr, purrr, future, tibble, zoo,
    cli, readr, rvest, dplyr, quantmod
License: MIT + file LICENSE
LazyData: true
RoxygenNote: 7.1.2
Suggests: 
    knitr,
    rmarkdown,
    testthat (>= 3.0.0),
    ggplot2,
    covr
VignetteBuilder: knitr
Config/testthat/edition: 3

Scope

Package yfR retrieves and organizes data from Yahoo Finance, a large repository for stock price data.

Target audience are students, researchers and industry practioneers in the field of Finance and Economics.

Package yfR is the second and backwards-incompatible version of BatchGetSymbols, also developed by me. My plan is to first deprecate BatchGetSymbols and later remove it from CRAN and archive it in Github.

Moreover, there are other packages, such as quantmod, that downloads data from Yahoo Finance, but none with similar features to yfR and BatchGetSymbols.

Yes.

Unfortinately, I was not able to run pkgcheck locally as I was unable to install (or make) dependency ctags in my Linux Mint 20.3 machine. Nonetheless, I read through and followed all guidelines available in the manual.

Technical checks

Confirm each of the following by checking the box.

This package:

Publication options

MEE Options - [ ] The package is novel and will be of interest to the broad readership of the journal. - [ ] The manuscript describing the package is no longer than 3000 words. - [ ] You intend to archive the code for the package in a long-term repository which meets the requirements of the journal (see [MEE's Policy on Publishing Code](http://besjournals.onlinelibrary.wiley.com/hub/journal/10.1111/(ISSN)2041-210X/journal-resources/policy-on-publishing-code.html)) - (*Scope: Do consider MEE's [Aims and Scope](http://besjournals.onlinelibrary.wiley.com/hub/journal/10.1111/(ISSN)2041-210X/aims-and-scope/read-full-aims-and-scope.html) for your manuscript. We make no guarantee that your manuscript will be within MEE scope.*) - (*Although not required, we strongly recommend having a full manuscript prepared when you submit here.*) - (*Please do not submit your package separately to Methods in Ecology and Evolution*)

Code of conduct

msperlin commented 2 years ago

Thanks @thisisnic,

No rush for the review. I'm also on holidays and might take a while to reply here.

As for the issue, I think you might have a problem with quantmod. Please update with latest version in CRAN and try again.

thisisnic commented 2 years ago

Thanks @msperlin - no problem! Great, that fixes it for me! Would it be worth perhaps adding the version to the DESCRIPTION file as described here[1] in order to prevent other people with older package versions running into the same error, and wondering why it's not working?

[1] https://r-pkgs.org/description.html#minimum-versions

thisisnic commented 2 years ago

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

Documentation

The package includes all the following forms of documentation:

Functionality

Estimated hours spent reviewing: 4


Review Comments

Overall, I think this is an excellent package and I especially appreciate that it both retrieves the data and puts it into tidy format - as someone who's taught workshops in the past, this kind of thing is really helpful!

As @s3alfisc did such a good job of covering areas around design and possible extensions, I thought I'd instead focus on code review, and areas that I might tweak to ensure the code is as maintainable as possible. A lot of the comments below are stylistic, or just questions, and so feel free to take or leave as many of them as you wish.

This is my first review, so I hope this is as expected, and if there are any areas which I've missed or focussed on too much, I'm happy to receive feedback on that.

Comments on yf_get.R

This is well documented. There are places where I might choose to shorten parameter names to make them more easily memorable, but this is a personal preference rather than a recommendation. The code is skimmable, the comments help guide me through what's happening, variable names are intuitive, and error messages are informative.

A few typos: "cant" instead of "can't".

Some error messages have the text "ERROR" in them - I'd recommend removing these, as this is duplicating the in-built error text, e.g. it outputs as Error: ERROR: cant change class of last_date to 'Date'

I see some code which edits a dplyr option: options(dplyr.summarise.inform = FALSE) and later resets it: options(dplyr.summarise.inform = TRUE). Could it perhaps be useful to the user if you were to check if it is already set or set to a different value, and use a call to on.exit() to reset it to its previous value later on, to prevent changing their chosen setting without them knowing? [Edited to add: you can also do this via withr::local_options()]

There is some commented out code in yf_get.R (lines 270 - 273) - can this be removed?

This function does a lot of things, and I might consider abstracting out some of the logic into smaller helper functions in future to make it easier to test in a more modular way. However, I don't think this is a problem here, as the code is well commented and flows in a logical order.

Comments on collections.R

Again, well commented and well documented, which is great.

It looks like yf_get_available_collections() just calls yf_get_available_indices() and returns the results - could you instead define it more simply, e.g. yf_get_available_collections() <- yf_get_available_indices()? And, what's the reason for having both? (This function changed slightly between when I first reviewed this and later came back to it, but I think the question is still relevant)

yf_collection_get() - it says it returns a "dataframe" - perhaps this could be either "data frame" (the concept) or "data.frame" (the object type) for more clarity?

Quick question - any reason this file doesn't have the yf prefix, but the other ones do? Not important, just curious.

Another tiny comment - I wonder if yf_get_collection() might feel like a better fit with the other function names than yf_collection_get()?

Comments on yf_convert_to_wide.R

Chunk of whitespace within the definition of fct_format_wide() which could be removed.

Comments on yf_get_clean_data.R

Some commented out code here - can this be removed?

Comments on yf_get_single_ticker.R

Love the use of the cli package to add messages to really help the user see what's going on.

Comments on "getting started" vignette

In the plot comparing the daily/weekly/monthly/yearly data, I might consider removing geom_point() and just using geom_line() as during a naive look at the plot, I wondered what the thickness of the lines meant before realising that the points had blurred together. I might also use breaks on the x axis with a higher frequency than five years.

melvidoni commented 2 years ago

@ropensci-review-bot submit review https://github.com/ropensci/software-review/issues/523#issuecomment-1147960056 time 4

ropensci-review-bot commented 2 years ago

Logged review for thisisnic (hours: 4)

msperlin commented 2 years ago

Dear @thisisnic, please find my replies below.

Again, thanks for the time reviewing my package. I've made many changes based on your comments. All changes are in the main branch.

Review Comments

Overall, I think this is an excellent package and I especially appreciate that it both retrieves the data and puts it into tidy format - as someone who's taught workshops in the past, this kind of thing is really helpful!

Thanks! I'm glad you enjoyed using it.

As @s3alfisc did such a good job of covering areas around design and possible extensions, I thought I'd instead focus on code review, and areas that I might tweak to ensure the code is as maintainable as possible. A lot of the comments below are stylistic, or just questions, and so feel free to take or leave as many of them as you wish.

The only point which I think definitely does need addressing is mentioned in the comment above which references an issue opened on the repo - adding a minimum version of quantmod to the DESCRIPTION file, as otherwise the examples shown in the vignette fail to run correctly.

Yes, this is fixed in the latest commit 2749f6fbdad562fe6f963e409470916d58f6fdb2.

This is my first review, so I hope this is as expected, and if there are any areas which I've missed or focussed on too much, I'm happy to receive feedback on that.

Comments on yf_get.R

This is well documented. There are places where I might choose to shorten parameter names to make them more easily memorable, but this is a personal preference rather than a recommendation. The code is skimmable, the comments help guide me through what's happening, variable names are intuitive, and error messages are informative.

Thanks.

A few typos: "cant" instead of "can't".

Fixed.

Some error messages have the text "ERROR" in them - I'd recommend removing these, as this is duplicating the in-built error text, e.g. it outputs as Error: ERROR: cant change class of last_date to 'Date'

Fixed.

I see some code which edits a dplyr option: options(dplyr.summarise.inform = FALSE) and later resets it: options(dplyr.summarise.inform = TRUE). Could it perhaps be useful to the user if you were to check if it is already set or set to a different value, and use a call to on.exit() to reset it to its previous value later on, to prevent changing their chosen setting without them knowing? [Edited to add: you can also do this via withr::local_options()]

You're right. I changed the code so that the user choice is always respected.

There is some commented out code in yf_get.R (lines 270 - 273) - can this be removed?

Done.

This function does a lot of things, and I might consider abstracting out some of the logic into smaller helper functions in future to make it easier to test in a more modular way. However, I don't think this is a problem here, as the code is well commented and flows in a logical order.

I agree. It is becoming a long code. I'll keep that in mind.

Comments on collections.R

Again, well commented and well documented, which is great.

Thanks!

It looks like yf_get_available_collections() just calls yf_get_available_indices() and returns the results - could you instead define it more simply, e.g. yf_get_available_collections() <- yf_get_available_indices()? And, what's the reason for having both? (This function changed slightly between when I first reviewed this and later came back to it, but I think the question is still relevant)

The idea is for later to have collections that are not indices. This is way I broke it into separate functions.

yf_collection_get() - it says it returns a "dataframe" - perhaps this could be either "data frame" (the concept) or "data.frame" (the object type) for more clarity?

Done.

Quick question - any reason this file doesn't have the yf prefix, but the other ones do? Not important, just curious.

My mistake. Its fixed.

Another tiny comment - I wonder if yf_get_collection() might feel like a better fit with the other function names than yf_collection_get()?

I followed the convention of using "object""verb" when naming functions. In fact, I just realized I used the wrong convention in some functions. For consistency, I changed all names to the "object""verb" pattern except yf_get(), which I feel is short, concise and intuitive.

Comments on yf_convert_to_wide.R

Chunk of whitespace within the definition of fct_format_wide() which could be removed.

Done.

Comments on yf_get_clean_data.R

Some commented out code here - can this be removed?

Done.

Comments on yf_get_single_ticker.R

Love the use of the cli package to add messages to really help the user see what's going on.

Thanks. I really like it too.

Comments on "getting started" vignette

In the plot comparing the daily/weekly/monthly/yearly data, I might consider removing geom_point() and just using geom_line() as during a naive look at the plot, I wondered what the thickness of the lines meant before realising that the points had blurred together. I might also use breaks on the x axis with a higher frequency than five years.

Agreed. To fix it, I increased the time time period and removed the geom_point() layer.

melvidoni commented 2 years ago

@thisisnic and @s3alfisc what are your thoughts on the changes? Do let me know if you are satisfied with these changes, or if there is anything else outstanding.

s3alfisc commented 2 years ago

Hi @melvidoni, I will try to take a closer look at @msperlin's comments this weekend / early next week =)

thisisnic commented 2 years ago

Thanks for the reminder there @melvidoni - I am happy with the changes made.

melvidoni commented 2 years ago

Thanks @thisisnic! We will wait until after the weekend for @s3alfisc's comments then!

s3alfisc commented 2 years ago

Hi all, I am very happy with the changes. The package looks and works great! In particular, I am super happy about how the documentation has evolved! :)

melvidoni commented 2 years ago

@ropensci-review-bot approve yfR

ropensci-review-bot commented 2 years ago

Approved! Thanks @msperlin for submitting and @s3alfisc, @thisisnic for your reviews! :grin:

To-dos:

Should you want to acknowledge your reviewers in your package DESCRIPTION, you can do so by making them "rev"-type contributors in the Authors@R field (with their consent).

Welcome aboard! We'd love to host a post about your package - either a short introduction to it with an example for a technical audience or a longer post with some narrative about its development or something you learned, and an example of its use for a broader readership. If you are interested, consult the blog guide, and tag @ropensci/blog-editors in your reply. She will get in touch about timing and can answer any questions.

We maintain an online book with our best practice and tips, this chapter starts the 3d section that's about guidance for after onboarding (with advice on releases, package marketing, GitHub grooming); the guide also feature CRAN gotchas. Please tell us what could be improved.

Last but not least, you can volunteer as a reviewer via filling a short form.

msperlin commented 2 years ago

Thanks @s3alfisc, @thisisnic! Realy appreciate the feedback.

I'll make all changes and report here soon.

msperlin commented 2 years ago

@ropensci-review-bot finalize transfer of yfR

ropensci-review-bot commented 2 years ago

Transfer completed. The yfR team is now owner of the repository and the author has been invited to the team

msperlin commented 2 years ago

@s3alfisc, @thisisnic Can I include you both as reviewers in the DESCRIPTION file?

thisisnic commented 2 years ago

Of course, @msperlin , thanks!

msperlin commented 2 years ago

@melvidoni I've made all required changes. Please let me know if anything else is needed. I'm also interested in a blog post @ropensci/blog-editors

s3alfisc commented 2 years ago

Same with me @msperlin, and thanks! :)

melvidoni commented 2 years ago

@melvidoni I've made all required changes. Please let me know if anything else is needed. I'm also interested in a blog post @ropensci/blog-editors

Excellent, thanks. Please, tick all the boxes in the corresponding post if you can.

msperlin commented 2 years ago

@melvidoni I've made all required changes. Please let me know if anything else is needed. I'm also interested in a blog post @ropensci/blog-editors

Excellent, thanks. Please, tick all the boxes in the corresponding post if you can.

@melvidoni I can't tick the boxes in the original post. Should I write a new reply?

melvidoni commented 2 years ago

No worries, then it's fine @msperlin

msperlin commented 2 years ago

I'm planning to submit to CRAN tomorrow. Is that Ok? (sorry but I did not see any guideline about when to submit to cran)

melvidoni commented 2 years ago

It should be okay.

steffilazerte commented 2 years ago

Hi @msperlin,

We'd love to have a blog post about yfR, thank you for offering!

We publish two types of articles, blog posts and technotes. Let us know which type of article you'd like to write and when you think you might have it ready by, and we'll pencil in a publication date that will give us time for a quick review (the date can always change if things go faster or slower than expected).

We have a guide for blog authors (https://blogguide.ropensci.org/) and when you're ready and have a pull request, just set me as the reviewer or ping me and I'll come by to give you some feedback.

Let me know if you have any questions, thanks!

msperlin commented 2 years ago

Thanks @steffilazerte. I'll have a look on the material you posted. But, I will need some time, maybe a couple of weeks to write the post. I'll let you know.

steffilazerte commented 2 years ago

Perfectly reasonable :grin:

msperlin commented 2 years ago

@steffilazerte I wrote a blog post. Just send you the PR and set your as reviewer.