ropensci / software-review

rOpenSci Software Peer Review.
292 stars 104 forks source link

submission: ess #201

Closed cimentadaj closed 6 years ago

cimentadaj commented 6 years ago

Summary

It allows users to download and explore data from the European Social Survey directly into R. Broadly speaking, it has two family of functions. The ess_* family, which downloads data from specific waves and countries and the show_* functions which allows the users to check which waves/countries are available.

Package: ess
Title: Download Data from the European Social Survey on the Fly
Version: 0.1.1
Authors@R: person("Jorge", "Cimentada", email = "cimentadaj@gmail.com", role = c("aut", "cre"))
BugReports: https://github.com/cimentadaj/ess/issues
Description: Download data from the European Social Survey directly from their website <http://www.europeansocialsurvey.org/>. There are two family of functions that allow you to download and interactively check all countries and rounds available.
Depends: R (>= 3.4.0)
License: MIT + file LICENSE
URL: https://github.com/cimentadaj/ess
Encoding: UTF-8
LazyData: true
RoxygenNote: 6.0.1
Imports: 
    xml2,
    httr,
    stringr,
    haven,
    rvest,
    tibble
Suggests: 
    testthat,
    knitr,
    rmarkdown,
    covr
VignetteBuilder: knitr
Note: The ess package is named to match the European Social Survey (ESS) hosted at http://www.europeansocialsurvey.org/. It is unrelated to the Emacs Speaks Statistics (ESS) project, an emacs-based Integrated Development Environment hosted at https://ess.r-project.org

[e.g., "data extraction, because the package parses a scientific data file format"]

data retrieval, because the main objective of the package is to download a set of datasets widely used in the social sciences.

Mainly academics working in disciplines such as Economics, Demography, Psychology, Political Science and Sociologists. Hundreds of papers have been written using the European Social Survey, a survey carried over in over 30 countries for over 15 years that asks citizens everything from their health status to their political preferences. This package will allow full reproducibility for many of the academics using the survey and equally important, will remove the burden of manually exploring/downloading the datasets from the ESS website.

The lodown package (not on CRAN) has one function for downloading the data as 'Stata' or 'SPSS' files for only specific waves. The package is not intended specifically for the ESS data. This package differs because it allows the user to download the files directly into R selecting different wave/country combinations. It also supports downloading the files in 'Stata', 'SPSS' and in 'SAS' and offers a range of functions for exploring the available countries/waves directly from R.

Requirements

Confirm each of the following by checking the box. This package:

Publication options

Detail

It conforms.

I don't know anyone who has experience with this type of packages but I assume anyone with relevant experience in web scraping is adequate. The workhorse of the package is scraping the ESS website, reading it with the haven package and putting the data in a suitable format for R.

maelle commented 6 years ago

@cimentadaj thanks for your submission! I'm doing the editor checks right now and... the tests take a very long time, right?

maelle commented 6 years ago

Editor checks:


Editor comments

Thanks for your submission @cimentadaj! :smile_cat: The tests are now finished. ๐Ÿ˜Œ I'm currently looking for a second reviewer (thanks @leeper for accepting to review!).

Please add this badge to ess README

[![](https://badges.ropensci.org/201_status.svg)](https://github.com/ropensci/onboarding/issues/201)

Here are a few comments/questions.


Reviewers: @leeper @nujcharee Due date: 2018-03-29

leeper commented 6 years ago

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

Documentation

The package includes all the following forms of documentation:

For packages co-submitting to JOSS

The package contains a paper.md matching JOSS's requirements with:

  • [ ] A short summary describing the high-level functionality of the software
  • [ ] Authors: A list of authors with their affiliations
  • [ ] A statement of need clearly stating problems the software is designed to solve and its target audience.
  • [ ] References: with DOIs for all those that have one (e.g. papers, datasets, software).

Functionality

Final approval (post-review)

Estimated hours spent reviewing: 0.5

Review Comments

This package is a very easy-to-use tool for download ESS data. It will be of great benefit to ESS users in both a teaching and research context. In the latter case, it will no doubt help considerably with reproducibility of analyses using the ESS. The code is generally well-written and the documentation is mostly excellent. Both the README and vignette provide useful introductions to the package and explain all functionality.

I have flagged seven issues I would like to see responses to:

These are all suggestions for improvement rather than obvious bugs or problems with the package. The only other suggestion I would have is that this package may significantly increase interest in the ESS among R users and therefore it may be valuable to expand the examples in the README or add an additional vignette that showcases some of the really interesting substantive data available in the datasets. It would be great to see a few tables or graphs showing some interesting patterns. Not necessary, of course, but that might help sell R users on the ESS in the way that the package is already helping sell ESS users on R.

Great package.

maelle commented 6 years ago

Wow thanks a lot for your extremely rapid and good review @leeper ๐Ÿ˜ฎ ๐Ÿ‘

Could you please fill and add the review template to your comment? Thanks in advance.

maelle commented 6 years ago

Just a suggestion as a reaction to @leeper's excellent suggestion to add examples, I tend to think minimal READMEs make more sense with a pkgdown website (especially if the README has links to articles/vignettes), so adding a vignette or several ones might be better than expanding the README.

maelle commented 6 years ago

@cimentadaj also note that I'll expect you to answer to/comment on @leeper's issues in this thread as well to make all information about your changes available here as well. ๐Ÿ˜บ

cimentadaj commented 6 years ago

Hi everyone, thanks very much for the comments, they're spot on. I am away from computers until Monday, where I'll review all of the suggestions. Thanks again!

On Fri, 9 Mar 2018 at 08:55, Maรซlle Salmon notifications@github.com wrote:

@cimentadaj https://github.com/cimentadaj also note that I'll expect you to answer to/comment on @leeper https://github.com/leeper's issues in this thread as well to make all information about your changes available here as well. ๐Ÿ˜บ

โ€” You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/ropensci/onboarding/issues/201#issuecomment-371752207, or mute the thread https://github.com/notifications/unsubscribe-auth/AHzRORuUIdba4xmEO91-c8EKcpwXvGDpks5tckNxgaJpZM4SbhTx .

--


Jorge Cimentada https://cimentadaj.github.io/ https://cimentadaj.github.io/

maelle commented 6 years ago

Thanks for agreeing to review this package @nujcharee! :smile_cat: Your review is due on 2018-03-30.

As a reminder here are links to the recently updated reviewing and packaging guides and to the review template.

@cimentadaj you can wait for @nujcharee's review before responding if you wish.

nujcharee commented 6 years ago

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

Documentation

The package includes all the following forms of documentation:

For packages co-submitting to JOSS

The package contains a paper.md matching JOSS's requirements with:

  • [ ] A short summary describing the high-level functionality of the software
  • [ ] Authors: A list of authors with their affiliations
  • [ ] A statement of need clearly stating problems the software is designed to solve and its target audience.
  • [ ] References: with DOIs for all those that have one (e.g. papers, datasets, software).

Functionality

Final approval (post-review)

Estimated hours spent reviewing: 6


Review Comments

The ESS package provides programmatic access to European Social Survey published on http://www.europeansocialsurvey.org/downloadwizard/. Users can directly download the surveys from the website in 3 different formats: SPSS, STATA and CSV. The ESS package is compact and easy to use with 2 main family functions: show* and ess*. It also offers functions that handle missing values.

There are a few functions that use external packages such as library(rvest), library(xml2). At the installation these packages weren't installed as they were already installed on my local machine.

The vignettes gives an introduction of what ESS is and contains a link to the website. It also demonstrates common usage and are easy to follow.

I installed and ran it successfully on Windows 7 (R-3.4.2) and Mac OS X (R-3.3.2) I used test_package("ess") and received the following message OK: 38 SKIPPED: 23 FAILED: 0

Possibilities for improvement

library(tidyverse)
one_round <-
  ess_rounds(1, "your_email@email.com") %>%
  recode_missings()

Final note

Great package, I can see the potential for it to become very popular among social science professionals.

nujcharee commented 6 years ago

@maelle I hope this is useful. Let me know if I can assist you further :)

maelle commented 6 years ago

Thanks a lot @nujcharee for your great review! ๐Ÿ˜บ ๐Ÿ‘Œ

@leeper could you please add the review template to yours? ๐Ÿ™

@cimentadaj both reviews are now in! ๐Ÿ˜€

leeper commented 6 years ago

@maelle done

maelle commented 6 years ago

Thanks @leeper!

jimhester commented 6 years ago

I have no background in social sciences so have no idea how ubiquitous the acronym ESS for European Social Survey is, but in the R / S community there is long precedent for ESS standing for Emacs Speaks Statistics, dating back to at least 1994, before R existed. Emacs Speaks Statistics is also quite popular for emacs R users, so this is not a niche issue.

I am actually astonished that this package was allowed on CRAN with this name and would really encourage considering changing the name to something more descriptive; maybe essurvey would suffice, you could retain the ess_ shorthand for the internal functions.

cimentadaj commented 6 years ago

Everyone, thanks for the suggestions/comments. I will go one by one. @maelle

This is because it has conflicts with Emacs Speaks Statistics and the Emacs authors suggested that I add it as a note and they will add it as well to their R package. In fact, the point raised by @jimhester has been raised by several other R users in this email thread. Among social scientists the ESS acronym is widely known and would be the first thing that comes to mind, yet not many of these scientists are R users, whereas Emacs Speaks Statistics is widely used/known in the R community. I've decided that it is best for the package to be renamed and rereleased in the future. In fact, I was thinking of using the same name @jimhester suggested, essurvey. I don't know how this affects the rOpensci submission, let me know if something needs to be changed. I will make the name change effective when I have time, somewhere in the next few weeks.

Done ๐Ÿ‘

Done ๐Ÿ‘

Please document the fact that one needs to set "your_email" as an environment variable. I moreover think its name is a bit confusing, it is my email ๐Ÿ˜‰ maybe it should be "ess_account_email"? I'd recommend you to look for the email by default like what is done in this package

Currently your_email is not an environment variable but purely an argument. Following the comments of @leeper I think a better approach would be set it as environment/option variables but also allow the email as an argument (for backward dependency), encouraging the user not to use the email as an argument; this would be explained in detail in the vignette, which might cause confusion for ESS users not familiar with R. The approach would be similar but slightly different from the package you suggested.

cimentadaj commented 6 years ago

Thanks @leeper! I will review your suggestions in the coming days. I think that making a more 'practical' vignette is also a good idea. I will work on it.

cimentadaj commented 6 years ago

@nujcharee thank you very much for your comments! I agree with your suggestions and will implement them soon. In fact, I think you also raised some common points that overlap with what @leeper suggested. Thanks!

cimentadaj commented 6 years ago

EDIT: I will post all of @leeper 's answers here:

Issue https://github.com/cimentadaj/ess/issues/20 asked for package documentation. Closed it with https://github.com/cimentadaj/ess/commit/ed0df5c4856a25233f62f1f37221f170bb7cc2be


Issue https://github.com/cimentadaj/ess/issues/21 suggested that the package should have a better documentation for show_rounds_country(). The function accepts a numeric vector of rounds (currently from 1:8) and returns the countries that participated in all of these rounds (that is, AND rather than OR). The issue suggested either improving documentation to make this clear or adding a separate function for AND and for OR.

See https://github.com/cimentadaj/ess/commit/76007c5ac35161629a3eefed4a941a9a726dee0d where I improved documentation as to make it clear with some examples. Adding a OR type of function could be handy, and something that I will add for next releases yet I don't think it's urgent. Users can find a work around around along theses lines purrr::map(ess::show_rounds(), ess::show_rounds_country).


Issue https://github.com/cimentadaj/ess/issues/22 suggests to break the ess_* functions into two: an import_* and download_* function. This is the case because all ess_* functions have the functionality to import data into R or download data into a directory without returning anything in R. This change makes a lot of sense and I think is the right way to approach this. In fact, the implementation should be straight forward. However, I'm concerned about backward compatibility and the fact that I plan to change the name of the package to essurvey in the near future.

My main concern is having both the breaking compatibility change in the ess package and then changing the name of the package in a short period of time. I would want to give some time to users to communicate the API changes and let them know that the functions will be deprecated without also swamping them with a new package name (let's remember that the users of this package are not necessarily R users but people who come to R for specific things). I've thought a lot about this and I think this could be one solution:

I'm really hesitant on this step so I'd like to hear your comments. Perhaps it's not the best idea to apply these changes in the first place because the breaking changes are not extremely necessary.

I'd really like to hear your feedback on this particular point.


https://github.com/cimentadaj/ess/issues/23 gave a very reasonable suggestion to set the your_email argument as an environment variable rather than using it in every ess_* function call. I think this is very useful feedback! I implemented this in https://github.com/cimentadaj/ess/commit/11fcf116d1f821dcb2aeef2ed154e67cf2c32aa0

A summary of the changes:

All tests pass, so it should be good to do. (Again, very sorry that the commits seem to rewrite whole documents. Something weird is happening when making changes using the Ubuntu terminal in Windows 10.)


Issue https://github.com/cimentadaj/ess/issues/24 suggested that every request made to the ESS website should be made over HTTPS rather than HTTP for security reasons.

See https://github.com/cimentadaj/ess/commit/57656ffdcbac6e156752602009210238e7e9ea4f for the implementation.

Good suggestion, I had no idea of the difference between HTTPS and HTTP. Let me know if this is fine, to close the issue.


Issue https://github.com/cimentadaj/ess/issues/25 was about using match.arg to match the format argument in all of the ess_ functions instead of creating custom checks.

I hadn't thought about it! Thanks. I'm applying it to the case of matching the format argument through out the package. However, in the other cases prefer to have a separate stop message, like here. This is because it checks whether the supplied country name is available in the ESS website and instead of match.arg returning all the country names (long) I stop with a clear message suggesting the user to check show_countries() for the correct country name. Applies to rounds (and themes), with an even more explicit error.

There are other instances where it make sense such as here but it doesn't work as expect because it needs to match every code and throw an error if it doesn't. For example match.arg(c("123", "Refusal"), c("Don't Know", "Refusal"), TRUE) only returns Refusal but doesn't raise an error. So in these other cases I also think the error message is more strict. See https://github.com/cimentadaj/ess/commit/0336d37ae09003f25c60ea769a9fc685d67f03e7. Let me know if it addresses your concern so I close that issue.


Issue https://github.com/cimentadaj/ess/issues/26 suggested to add documentation about using weights for analyzing the ESS data. More specifically it advised either writing a second vignette showing examples or warn the user that it should use weights in the README, vignette and package documentation. I closed this via https://github.com/cimentadaj/ess/commit/d83f728a2c63a381edf3d7a4bc415e234fc87594 by including comments and notes on the weights on the three sources of documentation. Implementing a second vigenette, together with more general examples using the weighted ESS data is something I have on my to-do list. In fact, I haven't created these examples because I'm currently working on developing a package to analyze the ESS. Once I have that done I'll create more vignettes for both packages.

PS: Sorry for the whole document rewrite in the commit. I'm getting these weird bugs using the Ubuntu terminal in Windows 10.

jimhester commented 6 years ago

In terms of the particulars of renaming the package, I think the best way to do it is to submit the package to CRAN with the new name, and in your comments to CRAN in the submission say you want to rename the package and archive the old name along with the reasons for doing so.

maelle commented 6 years ago

@cimentadaj changing names doesn't impact the submission. I know that because my Ropenaq package was renamed ropenaq during its onboarding ๐Ÿ˜‰ (I didn't have it on CRAN but you have @jimhester's guidance above, by the way thanks Jim for chiming in!)

maelle commented 6 years ago

@cimentadaj great job! ๐Ÿ‘Œ Just a note about the process, it'd be easier for future readers of the thread to group your answers and to write here what the issue was about so that one doesn't need to click. Moreover if you want to link a commit to an issue in the commit message you can write something like "blabla closes #" with the issue number (or just "see" instead of "close" if you don't want to close it yet)

cimentadaj commented 6 years ago

Thanks @maelle ! I just edited the messages to group all of my answers in one single comment. I will update that comment as I fix the issues.

cimentadaj commented 6 years ago

Here I will respond to @nujcharee's suggestions.

In https://github.com/cimentadaj/ess/issues/27, @nujcharee suggested adding library(tidyverse) to the README. Implemented in https://github.com/cimentadaj/ess/commit/25628bc99ff679f27ce428af105d425fd60e277e.

Some other suggestions.

In my opinion, for those which aren't familiar with Social Survey, it might be really beneficial from a brief description of what the term "round" mean and how this is used in the context of social survey. Thank you for this! Implemented it in https://github.com/cimentadaj/ess/commit/f1c730069b3f9ecd63ee56c0f735af9c7f7f1441

Opportunities to apply global variables for "email address": applied in https://github.com/cimentadaj/ess/commit/11fcf116d1f821dcb2aeef2ed154e67cf2c32aa0 from https://github.com/cimentadaj/ess/issues/23

Opportunities to merge functions "ess_all_rounds()" and "ess_rounds()" where "all" is default if country variables aren't passed. I'm not sure I get this @nujcharee. ess_rounds returns the complete data for all countries in the specified round. ess_all_rounds is just a wrapper around ess_rounds that downloads all rounds for all countries.

Making the country_name argument case insensitive is a potential improvement. This is indeed a potential improvement. I'll put it on as an issue for the next release in https://github.com/cimentadaj/ess/issues/28

cimentadaj commented 6 years ago

@maelle, my answer's to @leeper and @nujcharee are in.

leeper commented 6 years ago

I'm happy with all of the responses to my comments.

maelle commented 6 years ago

@nujcharee are you happy with @cimentadaj's answer?

nujcharee commented 6 years ago

@maelle @cimentadaj I am also happy yes ;)

nujcharee commented 6 years ago

I think in terms of the function โ€œallโ€ I think personally that โ€œallโ€ function can be obselete if it can be embedded in ess_round function when countries arenโ€™t specified.

I made a suggestion as a recommendation. Looking forward to seeing this on CRAN soon good luck.

cimentadaj commented 6 years ago

Thanks a lot @nujcharee! @maelle, if the package gets accepted we need to sort out the issue with name of the package. Right now the name of the package is ess both in CRAN and on Github. During this week I'm going to finish all of the changes from the review and change the Github package name to essurvey and resubmit to CRAN in the next weeks. Perhaps the package can be essurvey in rOpensci while I resubmit it to CRAN with that name.

maelle commented 6 years ago

Thanks @nujcharee and thanks @cimentadaj!

I still need to perform the last editor checks (should lead to minor comments only). Maybe it's best to have waited for you to have finished all review changes before I have a last look anyway, so ping me when it's done if that's ok with you?

We could transfer even before the name change, I think redirections from the old account/reponame would still work if it is newaccount/newreponame. But I haven't checked that yet.

cimentadaj commented 6 years ago

Ok @maelle , the changes are in and the package has be renamed. I switched it to essurvey on Github and everything on the package has been renamed to essurvey over ess. All tests pass so I'm just waiting for the badge update. I will submit to CRAN once this review process is over to make the process much smoother.

maelle commented 6 years ago

Approved! Great work @cimentadaj and great reviewer work @leeper & @nujcharee ! A few last comments/questions from me

Now here is the list of things you have to do before I close this issue ๐Ÿ˜‰

Welcome aboard! We'd also love a blog post about your package, either a short-form intro to it (https://ropensci.org/tech-notes/) or long-form post with more narrative about its development. ((https://ropensci.org/blog/). If you are, @stefaniebutland will be in touch about content and timing.

maelle commented 6 years ago

And thanks again @jimhester for chiming in! ๐Ÿ‘Œ

cimentadaj commented 6 years ago

@maelle thank you very much for the review. Thanks as well to@leeper and @nujcharee for some great reviews. essurvey benefited a lot from your comments.

Answer's to @maelle's comments below:

Have you checked whether it's ok to use the ESS logo? Hmm, I'm not entirely sure but the communications office of the ESS has been retweeting the package constantly in their official account, so I'm guessing there shouldn't be a problem. Here's one example but there's plenty more.

Only a suggestion, you could choose to add the reviewers to DESCRIPTION as "rev", with their permission. Done! @leeper and @nujcharee would you like to be listed as reviewers? @nujcharee is your complete name Nujcharee Haswell?

the install.packages instruction is currently wrong, maybe you should remove it and add it back when the package is on CRAN? Yeah, I forgot, thanks!

Add the pkgdown link to the repo URL field and also to DESCRIPTION (after a comma) I added the pkg website to DESCRIPTION for I can't seem to find a URL field in the repository. What do you mean exactly?

cimentadaj commented 6 years ago

@maelle, finalized all changes to close the issue. You can check out the repo at https://github.com/ropensci/essurvey

Thank you all for this experience! I learned a lot from it and I'd love to write an extended blog post on the review process and how to use the package @stefaniebutland.

maelle commented 6 years ago

@cimentadaj you couldn't see the repo description "edit" button because you were not an admin yet. Now I've made you admin, and I've added the pkgdown URL in the description at the same time, hope that's ok.

cimentadaj commented 6 years ago

@maelle yes, that's fine! Thanks.

nujcharee commented 6 years ago

@cimentadaj Hi there yes thats my name Nujcharee Haswell. :)

cimentadaj commented 6 years ago

@leeper would you also like to be listed as a reviewer in the CRAN submission?

stefaniebutland commented 6 years ago

Glad to hear you're interested it writing a post @cimentadaj. Here are some technical and editorial guidelines: https://github.com/ropensci/roweb2#contributing-a-blog-post

We ask that you submit a draft via pull request at least one week before the planned publication date. At this point I have posts scheduled through April so perhaps best for you to suggest a deadline by which you would like to submit a draft.

cimentadaj commented 6 years ago

Thanks @stefaniebutland! I could send a pull request the somewhere between the 20th and 30th of April for review where the post would land perfectly for sometime in May. Let me know if this is alright.

stefaniebutland commented 6 years ago

Sounds good. I will ping you around April 17th to make sure your draft is still on track for submission between 20th and 30th.

maelle commented 6 years ago

Closing this now that the package has been onboarded. @leeper can still answer about being listed as "rev" in DESCRIPTION โ˜บ

cimentadaj commented 6 years ago

Hi @leeper, just a reminder, would you like to be listed a reviewer when I submit the package to CRAN?

leeper commented 6 years ago

Sure. Happy to be.