ropensci / helminthR

Accesses parasite occurrence records from the London Natural History Museum's Host-Parasite database, which contains over a quarter of a million helminth records.
https://docs.ropensci.org/helminthR
GNU General Public License v3.0
7 stars 5 forks source link

package status? #13

Closed sckott closed 6 years ago

sckott commented 7 years ago

What's status with this pkg @taddallas ? Noticed it's not on CRAN, did you want to get it up there?

taddallas commented 7 years ago

The package is stable, and not really in active development. I don't imagine this package will be regularly updated (there's a very finite number of features we can add to the existing package), so I wonder if placing on CRAN is ideal.

If you have opinions about placing it on CRAN, or ideas for additional features, I'd be happy to hear them.

sckott commented 7 years ago

so I wonder if placing on CRAN is ideal.

not sure what you mean by this? you do or do not want to go to CRAN?

putting it on CRAN will make it easier to use as it's available via the more well known installation method install.packages, and ideally i think all our pkgs should get to CRAN - do you have any reason not to go to CRAN?

taddallas commented 7 years ago

What is the package updating rule for CRAN? Packages not updated for over a year or something are archived? I don't really have a real reason not to place the package on CRAN though, except that having it on Github allows me to not worry about capital letters in function titles and the like. I'll work towards ensuring it meets the requirements, and get it submitted sometime in the next few weeks.

sckott commented 7 years ago

What is the package updating rule for CRAN? Packages not updated for over a year or something are archived?

no, there's no rule like that. They only take pkgs down if they start to break during R cmd check on their servers

having it on Github allows me to not worry about capital letters in function titles and the like.

I think the benefits may outweight the costs - more people may find your package and binaries will be available then which makes installation faster, etc. - if you have any questions about meeting CRAN's rules let me know

taddallas commented 7 years ago

@sckott What is the best way to include data in an R package? It seems devtools::check() will always be unhappy about some aspect (e.g., don't use data() calls within package functions). The way I currently have it is locations.R provides information on the data, and lazyData is set to yes in the DESCRIPTION.

This leads to this note when checking package.

0 errors | 0 warnings | 1 note 
checking R code for possible problems ... NOTE
findHost: no visible binding for global variable ‘locations’
findLocation: no visible binding for global variable ‘locations’
findParasite: no visible binding for global variable ‘locations’
Undefined global functions or variables:
  locations

Also, if you see any obvious things that need fixing or further documentation, let me know.

sckott commented 7 years ago

can put a file like https://github.com/ropensci/taxize/blob/master/R/globals.R in your package listing locations

sckott commented 7 years ago

goodpractice is a good idea to run from time to time, here's the output I get for your pkg:

It is good practice to

  ✖ write unit tests for all functions, and all package code in general. 50% of code
    lines are covered by test cases.

    R/cleanData.R:37:NA
    R/cleanData.R:38:NA
    R/cleanData.R:42:NA
    R/cleanData.R:43:NA
    R/cleanData.R:44:NA
    ... and 112 more lines

  ✖ omit "Date" in DESCRIPTION. It is not required and it gets invalid quite often. A
    build date will be added to the package when you perform `R CMD build` on it.
  ✖ avoid long code lines, it is bad for readability. Also, many people prefer editor
    windows that are about 80 characters wide. Try make your lines shorter than 80 characters

    R/cleanData.R:49:1
    R/cleanData.R:59:1
    R/cleanData.R:61:1
    R/findHost.R:52:1
    R/findHost.R:55:1
    ... and 27 more lines

  ✖ omit trailing semicolons from code lines. They are not needed and most R coding
    standards forbid them

    R/findLocation.R:72:49

  ✖ avoid sapply(), it is not type safe. It might return a vector, or a list, depending
    on the input data. Consider using vapply() instead.

    R/findHost.R:73:17
    R/findLocation.R:58:17
    R/findParasite.R:79:17

  ✖ avoid 1:length(...), 1:nrow(...), 1:ncol(...), 1:NROW(...) and 1:NCOL(...)
    expressions. They are error prone and result 1:0 if the expression on the right hand side is
    zero. Use seq_len() or seq_along() instead.

    R/cleanData.R:56:27
    R/cleanData.R:81:19
    R/listLocations.R:36:12

The Date in DESCRIPTION, long code lines, and semicolon type things are clearly less about correctly functioning code, so perhaps less important, but still recommend fixing those.

The advice about sapply() and 1:nrow type things are good idea to change.

taddallas commented 7 years ago

I think I've fixed the majority of the goodpractice flags. However, I'm getting the following error message when using R CMD check

fix this R CMD check WARNING: LaTeX errors when creating
    PDF version. This typically indicates Rd problems.

But I do not get this error when using devtools::check(). Other that that, I think the package is about ready to submit to CRAN.

sckott commented 7 years ago

I'll have a look

sckott commented 7 years ago

hmm, i'm not getting that latex thing and the travis build looks fine https://travis-ci.org/ropensci/helminthR

taddallas commented 7 years ago

Okie dokie. I think it's about ready to ship. I've run goodpractice on it (results below). Some of the long code lines are unavoidable, stemming from long urls and the like. More unit tests would be nice, so I might add those in before I submit. Any other suggestions?

It is good practice to

  ✖ write unit tests for all functions, and all package code
    in general. 65% of code lines are covered by test cases.

    R/cleanData.R:34:NA
    R/cleanData.R:41:NA
    R/cleanData.R:48:NA
    R/cleanData.R:49:NA
    R/cleanData.R:50:NA
    ... and 79 more lines

  ✖ avoid long code lines, it is bad for readability. Also,
    many people prefer editor windows that are about 80 characters
    wide. Try make your lines shorter than 80 characters

    R/cleanData.R:45:1
    R/findHost.R:68:1
    R/findHost.R:83:1
    R/findHost.R:111:1
    R/findLocation.R:50:1
    ... and 13 more lines
sckott commented 7 years ago
sckott commented 7 years ago

changes

add some files:

pkg checks

submitting:

taddallas commented 7 years ago

Output from devtools::build_win() as follows

* installing *source* package 'helminthR' ...
** R
** data
*** moving datasets to lazyload DB
** preparing package for lazy loading
Error in library.dynam(lib, package, package.lib) : 
  DLL 'geosphere' not found: maybe not installed for this architecture?
ERROR: lazy loading failed for package 'helminthR'
* removing 'd:/RCompile/CRANguest/R-devel/lib/helminthR'

Thoughts?

sckott commented 7 years ago

Hmm, i'll check -

sckott commented 7 years ago

a good idea to use appveyor which does like Travis does but for Windows - add a file to root of your repo called appveyor.yml, with e.g.,

environment:
  R_CHECK_ARGS: --no-build-vignettes --no-manual
  USE_RTOOLS: true

init:
  ps: |
        $ErrorActionPreference = "Stop"
        Invoke-WebRequest http://raw.github.com/krlmlr/r-appveyor/master/scripts/appveyor-tool.ps1 -OutFile "..\appveyor-tool.ps1"
        Import-Module '..\appveyor-tool.ps1'
install:
  ps: Bootstrap

build_script:
  - travis-tool.sh install_github hadley/devtools
  - travis-tool.sh install_deps

test_script:
  - travis-tool.sh run_tests

on_failure:
  - travis-tool.sh dump_logs

notifications:
  - provider: Slack
    auth_token:
      secure: S3AcHEoJHUtahR5N8ConStS8oV/+x34tS1bDGM3OD0QxDKBBxufeiMmXQsId1gZu
    channel: '#builds'

here's where r-appveyor repo is developed to look for any help and tips https://github.com/krlmlr/r-appveyor

taddallas commented 7 years ago

Appveyor seems to be happy with it.

sckott commented 7 years ago

Okay, looks like ggmap depends on geosphere https://cran.rstudio.com/web/packages/ggmap/

can you run it again, this time on both devel and release build_win(version = c("R-release", "R-devel"))

taddallas commented 7 years ago

Now seems to run fine both using devtools::build_win and with appveyor. :confused:

I guess it's ready to submit to CRAN? I will look into your suggestion of using curl or httr, but that may be a "next update" type of change (unless you have strong opinions about changing this now).

sckott commented 7 years ago

It's definitely best practice to use a proper HTTP client - so if not now, then very soon


small correction, I was referring to crul, not curl - Jeroen on our staff does make curl, but it's a much lower level client requiring user to know more about the curl protocol

sckott commented 6 years ago

any thoughts @taddallas ?

taddallas commented 6 years ago

:grimacing:

I was just thinking about this. It's been too long. I recall the task of getting everything over from the rvest + xml2 framework being a new area and potentially a challenge, and then got bogged down with postdoc research, followed by an international move, followed by more postdoc research. I'll look into using crul in the next week.

Any helpful tutorials that you could suggest would be useful.

sckott commented 6 years ago

glad you're revisiting it. there's some vignettes listed here https://cran.rstudio.com/web/packages/crul/

taddallas commented 6 years ago

I've replaced the functions from xml2 to read urls with the httr approach. A bit of error handling now, the initial parsing of the content is done using httr, but much of this still uses rvest. Feel free to suggest any changes. Otherwise, I think it may be nearly ready for CRAN submission.

sckott commented 6 years ago

HTTP request code can be simplified , e.g., https://github.com/ropensci/helminthR/blob/master/R/findHost.R#L84 can be more like

url <- "http://www.nhm.ac.uk/research-curation/scientific-resources/taxonomy-systematics/host-parasites/database/results.jsp"
args <- list(dbfnsRowsPerPage=500000, x=13, y=5) # and so on for the rest of your parameters
x <- GET(url, query = args)
stop_for_status(x) # probably want to have a catch all http request failure check in addition to your `!= 200` check

I'd make sure to put skip_on_cran() in all of your tests that make web requests so that they are not run on CRAN. You do not want CRAN maintainers emailing you because they are not nice.

taddallas commented 6 years ago

Incorporated. Is this ready to submit to CRAN?

devtools::check()

Status: OK

R CMD check results
0 errors | 0 warnings | 0 notes
sckott commented 6 years ago

Make sure before you submit to check with win-builder by doing devtools::build_win(version = c("R-release", "R-devel")) you'll get an email with link to a report for the package built and checked on windows - if no notes/warning/errors there then you should be good to go

taddallas commented 6 years ago

Nice. Package v.1.0.5 is accepted to CRAN and marked as a release here.