Closed FranzKrah closed 9 years ago
@jooolia assigned
Overall rusda seems to a be useful package for getting data from the USDA databases for getting information about fungal-host associations and literature.
This might be specific to my setup, but I had a small problem as I had the "httr" package already installed and then when installing rusda the install failed on "httr". This removed my "httr" package. I re-installed "httr" and then installed rusda easily. I wonder if adding a minimum version to the "httr" package required could fix this problem? e.g. imports: httr (>= 1.0.0)
No problems installing.
Code is well-commented and functions have clear purpose. Functions are small, modular and logically divided up into different script files.
In the help file under the argument "database" it says that valid input are "HF", "SP" or "both", but when I alter the example to use only "HF" or "SP" I get an error message:
Error in get.HF(tax[[i]], process, spec.type = spec.type) : task 1 failed - "cannot coerce type 'closure' to vector of type 'character'"
Also if I do not give an argument to database I get a warning message with the defaults:
hosts <- associations(spec, clean=FALSE, syn.include=FALSE, spec.type="fungus", process=FALSE)
Warning messages: 1: In if (database == "HF") { : the condition has length > 1 and only the first element will be used 2: In if (database == "SP") { : the condition has length > 1 and only the first element will be used 3: In if (database == "both") { : the condition has length > 1 and only the first element will be used
spec.type
would become spec_type
and get.HF
would become get_HF
and so on. devtools::use_code_of_conduct()
message()
and warning()
for communication with user. There are some times when cat()
is used instead of message()
, you could consider revising to use message()
instead for these. I hope these comments are helpful and please let me know if anything is unclear in these comments. I am also happy to try out any revisions/changes and discuss further.
Regarding installation issues related to httr it seems to be not localized to this package. See @sckott has had this problem: https://gist.github.com/sckott/1e2df73a56283e842245 and is investigating.
Hi Julia,
thank you very much for the constructive thoughts. I corrected the critical points and now I think the package is more user friendly. I set httr to httr (>= 1.0.0) in the description file. Is hope thats correct.
How would you test for correct specification of ‘spec_type’? The input would have to be matched against a taxonomy, like with rgbif and if its nor the correct kingdom there should be a warning message? This would be time consuming for longer input vectors. I don’t see a fast way how to check for the correct input. Thought about that before, too…
Please let me know if you find further improvements.
Cheers, Franz
Franz Krah (B.Sc) Personal Webpage: http://franzkrah.github.io http://franzkrah.github.io/ University: http://www.biodiv.wzw.tum.de/index.php?id=18 http://www.biodiv.wzw.tum.de/index.php?id=18
On 10 Jul 2015, at 18:06, Julia Gustavsen notifications@github.com wrote:
Comments:
Overall rusda seems to a be useful package for getting data from the USDA databases for getting information about fungal-host associations and literature.
Installation:
With Windows:
This might be specific to my setup, but I had a small problem as I had the "httr" package already installed and then when installing rusda the install failed on "httr". This removed my "httr" package. I re-installed "httr" and then installed rusda easily. I wonder if adding a minimum version to the "httr" package required could fix this problem? e.g. imports: httr (>= 1.0.0)
With Linux:
No problems installing.
Code Overall:
Code is well-commented and functions have clear purpose. Functions are small, modular and logically divided up into different script files.
Specific comments about the scripts and functions:
associations.R
In the help file under the argument "database" it says that valid input are "HF", "SP" or "both", but when I alter the example to use only "HF" or "SP" I get an error message: Error in get.HF(tax[[i]], process, spec.type = spec.type) : task 1 failed - "cannot coerce type 'closure' to vector of type 'character'"
Also if I do not give an argument to database I get a warning message with the defaults: hosts <- associations(spec, clean=FALSE, syn.include=FALSE, spec.type="fungus", process=FALSE)
Warning messages: 1: In if (database == "HF") { : the condition has length > 1 and only the first element will be used 2: In if (database == "SP") { : the condition has length > 1 and only the first element will be used 3: In if (database == "both") { : the condition has length > 1 and only the first element will be used
Ropensci criteria:
Package name:
nice choice. Function and variable naming:
minor note: Ropensci recommends snake case for functions and variable naming. Argument spec.type would become spec_type and get.HF would become get_HF and so on. Coding style:
generally very good and easy to follow. minor note: more spacing around operators could improve readability of code. See Hadley Wickam's R package book http://r-pkgs.had.co.nz/r.html for what I mean. Readme:
Readme.Rmd is clear, provides clear instructions on how to install and useful examples. -In the readme there is a small typo in the header "An other example" should be "Another example" Code of conduct:
Consider adding a code of conduct to your project to make it easy and welcoming for people to contribute. You could look into devtools::use_code_of_conduct() Documentation:
Looks good. -The documentation files are generally good, but you could consider adding a bit more detail about the type of data you will receive. You could consider adding a vignette with use as a way to make the package even more user-friendly. Examples:
Examples work and illustrate the important features of the package. Recommended scaffolding:
All good here. Testing:
No tests. Could consider adding tests or even assertions. One spot this might be useful is for the input types argument. Instead of a warning this could be tested and if not an informative message is given to the user. Console messages:
good use of message() and warning() for communication with user. There are some times whencat()is used instead ofmessage(), you could consider revising to usemessage()` instead for these. I hope these comments are helpful and please let me know if anything is unclear in these comments. I am also happy to try out any revisions/changes and discuss further.
— Reply to this email directly or view it on GitHub https://github.com/ropensci/onboarding/issues/18#issuecomment-120445737.
Ahh forgot. The new version is on GitHub now!! Cheers
Franz Krah (B.Sc) Personal Webpage: http://franzkrah.github.io http://franzkrah.github.io/ University: http://www.biodiv.wzw.tum.de/index.php?id=18 http://www.biodiv.wzw.tum.de/index.php?id=18
On 11 Jul 2015, at 13:28, f.krah@mailbox.org wrote:
Hi Julia,
thank you very much for the constructive thoughts. I corrected the critical points and now I think the package is more user friendly. I set httr to httr (>= 1.0.0) in the description file. Is hope thats correct.
How would you test for correct specification of ‘spec_type’? The input would have to be matched against a taxonomy, like with rgbif and if its nor the correct kingdom there should be a warning message? This would be time consuming for longer input vectors. I don’t see a fast way how to check for the correct input. Thought about that before, too…
Please let me know if you find further improvements.
Cheers, Franz
Franz Krah (B.Sc) Personal Webpage: http://franzkrah.github.io http://franzkrah.github.io/ University: http://www.biodiv.wzw.tum.de/index.php?id=18 http://www.biodiv.wzw.tum.de/index.php?id=18
On 10 Jul 2015, at 18:06, Julia Gustavsen <notifications@github.com mailto:notifications@github.com> wrote:
Comments:
Overall rusda seems to a be useful package for getting data from the USDA databases for getting information about fungal-host associations and literature.
Installation:
With Windows:
This might be specific to my setup, but I had a small problem as I had the "httr" package already installed and then when installing rusda the install failed on "httr". This removed my "httr" package. I re-installed "httr" and then installed rusda easily. I wonder if adding a minimum version to the "httr" package required could fix this problem? e.g. imports: httr (>= 1.0.0)
With Linux:
No problems installing.
Code Overall:
Code is well-commented and functions have clear purpose. Functions are small, modular and logically divided up into different script files.
Specific comments about the scripts and functions:
associations.R
In the help file under the argument "database" it says that valid input are "HF", "SP" or "both", but when I alter the example to use only "HF" or "SP" I get an error message: Error in get.HF(tax[[i]], process, spec.type = spec.type) : task 1 failed - "cannot coerce type 'closure' to vector of type 'character'"
Also if I do not give an argument to database I get a warning message with the defaults: hosts <- associations(spec, clean=FALSE, syn.include=FALSE, spec.type="fungus", process=FALSE)
Warning messages: 1: In if (database == "HF") { : the condition has length > 1 and only the first element will be used 2: In if (database == "SP") { : the condition has length > 1 and only the first element will be used 3: In if (database == "both") { : the condition has length > 1 and only the first element will be used
Ropensci criteria:
Package name:
nice choice. Function and variable naming:
minor note: Ropensci recommends snake case for functions and variable naming. Argument spec.type would become spec_type and get.HF would become get_HF and so on. Coding style:
generally very good and easy to follow. minor note: more spacing around operators could improve readability of code. See Hadley Wickam's R package book http://r-pkgs.had.co.nz/r.html for what I mean. Readme:
Readme.Rmd is clear, provides clear instructions on how to install and useful examples. -In the readme there is a small typo in the header "An other example" should be "Another example" Code of conduct:
Consider adding a code of conduct to your project to make it easy and welcoming for people to contribute. You could look into devtools::use_code_of_conduct() Documentation:
Looks good. -The documentation files are generally good, but you could consider adding a bit more detail about the type of data you will receive. You could consider adding a vignette with use as a way to make the package even more user-friendly. Examples:
Examples work and illustrate the important features of the package. Recommended scaffolding:
All good here. Testing:
No tests. Could consider adding tests or even assertions. One spot this might be useful is for the input types argument. Instead of a warning this could be tested and if not an informative message is given to the user. Console messages:
good use of message() and warning() for communication with user. There are some times whencat()is used instead ofmessage(), you could consider revising to usemessage()` instead for these. I hope these comments are helpful and please let me know if anything is unclear in these comments. I am also happy to try out any revisions/changes and discuss further.
— Reply to this email directly or view it on GitHub https://github.com/ropensci/onboarding/issues/18#issuecomment-120445737.
Hi Franz,
Glad you appreciated the comments and I see you've adapted a lot of your code, added a vignette and code of conduct. :smile:
Sorry I was unclear about the testing for species type. I agree it would be onerous to check for existence of specific taxa. I was referring the the spec_type
argument which seems like it should be either "plant" or "fungus". For example in associations.R you have warning(" Make sure spec_type is correctly specified! \n \n" , immediate. = TRUE, call.= FALSE)
. My simple idea was that you could formalize that using something like this:
library(testthat)
expect_match(spec_type, ("fungus|plant"))
What do you think?
I had a quick run through the updated code and looks good, but had a small problem with associations()
spec <- "Fagus sylvatica"
pathogens <- associations(spec, database="HF", clean=TRUE, syn_include=TRUE,
spec_type="plant", process=TRUE)
gives me the output:
Warning: Make sure spec_type is correctly specified!
... retrieving data ... for:
Fagus sylvatica
... extracting Synonyms ...
... cleaning step ...
Error in lapply(res, clean_step, taxa = taxa, syns = syns, spec_type = spec_type, :
object 'res' not found
whereas with database="both"
or database="SP"
I get the desired output.:
Warning: Make sure spec_type is correctly specified!
... retrieving data ... for:
Fagus sylvatica
... extracting Synonyms ...
... extracting Fungus-Hosts DB ...
... extracting Specimens DB ...
... cleaning step ...
Warning: Make sure spec_type is correctly specified!
... retrieving data ... for:
Fagus sylvatica
... extracting Synonyms ...
... extracting Specimens DB ...
... cleaning step ...
Hi Julia,
yes, I think the package improved a lot!
So again you are right. If the “expect_match” condition had been present before the reason for your error would be obvious ;-) I changed “HF” to “FH” since the database name is “Fungus-Hosts” and not “Hosts-Fungus database”…
Please let me know further errors, etc.
Cheers, Franz
Julia Gustavsen notifications@github.com hat am 13. Juli 2015 um 20:38 geschrieben:
Hi Franz,
Glad you appreciated the comments and I see you've adapted a lot of your code, added a vignette and code of conduct. :smile:
Sorry I was unclear about the testing for species type. I agree it would be onerous to check for existence of specific taxa. I was referring the the
spec_type
argument which seems like it should be either "plant" or "fungus". For example in associations.R you havewarning(" Make sure spec_type is correctly specified! \n \n" , immediate. = TRUE, call.= FALSE)
. My simple idea was that you could formalize that using something like this:library(testthat) expect_match(spec_type, ("fungus|plant"))
What do you think?
I had a quick run through the updated code and looks good, but had a small problem with
associations()
spec <- "Fagus sylvatica" pathogens <- associations(spec, database="HF", clean=TRUE, syn_include=TRUE, spec_type="plant", process=TRUE)
gives me the output:
Warning: Make sure spec_type is correctly specified! ... retrieving data ... for: Fagus sylvatica ... extracting Synonyms ... ... cleaning step ... Error in lapply(res, clean_step, taxa = taxa, syns = syns, spec_type = spec_type, : object 'res' not found
whereas with
database="both"
ordatabase="SP"
I get the desired output.:Warning: Make sure spec_type is correctly specified! ... retrieving data ... for: Fagus sylvatica ... extracting Synonyms ... ... extracting Fungus-Hosts DB ... ... extracting Specimens DB ... ... cleaning step ...
Warning: Make sure spec_type is correctly specified! ... retrieving data ... for: Fagus sylvatica ... extracting Synonyms ... ... extracting Specimens DB ... ... cleaning step ...
Reply to this email directly or view it on GitHub: https://github.com/ropensci/onboarding/issues/18#issuecomment-121017895
MSc. Franz Krah Mycology, Phylogenetics, PCM http://www.biodiv.wzw.tum.de/index.php?id=18
Hi Franz,
Ok nice improvements. :+1: I think this looks good.
@sckott I think we are ready for the next step with this. How to proceed?
Hey Julia,
just wanted to know if this has been really been CCd to Scott?
Best,Franz
BSc. Franz Krah Mobile: 0170 5221189 Ecology, Phylogenetic Comparative Methods Personal Webpage: http://franzkrah.github.io http://franzkrah.github.io/ University: http://www.biodiv.wzw.tum.de/index.php?id=18 http://www.biodiv.wzw.tum.de/index.php?id=18
On 15 Jul 2015, at 19:27, Julia Gustavsen notifications@github.com wrote:
Hi Franz,
Ok nice improvements. I think this looks good.
@sckott https://github.com/sckott I think we are ready for the next step with this. How to proceed?
— Reply to this email directly or view it on GitHub https://github.com/ropensci/onboarding/issues/18#issuecomment-121686597.
@FranzKrah @jooolia sorry about the delay, just got back from 1 week vacation. I'll take a quick look and let you know
A few fixes before we can merge into ropensci:
rusda
accepted you should be using roxygen documentation. There is a good introduction on how to do that at http://r-pkgs.had.co.nz/man.html if you aren't familiar - let us know if you have any questions. This will help make it easier for others to contribute to the package, and make sure docs aren't out of sync with the functions..Rbuildignore
file - and put in that CONDUCT.md
and Readme.Rmd
README.Rmd
that's executable with knitr blocks and then just knit that to a README.md
, which will render on the github repo. This provides a nice way to make sure your code in the readme actually worksThere's a few other things I see, but can be dealt with later.
@FranzKrah
Hi Scott,
I think I fixed points 1-3. I’m not sure what point 4 means…
Best, Franz
BSc. Franz Krah Mobile: 0170 5221189 Ecology, Phylogenetic Comparative Methods Personal Webpage: http://franzkrah.github.io http://franzkrah.github.io/ University: http://www.biodiv.wzw.tum.de/index.php?id=18 http://www.biodiv.wzw.tum.de/index.php?id=18
On 20 Jul 2015, at 23:04, Scott Chamberlain notifications@github.com wrote:
A few fixes before we can merge into ropensci:
To get rusda accepted you should be using roxygen documentation. There is a good introduction on how to do that at http://r-pkgs.had.co.nz/man.html http://r-pkgs.had.co.nz/man.html if you aren't familiar - let us know if you have any questions. This will help make it easier for others to contribute to the package, and make sure docs aren't out of sync with the functions. Add a .Rbuildignore file - and put in that CONDUCT.md and Readme.Rmd I'd prefer if you have a README.Rmd that's executable with knitr blocks and then just knit that to a README.md, which will render on the github repo. This provides a nice way to make sure your code in the readme actually works I'd strongly suggest getting tests written sooner than later - but not required before we accept There's a few other things I see, but can be dealt with later.
— Reply to this email directly or view it on GitHub https://github.com/ropensci/onboarding/issues/18#issuecomment-123038399.
Hi Franz,
Point 4 is regarding unit tests, so this would be a more formal way of doing what you did what expect_identical()
and you would have many cases of things that should and shouldn't work. A way of automatically determining that you code still does exactly what you want it to do even after small changes. Again Hadley Wickam's http://r-pkgs.had.co.nz/tests.html could be very useful.
@FranzKrah
.Rbuildignore
a folder. That should be a single file, the dot at the beginning meaning that it is a hidden file, but it should show up if you're inside Rstudio, e.g, In that file, list CONDUCT.md and Readme.Rmd and any other files that R should ignore when running R CMD CHECK, an example: https://github.com/ropensci/rgbif/blob/master/.Rbuildignoreknitr::knit("README.Rmd")
, which should create a file named README.md
. the html file is okay, but I don't know if CRAN will complain about that when you submit there, so safer to go with a .md fileHi Scott,
I'm sorry for the long delay. I finished my master and had to arrange my PhD position ... and relax
So, I hope that I now understood the .Rbuildignore thing right (this is all new to me ;.)) There was some error and I had to set up the repository anew... "Failed to add file .Rbuildignore to index." Don't know what went wrong... anyway now it's fine.
Where can I read more about the tests?
All the best, Franz
Scott Chamberlain notifications@github.com hat am 21. Juli 2015 um 18:41 geschrieben:
@FranzKrah
- Hmmm, it looks like you made
.Rbuildignore
a folder. That should be a single file, the dot at the beginning meaning that it is a hidden file, but it should show up if you're inside Rstudio, e.g, In that file, list CONDUCT.md and Readme.Rmd and any other files that R should ignore when running R CMD CHECK, an example: https://github.com/ropensci/rgbif/blob/master/.Rbuildignore- Make sure the files inside the current .Rbuildignore folder you have are put back in the root of the repo
- The README.Rmd can be run for example like
knitr::knit("README.Rmd")
, which should create a file namedREADME.md
. the html file is okay, but I don't know if CRAN will complain about that when you submit there, so safer to go with a .md file- tests: what @jooolia said, not required before we accept, but you should get tests started soon
Reply to this email directly or view it on GitHub: https://github.com/ropensci/onboarding/issues/18#issuecomment-123396634
BSc. Franz Krah
Mobile: 0170 5221189
Ecology, Phylogenetic Comparative Methods
Personal Webpage: http://franzkrah.github.io
University: http://www.biodiv.wzw.tum.de/index.php?id=18
Glad you figured out .Rbuildignore
I would suggest reading here about tests http://r-pkgs.had.co.nz/tests.html The testthat
package is what you'll need. Let us know if you have any questions.
Hi Scott, are you waiting for something from me now? The test stuff has to wait… I’m in vacation …
Best, Franz
BSc. Franz Krah Mobile: 0170 5221189 Ecology, Phylogenetic Comparative Methods Personal Webpage: http://franzkrah.github.io http://franzkrah.github.io/ University: http://www.biodiv.wzw.tum.de/index.php?id=18 http://www.biodiv.wzw.tum.de/index.php?id=18
On 01 Aug 2015, at 17:51, Scott Chamberlain notifications@github.com wrote:
Glad you figured out .Rbuildignore
I would suggest reading here about tests http://r-pkgs.had.co.nz/tests.html http://r-pkgs.had.co.nz/tests.html The testthat package is what you'll need. Let us know if you have any questions.
— Reply to this email directly or view it on GitHub https://github.com/ropensci/onboarding/issues/18#issuecomment-126930446.
@FranzKrah Looks good. let's move it into ropensci now.
I've created a team in our org account and invited you to it (looks like you've already joined). I think you can transfer the repo to ropensci
Make sure to update github installation instructions and any repo links
let me know if you have any questions
Ok, Scott. Will do that tomorrow ;-) Thanks
On 03 Aug 2015, at 20:21, Scott Chamberlain <notifications@github.com mailto:notifications@github.com> wrote:
@FranzKrah https://github.com/FranzKrah Looks good. let's move it into ropensci now.
I've created a team in our org account and invited you to it (looks like you've already joined). I think you can transfer the repo to ropensci
Make sure to update github installation instructions and any repo links
let me know if you have any questions
— Reply to this email directly or view it on GitHub https://github.com/ropensci/onboarding/issues/18#issuecomment-127360843.
thx!
Ok, I transfered the repo. Hope I did that correctly. Now I will submit to CRAN…
Cheers, Franz
BSc. Franz Krah Mobile: 0170 5221189 Ecology, Phylogenetic Comparative Methods Personal Webpage: http://franzkrah.github.io http://franzkrah.github.io/ University: http://www.biodiv.wzw.tum.de/index.php?id=18 http://www.biodiv.wzw.tum.de/index.php?id=18
On 03 Aug 2015, at 20:45, Scott Chamberlain notifications@github.com wrote:
thx!
— Reply to this email directly or view it on GitHub https://github.com/ropensci/onboarding/issues/18#issuecomment-127367019.
great, thanks!
Package: rusda Type: Package Title: Interface to USDA databases Version: 1.0 Date: 2015-06-25 Author: Franz-Sebastian Krah Maintainer: Franz-Sebastian Krah f.krah@mailbox.org Imports: XML, httr, plyr, foreach, stringr Description: An interface to the web service methods provided by the United States Department of Agriculture (USDA). The Agricultural Research Service (ARS) provides a large set of databases. The current version of the package holds interfaces to the Systematic Botany and Mycology Laboratory (SMML), which consists of four databases: Fungus-Host Distributions, Specimens, Literature and the Nomenclature database. It provides functions for querying these databases. The main function is \code{associations}, which allows searching for fungus-host combinations. License: GPL (>= 2) URL: http://www.usda.gov/wps/portal/usda/usdahome, http://nt.ars-grin.gov/fungaldatabases/index.cfm