ropensci / software-review

rOpenSci Software Peer Review.
292 stars 104 forks source link

Submission: photosearcher #325

Closed Dr-Nathan-Fox closed 4 years ago

Dr-Nathan-Fox commented 5 years ago

Submitting Author: Name (@nfox29)
Repository: nfox29/photosearcher Version submitted: 1.0 Editor: @sckott
Reviewer 1: @monicagerber
Reviewer 2: @zkamvar Archive: TBD
Version accepted: TBD


Package: photosearcher
Title: Photo Searcher
Version: 1.0
Authors@R: 
    c(person("Nathan", "Fox", role = c("aut", "cre"), email = "nf2g13@soton.ac.uk"),
           person("Francesca", "Mancini", role = c("aut")),
           person("Laura", "Graham", role = c("aut")),
           person("Louis", "Sutter", role = c("aut")),
           person("Tom", "August", role = c("aut")))
Description: Queries the Flick API (https://www.flickr.com/services/api/)to return photograph metadata as well as the ability to download the images as jpegs.
Depends: R (>= 3.5.0)
License: GPL
Encoding: UTF-8
LazyData: true
Language: en_GB
Imports: 
    xml2,
    httr,
    dplyr,
    glue,
    clisymbols,
    crayon,
    sf
Roxygen: list(markdown = TRUE)
RoxygenNote: 6.1.1
URL: https://github.com/nfox29/photosearcher
BugReports: https://github.com/nfox29/photosearcher/issues
Suggests: 
    httptest,
    knitr,
    roxygen2,
    testthat (>= 2.1.0),
    rmarkdown,
    covr,
    rplos,
    wordcloud,
    tidyverse
VignetteBuilder: knitr

Scope

The package provides functions that make calls to the Flickr API to return metadata (including georeferenced location) for photographs in a reproducible manner.

The package is aimed to facilitate reproducible code for scientific research that utilises the Flickr API. Currently, a wide range of scientific disciplines utilise the Flickr API including environmental sciences, biological sciences and medicine. Flickrs georeferenced images have a wide range of scientific applications including ecosystem service assessments, resource management decisions and mapping species distributions.

We are aware of one other package that makes calls to the Flickr API. However this package only allows user to search for a single image based on its ID number. Our package provides users with the ability to search for photographs based on a user defined criteria including, date of the photograph being taken, tags, text and spatial location. Furthermore our package provides additional functionality including the downloading of images, retrieving social data about users as well as get tags related to a specific location, or other tags.

Technical checks

Confirm each of the following by checking the box. This package:

Publication options

JOSS Options - [ ] The package has an **obvious research application** according to [JOSS's definition](https://joss.readthedocs.io/en/latest/submitting.html#submission-requirements). - [ ] The package contains a `paper.md` matching [JOSS's requirements](https://joss.readthedocs.io/en/latest/submitting.html#what-should-my-paper-contain) with a high-level description in the package root or in `inst/`. - [ ] The package is deposited in a long-term repository with the DOI: - (*Do not submit your package separately to JOSS*)
MEE Options - [x] The package is novel and will be of interest to the broad readership of the journal. - [x] The manuscript describing the package is no longer than 3000 words. - [x] You intend to archive the code for the package in a long-term repository which meets the requirements of the journal (see [MEE's Policy on Publishing Code](http://besjournals.onlinelibrary.wiley.com/hub/journal/10.1111/(ISSN)2041-210X/journal-resources/policy-on-publishing-code.html)) - (*Scope: Do consider MEE's [Aims and Scope](http://besjournals.onlinelibrary.wiley.com/hub/journal/10.1111/(ISSN)2041-210X/aims-and-scope/read-full-aims-and-scope.html) for your manuscript. We make no guarantee that your manuscript will be within MEE scope.*) - (*Although not required, we strongly recommend having a full manuscript prepared when you submit here.*) - (*Please do not submit your package separately to Methods in Ecology and Evolution*)

Code of conduct

noamross commented 5 years ago

Thanks for your submission, @nfox29! I have a question regarding the relationship of this package and FlickrAPI. A quick review suggests to me that these packages cover different, but complementary parts of the API. Is that correct, or is the difference in how API outputs are handled?

I ask. We generally aim for RO packages to be best-in-category because we want the RO package to be the one we'd recommend for the task. In this case, it seems the best outcome would be to combine the functionality of the two packages, either by joining forces, or you adding the parts of the API that FlickrAPI has, so that we have one best package for accessing Flickr. Would you agree or do you think the packages handle very different applications?

Dr-Nathan-Fox commented 5 years ago

Thank you @noamross for your swift reply. I appreciate that there seems to be similarities between the two packages.

flickrAPI has four functions getExif, getHotTags, getPhotoInfo and getPhotos. Out of those four we currently do not have getExif, getHotTags and getPhotos.

The main function we share is searching for photo metadata. We use the API method flickr.photos.search which allows users to search for photos based on their own arguments including tags and spatial location. flickrAPI on the other hand used the API method flickr.photos.getInfo. The difference with this method is that it only allows user to search for information on one specific image. The argument to search for a specific image by its ID can be incorporate into our method. Second the way that data is returned is handled slightly differently. The use of our method allows users to return all available metadata on an image to a single dataframe, whereas the call made by flickrAPI returns limited data for one of "Location", "Date", "URL", "Tags".

Of the functions that we do not have in our package we have chosen not to include the getHotTags function as this API method does not work Flickr side and will only ever return the same result regardless off the arguments. With the other two, flickrAPI doesn’t utilise the full range of arguments available to the calls (for example the getPhotos function in flickrAPI only has one augment, whereas the API method has many more). If you deem it necessary to include these functions I will add them to this package with the greater functionality available on the API.

Besides that – our package has additional API methods, including downloading images, getting the top tags for locations and assessing related tags. Our package also has some quality of life benefits, such as storing you API key so that you do not have to enter it as an argument to the functions every time.

I will now add the option to search for a photos data via its ID number to the photosearch function, as well as adding the two other functions.

Dr-Nathan-Fox commented 5 years ago

Hi @noamross, to address the issues you raised I have now updated the package to include the functionality that flickrAPI included that we originally did not. Our package can now:

I have also updated testing, docs and vingettes to match these updates.

I believe our package now covers the complete functionality of flickrAPI, whilst providing additional functions and returning the data in a more complete way. I hope these resolves the issue and the package can be considered for submission. Kindest regards, Nathan

noamross commented 5 years ago

Great, thanks you @nfox29. @sckott will be your editor.

sckott commented 5 years ago

Editor checks:


Editor comments

Thanks for your submission @nfox29 !

License: you have GPL in your license file. I'm not sure I've seen that before. I usually see GPL-2, GPL-3, etc. - see https://cran.r-project.org/doc/manuals/r-release/R-exts.html#Licensing - Did you mean GPL per se?

Here's the output from goodpractice. If you haven't used goodpractice it's an R package that checks a number of things with another package - most of which we agree with and want authors to follow. You don't need to address these now, it's info for reviewers to use to get started.

── GP photosearcher ─────
It is good practice to

  ✖ write unit tests for all functions, and all package code in general. 68% of code lines are covered by test
    cases.

    R/download_images.R:81:NA
    R/download_images.R:82:NA
    R/download_images.R:83:NA
    R/download_images.R:136:NA
    R/download_images.R:137:NA
    ... and 121 more lines

  ✖ avoid long code lines, it is bad for readability. Also, many people prefer editor windows that are about
    80 characters wide. Try make your lines shorter than 80 characters

    R/photo_search.R:108:1
    R/photo_search.R:180:1
    R/photo_search.R:184:1
    R/photo_search.R:190:1
    R/utils.R:32:1
    ... and 1 more lines

───────────

Seeking reviewers now 🕐


Reviewers:

Dr-Nathan-Fox commented 5 years ago

Thanks @sckott to address your comments I have fixed the line length issues. The reason currently some lines aren't covered by tests is due to a Flickr side issue. Currently anything that requires a Flickr location identifier does not work. Therefore both the location_tags and find_place functions do not work, as well as searching for photographs with a woeID using the photo_search function. Because of this any test that are related to those functions are currently being skipped until Flickr resolves the issue. More generally there are also a few lines of code that catch errors that I can not replicated in tests.

Kindest, Nathan

sckott commented 5 years ago

thanks

sckott commented 5 years ago

reviewers assigned: @monicagerber @zkamvar

monicagerber commented 5 years ago

Hi @nfox29! 👋 I have a question about getting started with your package. In the README, you write:

The package requires a valid API key from the Flickr development page. The first time you call a function from the package, you will be prompted to create and enter your API key. The API key will then be saved as a .txt file and be called when using any other function.

Do you have an example of how the user will be prompted to create and enter an API key? I've installed your package and tried out photo_search() but wasn't prompted to create a key. I went ahead and created an API key from the Flickr development page, but as a naïve user, I'm not sure where to save the key.

sckott commented 5 years ago

@monicagerber @zkamvar reminder your review is due on Monday (Monica, I do see you've got started, thanks)

Dr-Nathan-Fox commented 5 years ago

@monicagerber when you use the photo_search function (or any of the other packages functions) you should be prompted to paste or type your API key into the console. All functions check whether an API key exists as a a file called api_key.txt and if not it should prompt you to enter it, whilst opening the Flickr development page to facilitate requesting an API key if you do not have one. It is then saved to be called upon for future functions. If an API key has been entered but is not a valid API key it will warn you that the API key is not valid and stop the function. Users are also able to save their API key directly as a file api_tey.txt. Let me know if you have continued issues with this and I will look into why it has not worked for you. Kindest regards.

zkamvar commented 5 years ago

Package Review

Documentation

The package includes all the following forms of documentation:

Functionality

Final approval (post-review)

Estimated hours spent reviewing: 4


Review Comments

My strategy for reading through this was to do the following:

  1. work through the example in the README
  2. attempt to adapt the example to my own interest
  3. work through the examples in the vignette presented

I have installed the repository from the most recent commit, which was here:

repo   <- "nfox29/photosearcher"
commit <- "@93895459fbde7a50b87ce0717a75e383ffc245dd" 
remotes::install_github(paste0(repo, commit), dep = TRUE)

Overall, I think this is a good start, but it still needs some polishing. The main function of this package seems to be photo_search(), which works well once you have your API key set up (difficulty: not very). I was able to search for snakes in the UK and got them in abundance, so the package works with that expectation.

Based on my walkthrough of the documentation I think the entrypoints for the API make sense, but the output could use significant improvement.

For example, the output of photo_search() has 57 columns, but the documentation does not state exactly what they represent. I could tell that 30 of these columns were dedicated to different iterations of width, height, and URL for the results with cryptic suffixes. Moreover (as mentioned later), all of the columns were encoded as character despite the fact that there were integer, numeric, date, and boolean columns.

Moreover, the get_photoinfo() returns similar columns, but the dateuploaded column is not formatted as the date columns of photo_search() (Y-m-d H:M:S), but as an integer formatted as a character. The only other function that returns a data frame user_info() converts characters to factors.

The other functions all return character vectors with the exception of download_images(), which returns a data frame.

My recommendations for improvement:

  1. Document the columns of photo_search()
  2. Parse the data from photo_search(), user_info(), and get_photoinfo() before returning it to the user
  3. Remove the default directory destination from download_images() and throw an error if the user doesn't supply one (this is one of the CRAN policies).
  4. Make sure the vignette documentation works, even if you are not going to run it on CRAN.

Stray observations:


Working through the README

The example in the README does not work out of the box:

library(photosearcher)

rock_climbing <- photo_search(
  mindate = "2010-01-01",
  maxdate = "2018-01-01",
  text = "rock climbing",
  bbox = "-12.875977,49.210420,2.636719,59.977005",
  has_geo = TRUE
)
#> Error in photo_search(mindate = "2010-01-01", maxdate = "2018-01-01", : argument 1 matches multiple formal arguments

Created on 2019-08-28 by the reprex package (v0.3.0)

I found that fixing mindate and maxdate to have _taken suffix, addressed the problem. When I ran the command with the suffixes, I was prompted to find my API key from flickr. I did not have one, so I used a throwaway account to generate one. I copy+pasted the key (not the secret) into to console as it prompted me to do so and the search for eight years of rock climbers in the UK and Ireland produced fruitful results, taking only a few seconds to search.

The example of plotting the geographic coordinates of all the rock climbers in that eight year period was helpful and it worked as expected.

Adapting the README example

I wanted to see if I could show my spouse that yes, snakes did exist in the UK despite the fact that we haven't seen one while hiking in the last year and was able to produce a map of all photos tagged with "snake" in the last year.

When I tried to color the points by number of comments, I got a categorical variable instead of a continuous because all the columns were encoded as character. I found that this affected the SF object because it could not interpret the character coordinates as a bounding box for the print method:

Simple feature collection with 148 features and 55 fields
geometry type:  POINT
dimension:      XY
bbox:           Error in signif(attr(x, "bbox"), options("digits")$digits) : 
  non-numeric argument to mathematical function

Things worked well after I used the following pipeline to convert relevant columns:

library('dplyr')
snakes_clean <- snakes %>%
  mutate_at(vars(starts_with("count")), as.integer) %>%
  mutate_at(vars(ends_with("itude")), as.numeric) %>%
  mutate_at(vars(starts_with("height"), starts_with("width")), as.numeric) %>%
  mutate_at(vars(starts_with("date")), readr::parse_datetime)

Walk through the package vignette

The code from the vingette did not work out of the box for the following issues:

  1. mindate/maxdate needed the _taken suffix
  2. a comma was needed after tags = "lake"
  3. no one took any pictures of a lake with the text "mountain" on New Year's 2019
  4. I could not test the location services because they were down for flickr
  5. the function related_terms() was renamed to related_tags()

I did not attempt the search by layer because I did not have an SF layer on hand. I would recommend to make the layer you have in the tests available to the users through something like system.file() so that they have a chance to run it if they don't have a convenient layer to work through the example.

I think the download_images() demonstrataion needs to be modified a bit. If the user naively walks through this and copy+pasting code, they will end up downloading over a gigabyte of photos to a directory called "Fox images" situated in their current working directory.

I would suggest the following change:

# Downloading the first ten fox images from photosearch
temp_fox_dir <- dir.create(tempdir(), "foxes")
download_images(photo_id = foxes$id, saveDir = temp_fox_dir)

# Download a specific images based off its id
download_images(photo_id = 47416147181, saveDir = temp_fox_dir)

# View the images in the directory
if (interactive()) browseURL(temp_fox_dir)

Important: The download_images() function explicitly creates a directory in the user's current working directory called "downloaded_images" or simply attempts to write into this directory if this exists. It would be better to throw an error if the user doesn't specify a directory.

The related_tags() function works as expected

The example for get_photoinfo() appears to be duplicated.

sckott commented 5 years ago

thanks for your review @zkamvar !

monicagerber commented 5 years ago

Hi @nfox29 and @sckott. I am nearly done with the review for the package but need until tomorrow to finish up. I got caught up with the Flickr API key. I realized that the issue was that the in the example in the README, the argument names should be mindate_taken and maxdate_taken instead of mindate and maxdate. The API key functionality works for me.

sckott commented 5 years ago

thanks for the update

monicagerber commented 5 years ago

Package Review

Documentation

The package includes all the following forms of documentation:

Functionality

Final approval (post-review)

Estimated hours spent reviewing: 7


Review Comments

A statement of need

The package also supports a number of other Flickr API call methods including the flickr.tags.getRelated and flickr.places.tagsForPlace methods.

Vignettes

Main vignette:

Article:

Function documentation

data.frame. Output consists of 57 variables including; latitude and longitude of photograph, date and time it was taken, associated tags and image urls.

A data.frame with 57 columns including latitude and longitude of the photograph, the date and time it was taking, and the associated tags and image urls.

Examples

Community guidelines

Functionality

Performance

Packaging guidelines

Other comments

sckott commented 5 years ago

thanks very much for your review @monicagerber !

Dr-Nathan-Fox commented 5 years ago

@monicagerber @zkamvar thank you both for your reviews, they are extremely helpful. Apologises for the errors in the examples that was an oversight on my behalf. I will now work through your suggested changes and will reply once I have finished. Kindest regards, Nathan.

Dr-Nathan-Fox commented 5 years ago

Hi @sckott @monicagerber @zkamvar, again thank you all for your comments.

I have addressed all of the comments you made but the main highlights of things I have changed are:

The only comment that I haven't addressed is

I think it would be helpful to rename the functions in this package using the object_verb() naming scheme (see section 1.4: Function and argument naming).

Though I agree the object_verb() naming is a good convention, here I have elected to keep the same naming style as the Flickr API. I believe that this will facilitate the use for both current and new users of the API.

Please let me know of any additional changes I need to make or any extra steps I need to complete for the submission.

Kindest regards, Nathan

sckott commented 5 years ago

thanks very much for your response @nfox29

reviewers, @monicagerber and @zkamvar - are you happy with the maintainers response? Any other questions/suggestions?

sckott commented 5 years ago

reviewers, @monicagerber and @zkamvar - are you happy with the maintainers response? Any other questions/suggestions?

zkamvar commented 5 years ago

I am happy with the response. I think @nfox29 did a thorough job addressing our comments and it has improved the package experience.

I only have a couple of suggestions/questions for the documentation/behavior of photo_search():

  1. It's not clear that the user is expected to have a file called photosearcher_key.sysdata in their working directory, nor that it will be explicitly written to the directory when they enter their api key. It would be good to have this documented in the photo_searcher() function at the least.
  2. If possible, it would be preferred to take the "name" element of the license attribute instead of the "id" element since that gives a description of the license
  3. There are several variables in the table that are effectively logical, but are stored as integers (e.g. ispublic). I would recommend to store these as logical values so that it is explicit.
  4. Should the ID (id, woeid) elements be stored as characters instead of integers since they are identifiers?
sckott commented 5 years ago

thanks @zkamvar !!

@monicagerber you happy or further comments?

Dr-Nathan-Fox commented 5 years ago

Hi @zkamvar,

I have addressed all of your additional comments:

  1. I have clarified the API key storage and use in all documents not just for the one function.
  2. I have added additional two additional columns to the output: license name and the url of information on that license
  3. I have changed these to logical and have also changed all the dates so they are in the same format
  4. This has be fixed

@sckott @monicagerber is there any additional changes I need to make. I am currently in the process of finishing my manuscript for MEE and would like to send the manuscript off after the rOpenSci process has been completed.

Kindest regards, Nathan

zkamvar commented 5 years ago

LGTM. Thank you for addressing those, @nfox29!

sckott commented 5 years ago

Thanks @zkamvar - I'll take one last quick look ...

sckott commented 5 years ago

Approved (with some caveats, see below)! Thanks @nfox29 for submitting and @zkamvar & @monicagerber for your reviews!

caveats: on check I see:

To-dos:

Should you want to acknowledge your reviewers in your package DESCRIPTION, you can do so by making them "rev"-type contributors in the Authors@R field (with their consent). More info on this here.

We've put together an online book with our best practice and tips, this chapter starts the 3d section that's about guidance for after onboarding. Please tell us what could be improved, the corresponding repo is here.

Dr-Nathan-Fox commented 4 years ago

Hi @sckott,

I have fixed the caveats and am working through the on boarding process. I am however unable to transfer the repo to rOpenSci's "ropensci" GitHub organization. I have attempted to transfer it under settings but i get the warning

you don’t have the permission to create public repositories on ropensci

Please can you advise, Kindest regards, Nathan

sckott commented 4 years ago

@nfox29 you should get an email invitation to a github team, once you accept that you should be able to do so