ropensci / software-review

rOpenSci Software Peer Review.
291 stars 104 forks source link

The Database of Odorant Responses (DoOR.functions) #34

Closed Dahaniel closed 6 years ago

Dahaniel commented 8 years ago

Package containing the functions for the DoOR project. DoOR provides a framework to merge heterogeneous response data into a single consensus database, thus creating a comprehensive view on olfactory coding.

Package: DoOR.functions
Type: Package
Title: Integrating Heterogeneous Odorant Response Data into a Common Response
    Model: A DoOR to the Complete Olfactome
Version: 2.0.0
Date: 2016-02-25
Authors@R: c(
    person("Daniel", "Münch", email = "daniel.muench@uni-konstanz.de", role = c("aut", "cre")),
    person("C. Giovanni", "Galizia", email = "giovanni.galizia@uni-konstanz.de", role = "aut"),
    person("Shouwen", "Ma", role = "aut"),
    person("Martin", "Strauch", role = "aut"),
    person("Anja", "Nissler", role = "aut"),
    person("Wolf", "Huetteroth", role = "ctb"))
URL: http://neuro.uni.kn/DoOR
BugReports: https://github.com/Dahaniel/DoOR.data/issues
Description: This is a function package providing functions to perform data
    manipulations and visualizations for DoOR.data. See http://dx.doi.org/10.1093/chemse/bjq042 for the original publication and http://dx.doi.org/10.1038/srep21841
    regarding version 2.0.
License: GPL-3
Encoding: UTF-8
Imports:
    utils
Depends:
    DoOR.data (>= 2.0.0),
    R (>= 3.2.0)
Remotes: Dahaniel/DoOR.data@v2.0.0
Suggests:
    ggplot2,
    grid,
    gridExtra (>= 2.0.0),
    knitr,
    class,
    rmarkdown
VignetteBuilder: knitr
RoxygenNote: 5.0.1

https://github.com/Dahaniel/DoOR.functions

It provides its own data via the DoOR.data package (see corresponding onboarding issue).

Scientists working on olfactory coding, modelers and physiologists.

No, to our knowledge DoOR is the only project providing this kind of data and using this approach to integrate heterogeneous data.

sckott commented 8 years ago

Thanks for your submission, seeking reviewers

noamross commented 8 years ago

Reviewers: @RemkoDuursma Due date: 2016-05-09

sckott commented 8 years ago

@RemkoDuursma Due date: 2016-05-09 - hey there, it's been 16 days, please get your review in by May 08, thanks :smiley_cat: (ropensci-bot)

sckott commented 8 years ago

@RemkoDuursma Due date: 2016-05-09 - hey there, it's been 21 days, please get your review in soon, thanks :smiley_cat: (ropensci-bot)

noamross commented 8 years ago

Hi @RemkoDuursma, just checking in on this review, which was due 2016-05-09. Please get it in soon, thanks!

sckott commented 8 years ago

@RemkoDuursma Due date: 2016-05-09 - hey there, it's been 36 days, please get your review in soon, thanks :smiley_cat: (ropensci-bot)

RemkoDuursma commented 8 years ago

Apologies for my very slow review.

sckott commented 8 years ago

@Dahaniel can you respond to the review above please

Dahaniel commented 8 years ago

@RemkoDuursma Thank you very much for reviewing the package and apologies for my slow response.

We also use default.val to check if the data.frames a function relies on exist in the global environment and if not load the data from the package:

  if (DoOR_default == "DoOR.mappings") {
    if (!exists("DoOR.mappings")) {
      data(DoOR.mappings)
    }
    return(DoOR.mappings)
  } 

The idea is that the user can put modified data into the global environment that will be used when available. Not sure whether this could be implemented with options.

sckott commented 8 years ago

thanks for your response @Dahaniel on function names, see comment in #35

Dahaniel commented 7 years ago

Hi everyone, apologies that it took me so long to implement the requested changes. But finally today I was able to release v2.0.1 for both packages, implementing all of the above discussed changes. In detail (I summarize the changes here as most affect both packages):

sckott commented 7 years ago

thx @Dahaniel

@RemkoDuursma if possible, could you take a look at the changes for this and #35 and let us know if you're happy with the changes? also let us know if you're too busy/etc, and we'll have a look instead

RemkoDuursma commented 7 years ago

Sorry I won't have a chance any time soon to look at the changes. If you could have a look, that would be great.

noamross commented 7 years ago

Thanks for the update, @Dahaniel. As @RemkoDuursma is no longer available, we'll have an alternate reviewer or editor look at your changes.

sckott commented 7 years ago

@Dahaniel

Some of R check output

N  checking R code for possible problems
   default.val: no visible binding for global variable ‘door_AL_map’
   default.val: no visible binding for global variable ‘door_data_format’
   default.val: no visible binding for global variable ‘odor’
   default.val: no visible binding for global variable ‘odor.dist’
   default.val: no visible binding for global variable
     ‘door_global_normalization_weights’
   default.val: no visible binding for global variable
     ‘door_excluded_data’
   default.val: no visible binding for global variable ‘door_mappings’
   default.val: no visible binding for global variable ‘ORs’
   default.val: no visible binding for global variable
     ‘door_response_matrix’
   default.val: no visible binding for global variable
     ‘door_response_range’
   default.val: no visible binding for global variable
     ‘door_response_matrix_non_normalized’
   door_default_values: no visible binding for global variable
     ‘door_AL_map’
   door_default_values: no visible binding for global variable
     ‘door_data_format’
   door_default_values: no visible binding for global variable ‘odor’
   door_default_values: no visible binding for global variable ‘odor.dist’
   door_default_values: no visible binding for global variable
     ‘door_global_normalization_weights’
   door_default_values: no visible binding for global variable
     ‘door_excluded_data’
   door_default_values: no visible binding for global variable
     ‘door_mappings’
   door_default_values: no visible binding for global variable ‘ORs’
   door_default_values: no visible binding for global variable
     ‘door_response_matrix’
   door_default_values: no visible binding for global variable
     ‘door_response_range’
   door_default_values: no visible binding for global variable
     ‘door_response_matrix_non_normalized’
   dplot_ALmap: no visible binding for global variable ‘odor’
   dplot_ALmap: no visible binding for global variable ‘x’
   dplot_ALmap: no visible binding for global variable ‘group’
   dplot_ALmap: no visible binding for global variable ‘glomerulus’
   dplot_ALmap: no visible binding for global variable ‘response’
   dplot_acrossOSNs: no visible binding for global variable ‘sensillum’
   dplot_acrossOSNs: no visible binding for global variable ‘OSN’
   dplot_acrossOSNs: no visible binding for global variable ‘value’
   dplot_acrossOSNs: no visible binding for global variable ‘dataset’
   dplot_acrossOSNs: no visible binding for global variable ‘odorant’
   dplot_acrossReceptors: no visible binding for global variable ‘value’
   dplot_across_osns: no visible binding for global variable ‘sensillum’
   dplot_across_osns: no visible binding for global variable ‘OSN’
   dplot_across_osns: no visible binding for global variable ‘value’
   dplot_across_osns: no visible binding for global variable ‘dataset’
   dplot_across_osns: no visible binding for global variable ‘odorant’
   dplot_across_ru: no visible binding for global variable ‘value’
   dplot_al_map: no visible binding for global variable ‘odor’
   dplot_al_map: no visible binding for global variable ‘x’
   dplot_al_map: no visible binding for global variable ‘group’
   dplot_al_map: no visible binding for global variable ‘glomerulus’
   dplot_al_map: no visible binding for global variable ‘response’
   dplot_compareProfiles: no visible binding for global variable ‘odorant’
   dplot_compareProfiles: no visible binding for global variable ‘value’
   dplot_compareProfiles: no visible binding for global variable ‘dataset’
   dplot_compare_profiles: no visible binding for global variable
     ‘odorant’
   dplot_compare_profiles: no visible binding for global variable ‘value’
   dplot_compare_profiles: no visible binding for global variable
     ‘dataset’
   dplot_responseMatrix: no visible binding for global variable ‘odorant’
   dplot_responseMatrix: no visible binding for global variable ‘dataset’
   dplot_responseMatrix: no visible binding for global variable ‘value’
   dplot_responseProfile: no visible binding for global variable ‘odorant’
   dplot_responseProfile: no visible binding for global variable ‘value’
   dplot_response_matrix: no visible binding for global variable ‘odorant’
   dplot_response_matrix: no visible binding for global variable ‘dataset’
   dplot_response_matrix: no visible binding for global variable ‘value’
   dplot_response_profile: no visible binding for global variable
     ‘odorant’
   dplot_response_profile: no visible binding for global variable ‘value’
   dplot_tuningCurve: no visible binding for global variable ‘odorants’
   dplot_tuningCurve: no visible binding for global variable ‘value’
   dplot_tuningCurve: no visible binding for global variable ‘receptors’
   get_dataset: no visible binding for global variable ‘door_data_format’
   identifySensillum: no visible binding for global variable ‘sensillum’
   identifySensillum: no visible binding for global variable ‘dataset’
   identifySensillum: no visible binding for global variable ‘odorant’
   identifySensillum: no visible binding for global variable ‘value’
   identify_sensillum: no visible binding for global variable ‘sensillum’
   identify_sensillum: no visible binding for global variable ‘dataset’
   identify_sensillum: no visible binding for global variable ‘odorant’
   identify_sensillum: no visible binding for global variable ‘value’
   importNewData: no visible binding for global variable
     ‘door_response_range’
   importNewData: no visible binding for global variable
     ‘door_global_normalization_weights’
   import_new_data: no visible binding for global variable
     ‘door_response_range’
   import_new_data: no visible binding for global variable
     ‘door_global_normalization_weights’
   load2list: no visible binding for global variable ‘ORs’
   privateOdorant: no visible binding for global variable ‘x’
   privateOdorant: no visible binding for global variable ‘n’
   private_odorant: no visible binding for global variable ‘x’
   private_odorant: no visible binding for global variable ‘n’
   rebuild_metadata: no visible binding for global variable ‘ORs’
   updateOdorInfo: no visible binding for global variable ‘ORs’
   updateOdorInfo: no visible binding for global variable ‘odor’
   update_door_odorinfo: no visible binding for global variable ‘ORs’
   update_door_odorinfo: no visible binding for global variable ‘odor’
   Undefined global functions or variables:
     ORs OSN dataset door_AL_map door_data_format door_excluded_data
     door_global_normalization_weights door_mappings door_response_matrix
     door_response_matrix_non_normalized door_response_range glomerulus
     group n odor odor.dist odorant odorants receptors response sensillum
     value x

   Found the following assignments to the global environment:
   File ‘DoOR.functions/R/create_door_database.R’:
     assign("door_response_matrix", frame_data, envir = .GlobalEnv)
     assign("door_response_matrix_non_normalized", frame_data_nn,
       envir = .GlobalEnv)
     assign("door_excluded_data", door_excluded_data, envir = .GlobalEnv)
   File ‘DoOR.functions/R/import_new_data.R’:
     assign(j, target, envir = .GlobalEnv)
     assign(j, target, envir = .GlobalEnv)
     assign(column.name, combine_data(data1 = target, data2 = imported.data,
       by.data2 = column.name, assigned.name = paste(file.name,
           sep = "")), envir = .GlobalEnv)
     assign("door_data_format", dataFormat, envir = .GlobalEnv)
     assign("door_global_normalization_weights", weightGlobNorm, envir = .GlobalEnv)
     assign("door_response_range", responseRange, envir = .GlobalEnv)
     assign("ORs", receptors, envir = .GlobalEnv)
     assign("odor", odor_data, envir = .GlobalEnv)
   File ‘DoOR.functions/R/rebuild_metadata.R’:
     assign("door_global_normalization_weights", wgn, envir = .GlobalEnv)
     assign("door_response_range", rr, envir = .GlobalEnv)
   File ‘DoOR.functions/R/remove_study.R’:
     assign(x, data, envir = .GlobalEnv)
     assign("door_response_range", responseRange, envir = .GlobalEnv)
     assign("door_global_normalization_weights", weightGlobNorm, envir = .GlobalEnv)
   File ‘DoOR.functions/R/update_door_database.R’:
     assign("door_response_matrix_non_normalized", response_matrix_nn,
       envir = .GlobalEnv)
     assign("door_response_matrix", response_matrix, envir = .GlobalEnv)
     assign("door_excluded_data", door_excluded_data, envir = .GlobalEnv)
   File ‘DoOR.functions/R/update_door_odorinfo.R’:
     assign(i, tmp, envir = .GlobalEnv)
     assign("door_data_format", odor[1:5], envir = .GlobalEnv)

   Found the following calls to data() loading into the global environment:
   File ‘DoOR.functions/R/door_default_values.R’:
     data(door_AL_map)
     data(door_data_format)
     data(odor)
     data(odor.dist)
     data(door_global_normalization_weights)
     data(door_excluded_data)
     data(door_mappings)
     data(ORs)
     data(door_response_matrix)
     data(door_response_range)
     data(door_response_matrix_non_normalized)
   See section ‘Good practice’ in ‘?data’.

many problems above:

Dahaniel commented 7 years ago

Thanks for the comments @sckott

  • Looks like you still don't have unit tests - please add those

That is right, as explained before I do not have a clue about unit testing and might need some help here. However I read Hadley's piece on testthat and think I could implement some simple tests like checking against creation of empty data frames etc.

  • You have a circularity problem in your dependencies. DoOR.data is in DEPENDS in this pkg, and DoOR.functions is in Suggests for the DoOR.data package - why is DoOR.data in Depends? there's very few cases in which that's needed, is it here?

True, in an earlier mail you suggested to put the data package into DEPENDS, I had it in SUGGESTS before. The reason being that DoOR.functions depends on some of the data provided in the DoOR.data package. This is also the reason for the undefined global functions or variables below.

  • R check reveals many problems: you need to Rbuildignore .travis.yml/CHANGELOG.md (but you should also have a NEWS.md or NEWS as that's more typical of R packages),

ok

  • don't assign variables to the global environment
  • see ‘Good practice’ in ‘?data’ for help on data objects

As discussed in https://github.com/ropensci/onboarding/issues/35#issuecomment-218391691 this is how the package was initially written: "Load data into the global environment so that it is easily visible and edible for the user". I am looking into ways to rewrite that (started a branch for that). But this is not simple as I might need to write lots of helper functions for the user to easily interact with the data.

As described, my workaround for now was to create a user interaction upon package loading. I prompt a warning message and the user has to explicitly type "yes" to load the data.

  • you have many Undefined global functions or variables

These are from the DoOR.data package and that is why I put it to DEPENDS. Would putting it to SUGGESTS and the using DoOR.data:: work? But then if e.g. a user modified version of door_response_matrix exists in the global environment, the function should use this version and not pick the one from the DoOR.data package, which it would using the package:: annotation, right?

  • A LOT OF warnings about "no visible binding for global variable' - fix those

Many of the arise from using ggplot functions. I just learned about:

## quiets concerns of R CMD check re: the .'s that appear in pipelines
if(getRversion() >= "2.15.1")  utils::globalVariables(c("."))

here https://github.com/STAT545-UBC/Discussion/issues/451

summary

Points 1,3 & 7 I think I can fix. I would need some advice to fix the circularity in 2/6. For point 4/5 I do not see an easy fix for now and I hoped the warning and explicit user interaction will be a sufficient workaround for now.

sckott commented 7 years ago

Let us know if you have testing questions, we do want them before accepting this pkg

I'll take a look at your dependency issue and global variables

I just learned about: ... utils::globalVariables

right, that should work

Dahaniel commented 7 years ago

Thanks!

sckott commented 7 years ago

was checking, but it looks like some changes i asked for above are not done yet (e.g., .travis.yml should be put in .Rbuildignore) - i'll hold off until then

Dahaniel commented 7 years ago

Yes, I wrote some tests for DoOR.data and will start to work on DoOR.functions the next days. Just moved to a new lab and need to organize/schedule several things right now.

sckott commented 7 years ago

@Dahaniel okay, let me know when you're done

sckott commented 7 years ago

@Dahaniel anything to report on this?

Dahaniel commented 7 years ago

@sckott I implemented unit testing in the data package, for the functions package I scheduled implementation for 1st half of August.

sckott commented 7 years ago

Okay, thanks much

sckott commented 7 years ago

just checking in @Dahaniel - i'm waiting on this correct:

for the functions package I scheduled implementation for 1st half of August.

Dahaniel commented 7 years ago

@sckott quick update: I am working on this but it'll take a few more days.

Dahaniel commented 7 years ago

Ok, should be done.

I fixed:

there remains:

sckott commented 7 years ago

@Dahaniel

For the data issue: isn't it possible to do this approach:

this should get around the problem of the package loading data into the global namespace

i think we do need to solve this issue as CRAN will likely not accept it as is


I think the circularity thing is actually okay - as long as one has the other in Suggests

sckott commented 7 years ago

There's a few more things to fix. These are all quite important to fix.

GP DoOR.functions ─────

It is good practice to

  ✖ omit "Date" in DESCRIPTION. It is not required and it
    gets invalid quite often. A build date will be added to the package
    when you perform `R CMD build` on it.

  ✖ avoid long code lines, it is bad for readability. Also,
    many people prefer editor windows that are about 80 characters
    wide. Try make your lines shorter than 80 characters

    R/back_project.R:53:1
    R/back_project.R:56:1
    R/back_project.R:65:1
    R/back_project.R:68:1
    R/back_project.R:69:1
    ... and 434 more lines

  ✖ avoid sapply(), it is not type safe. It might return a
    vector, or a list, depending on the input data. Consider using
    vapply() instead.

    R/calculate_model.R:48:19
    R/get_dataset.R:25:15
    R/get_dataset.R:26:22
    R/get_responses.R:36:36
    R/import_new_data.R:94:29
    ... and 12 more lines

  ✖ avoid 1:length(...), 1:nrow(...), 1:ncol(...),
    1:NROW(...) and 1:NCOL(...) expressions. They are error prone and
    result 1:0 if the expression on the right hand side is zero. Use
    seq_len() or seq_along() instead.

    R/count_studies.R:31:13
    R/door_melt.R:29:13
    R/dplot_tuningCurve.R:82:35
    R/dplot_tuningCurve.R:110:36
    R/get_responses.R:33:13
    ... and 19 more lines

  ✖ avoid 'T' and 'F', as they are just variables which are
    set to the logicals 'TRUE' and 'FALSE' by default, but are not
    reserved words and hence can be overwritten by the user.  Hence,
    one should always use 'TRUE' and 'FALSE' for the logicals.

    R/back_project.R:NA:NA
    R/get_dataset.R:NA:NA
    R/map_receptor.R:NA:NA
    R/rebuild_metadata.R:NA:NA
Dahaniel commented 7 years ago

Thanks @sckott , how can I reproduce the good practice output you posted here?

sckott commented 7 years ago

install devtools::install_github("MangoTheCat/goodpractice") then run .Sys.setenv(NOT_CRAN = "true"); x = goodpractice::gp(); x

sckott commented 6 years ago

any thoughts @Dahaniel ?

Dahaniel commented 6 years ago

Hi @sckott , sorry, busy as usual. I actually startet working on fixing the "good practice" warnings and am done with the data package. Regarding the data issue: What makes it complicated is that it is actually not a single data.frame which is accessed but many and that there are many functions that need to be updated. So I think this will take some effort for writing as well as testing. If ok for you I would rather fix the GP issues now for the onboarding and then have CRAN compatibility as a new milestone afterwards.

sckott commented 6 years ago

@Dahaniel Okay, did you get to fixing these goodpractice problems yet?

Dahaniel commented 6 years ago

@sckott Should be fixed for both packages, shortened code-lines, replaced sapply with vapply etc...

sckott commented 6 years ago

@Dahaniel thanks, having a look

sckott commented 6 years ago

Sorry for the very long wait @Dahaniel !

Approved! Thanks again for your submission

[![](https://badges.ropensci.org/34_status.svg)](https://github.com/ropensci/onboarding/issues/34)

Dahaniel commented 6 years ago

That's awesome news! I'll see to your points in the coming days and I am happy to write a blog-post as well.

Dahaniel commented 6 years ago

@sckott where should I look for the invitation to the ROpenSci group? Could not find anything on github or in my mails.

sckott commented 6 years ago

sorry about that, you should see a notification now or soon

stefaniebutland commented 6 years ago

Hello @Dahaniel. Great to hear you're interested in writing a blog post about this. Here are some technical and editorial notes for contributing a post: https://github.com/ropensci/roweb2#contributing-a-blog-post

We ask that you submit a draft via pull request at least one week prior to the planned publication date. I have two dates open right now. Take your pick: 1) Tues Feb 20 (draft due Wed Feb 14 if possible) 2) Tues Mar 20 (draft due Tues Mar 13)

Dahaniel commented 6 years ago

Thanks @stefaniebutland, I pick March 20th then!

stefaniebutland commented 6 years ago

Let me know if I can help along the way

Dahaniel commented 6 years ago

@stefaniebutland is there a guideline or a specific structure that I should follow when writing the blog post?

stefaniebutland commented 6 years ago

@Dahaniel No specific structure. The best approach might be to read a few other blog posts about onboarded packages and see which ones you like / which ones have a narrative that grabs you. This blog tag gets you to some posts: https://ropensci.org/tags/review/

Some basic suggestions here: https://github.com/ropensci/roweb2#blog-post-editorial-suggestions

stefaniebutland commented 6 years ago

If it would help, I could give you feedback on an outline where you identify audience, take home message and key points you want to get across. No obligation to do this