wahani / modules

Modules in R
https://cran.r-project.org/package=modules
Other
81 stars 3 forks source link

Adding functions from separate scripts in a module #28

Closed wleoncio closed 4 years ago

wleoncio commented 4 years ago

Summary

Details

I am currently developing an R package in which I'd like to group functions into modules. Each function is defined in its own .R file inside the R/ folder. To illustrate, say this is the basic structure of the package:

package
package/R
package/R/foo.R
package/R/foobar.R
package/R/modules.R
[...]

And the content of the files is as follows (plus Roxygen documentation):

foo.R

foo <- function() "foo"

bar.R

foobar <- function() paste(foo(), "bar") # notice it's dependency on foo()

modules.R

m <- module({
    export("foo", "foobar")
    expose("R/foo.R")
    expose("R/foobar.R")
})

I don't know if this is the proper way to declare m, since I understand expose() is expecting a script containing a function nested within a module, but that's what I did on a previous project and it worked fine.

The problem is that this time the fact that there's interdependence between my functions, I get scope errors I can't resolve.

What I've tried

What I'm avoiding

I imagine that I could bunch up foo() and foobar() under one R file and maybe wrap them around a module and that would solve the issue, but in my real case I have more than a dozen long functions on each module, so putting all functions together could very well get me to debugging and versiol control hell.

Intended usage

FWIW, this is how I intend users to use the functions within the package:

m$foo()
m$foobar()
?foo
?foobar
?m

Although it would be even better if one could load a module and then use its functions, i.e.:

load(m) # import()? use()?
foo()
foobar()

I don't know if the modules package does this, though.

wahani commented 4 years ago

Hi, and thanks for bringing this up. This is mostly how I personally use the package, so I can outline some strategies I apply myself. A bit off-topic, but still relevant is a PR I have open since last year. The idea would be to allow the developer to create subfolders in ./R/m and use this as a module. There are a lot of intriguing questions how the semantics of such a system would work. But it would fit your use-case quite nicely: you can simply move foo.R and foobar.R into the folder ./R/m and end up with a module called m in the package scope. This would automate what you are trying above. There are so many open questions around this, that I do not see much potential to integrate it any time soon, but I thought it may be interesting for. Not well documented, but ready for testing: https://github.com/wahani/modules/pull/12

You may test this with:

remotes::install_github("wahani/modules", "package-support")
# and then
modules::useModules() # in your package folder

Anyways, to get to a solution in the current version. I rarely use a module as user-interface because it is easier to document single functions. If you I expose a module, I wrap them in a constructor function, so that documentation is no issue. In that case we more or less have objects as in class based object orientation. This is of course a rather advanced usage scenario.

If you want to expose a module directly, I would add the documentation to the module and would not export the functions at all. Not sure if possible, but Rs documentation system allows for an alias which may allow you to link back a function to the module documentation (I am absolutely not sure about this). When you export the module, you have access to the member like you described above:

library(mypkg)
m$foo()
m$foobar()

If you want to make the member functions available in a session, you may use one of the following patterns:

# (1)
attach(m)
foo()
bar()
# (2)
modules::use(m, attach = TRUE)
foo()
bar()
# (3)
with(m, {
    foo()
    bar()
})

My personal style guide tells me, that functions with more then 10 lines are not acceptable, so my advice her would be to refactor the code into smaller pieces first. As single, very long function or split into many more, I would combine members from two files by applying the following pattern:

m-foo.R

m <- modules::module({
    export("foo")
    foo <- function() "foo"
})

m-foobar.R

m <- modules::extend(m, {
    export("foobar")
    foobar <- function() paste(foo(), "bar")
})

Note that the order matters. R parses the files of a package sequentially. You can order them in the collate field in the description. There is also a roxygen2 tag to do this for you; I can't remember which one though.

wleoncio commented 4 years ago

Thank you for the quick feedback! The version on #12 work perfectly indeed. I forgot to mention that I also tried the subfolder thing on 0.8.0, but it only got me as far as calling the functions like m$foo$foo() because the filename was treated as a list element (as expected), and I couldn't properly unlist() my way out of that situation.

I would love to see #12 merged and published, is there anything I can do to help? I believe this is a feature that would help me better structure both LOGAN (published and listed as a reverse-import of modules) and the package I'm currently developing. I am confident in my R packaging skills, but unfortunately environment scopes are still something that confuse me.

Regarding the second solution, I am still trying to wrap my head around the concept of constructor functions. I read about it yesterday on the docs, do you mean it in the same sense (e.g. in the example for amodule()). I'm afraid refactoring code is something I really wish to avoid, since the functions were written by co-authors to accompany their already-published book. Anyway, that's the next thing I'll try, I'll report ASAP.

wahani commented 4 years ago

is there anything I can do to help?

In a first step it would be helpful, if you would use it. Especially with respect to documentation it is not clear how it would integrate with roxygen2. Understanding what we want to see would help to make a decision, what we need to implement. Any feedback and bugs you discover are of course helpful.

Regarding the second solution, I am still trying to wrap my head around the concept of constructor functions.

There is really not that much to it. One example: https://github.com/wahani/REPLesentR/blob/master/R/SlideDeck.R

In a package you can:

Module <- function(x) {
    modules::module({
        foo <- function() x
    })
}
Module("a")$foo() # -> "a"

When you run this outside of a package, then x cannot be found, inside a package this would work. Because this is confusing, I wrote amodule to get consistent behavior regardless of the environment. The main motivation was to pass down arguments to a module. But that is of course optional.

wleoncio commented 4 years ago

Thank you again for the help and detailed explanation! I'll check with my collaborators if there's interest in allocating time to implement these solutions on the next release of our packages (which also means helping you get #12 on your next CRAN release).

For the moment, I think the issue is properly addressed, documented and usable, at least as long as the package-support branch remains open and with the implementations from PR #12. I'll close this for now and reopen if necessary. Cheers!

wahani commented 4 years ago

Thanks for closing!