pglpm / bayes_nonparametric_inference

Software development for "Bayesian nonparametric population inference". In other words, just the direct application of probability theory to get the most general, principled, model-free inference we can have.
Creative Commons Zero v1.0 Universal

"source" is NOT "import" #21

Closed pglpm closed 3 weeks ago

pglpm commented 1 month ago

Important: all source commands in a function must stay within that function. This is because source is not a sort of "import".

choisant commented 4 weeks ago

So I did some testing, and I am not sure about this. Here is my test. In my working directory I have a file, tests.R, with this content:

test_source <- function(nested = TRUE) {
  nested_function <- function() {
    source("afunction.R")
    a_function()
  }
  if (!nested) {
    source("afunction.R")
    a_function()
  } else {
    nested_function()
  }
}

I also have a file "afunction.R" with this content:

cat("Sourcing afunction.R \n")
a_function <- function() {
    cat("I am sourced! \n")
}

My R session:

> source("tests.R")
> a_function()
Error in a_function() : could not find function "a_function"
> test_source(nested = TRUE)
Sourcing afunction.R 
I am sourced! 
> a_function()
I am sourced! 

Then I try to not use the nested function:

> source("tests.R")
> test_source(nested=FALSE)
Sourcing afunction.R 
I am sourced! 
> a_function()
I am sourced! 

So either way, the contents of "afunction.R" are imported into my active session as soon as source() is called. They do not stay "within the scope" of the function that called source().

pglpm commented 4 weeks ago

@choisant Found the catch: source([file], local=TRUE)
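A minimal sketch of the difference, for anyone reading along. By default source() evaluates the file in the global environment (local = FALSE), whereas local = TRUE evaluates it in the environment from which source() is called. The temporary script and the two wrapper functions below are made up for this illustration, and the leak checks assume a fresh R session in which a_function is not already defined.

```r
# Write a throwaway script that defines a function (hypothetical demo file).
script <- tempfile(fileext = ".R")
writeLines('a_function <- function() "I am sourced!"', script)

source_locally <- function() {
  source(script, local = TRUE)  # evaluated in this function's own environment
  a_function()
}

source_globally <- function() {
  source(script)                # default local = FALSE: evaluated in the global environment
  a_function()
}

# With local = TRUE the definition disappears when the function returns.
result_local <- source_locally()
leaked_after_local <- exists("a_function", envir = globalenv(), inherits = FALSE)

# With the default, the definition ends up in the global environment.
result_global <- source_globally()
leaked_after_global <- exists("a_function", envir = globalenv(), inherits = FALSE)
```

In a fresh session, leaked_after_local should be FALSE and leaked_after_global should be TRUE, which matches the behaviour seen in the test session above.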

choisant commented 4 weeks ago

Philosophical thoughts on the topic of where in the file/function to place source():

What do we wish to gain? Fewer unnecessary calls to source()? Code that is easier to understand? Developers and users expect dependencies to be easy to see at the top of a file. Code that only works when source() sits at one specific point might be a sign that the code is poorly written (overlapping function or variable names?). If, however, a function is called seldom, or only in some cases, a small amount of computation might be saved by placing source() inside that function. But if the function is called many times, we are source()-ing more often than we have to, which will probably increase computation time.
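To make the trade-off concrete, here is a rough sketch that counts how often a file is evaluated in each arrangement. The counter environment, the helper function and the temporary script are all invented for this example, and local = TRUE is used throughout only so the sketch does not depend on where it is run.

```r
# Counter stored in an environment so the sourced file can update it by reference.
counter <- new.env()
counter$n <- 0

# Hypothetical script: bumps the counter each time it is evaluated, then defines a helper.
script <- tempfile(fileext = ".R")
writeLines(c(
  "counter$n <- counter$n + 1",
  "helper <- function(x) x + 1"
), script)

# Variant 1: source() inside the function, so the file is re-evaluated on every call.
f_inside <- function(x) {
  source(script, local = TRUE)
  helper(x)
}
for (i in 1:5) f_inside(i)
calls_inside <- counter$n

# Variant 2: source() once at the top, then call the function many times.
counter$n <- 0
source(script, local = TRUE)
f_top <- function(x) helper(x)
for (i in 1:5) f_top(i)
calls_top <- counter$n
```

Here calls_inside ends up at 5 and calls_top at 1. Whether that difference matters in practice depends on how expensive the sourced file is and how often the function runs.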

pglpm commented 3 weeks ago

The problem will be solved by defining in the main workspace all functions used by several other functions. So no source() will be necessary.

source() may still be used to move a particular group of lines into a separate file: for example, lines that assign parameter values, or lines that define a function used only in the present file. This is done purely for ease of editing; those lines could just as well be left in the original file. In this use case, to me it makes sense to leave source() exactly at the place where those lines would otherwise appear, because it isn't really a "dependency".
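A small sketch of this use case, assuming local = TRUE is wanted so that the sourced lines behave as if they had been typed at that spot. The parameter file, the names n_iter, step_size and base_step, and the fit_model function are hypothetical and only serve the illustration.

```r
# Hypothetical parameter file; its second line uses a variable from the calling function.
params_file <- tempfile(fileext = ".R")
writeLines(c(
  "n_iter <- 1000",
  "step_size <- base_step / 10"
), params_file)

fit_model <- function(base_step) {
  # The sourced lines see base_step and define n_iter and step_size right here,
  # exactly as if they had been left in this function body.
  source(params_file, local = TRUE)
  c(n_iter = n_iter, step_size = step_size)
}

out <- fit_model(base_step = 0.5)

# Nothing from the parameter file is left behind in the global environment.
params_leaked <- exists("n_iter", envir = globalenv(), inherits = FALSE)
```

Because the file is evaluated in the function's environment, it can refer to local variables and its assignments vanish when the function returns.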