kkawato / rdlearn

Safe Policy Learning under Regression Discontinuity Designs with Multiple Cutoffs

Apply minor edits and one refactoring to increase readability. #7

Closed · sou412 closed this 1 day ago

sou412 commented 4 months ago

Refactoring complex apply logic in 1-4-0safelearn.R

In the script, there's a very complex piece of logic: https://github.com/kkawato/rdlearn/blob/70e9ae114bcfd07cb28b30804dcf6c5da9a52d47/R/1-4-0safelearn.R#L88-L105

It is not immediately clear what's being computed here. I propose the following refactoring:

data_all[eval_cond, paste0("d", d)] <-
  apply(temp_df, 1, function(x) {
    extrapolation(
      x.train = x[1],
      treat = d,
      g = g,
      g.pr = x[2] + (1 - d),
      Lip_1 = Lip_1,
      Lip_0 = Lip_0,
      dif_1 = dif_1,
      dif_0 = dif_0,
      G = G,
      C = C
    )$lower[[1]]
  })
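For readers less familiar with the idiom, here is a self-contained toy sketch of the row-wise apply() pattern used above (temp_df, d, and lower_bound are stand-ins I made up, not the package's actual objects):

```r
# apply(m, 1, f) passes each row of m to f as a vector, so x[1] and x[2]
# pick out the row's first and second columns, just as in the refactoring.
temp_df <- cbind(x.train = c(0.1, 0.2, 0.3), g.base = c(1, 1, 2))
d <- 1
lower_bound <- function(x.train, g.pr) x.train - g.pr  # hypothetical bound
res <- apply(temp_df, 1, function(x) {
  lower_bound(x.train = x[1], g.pr = x[2] + (1 - d))
})
res  # one lower bound per row of temp_df
```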

Idea behind the proposed change

Test

I tested the proposed change with the test code in testthat directory.

Browse[3]> 
debug at /workspaces/rdlearn/R/1-4-0safelearn.R#88: data_all[eval_cond, paste0("d", d)] <- apply(temp_df, 1, function(x) {
    sum(unlist(sapply(x[2] + (1 - d), function(g.temp) {
        extrapolation(x.train = x[1], treat = d, g = g, g.pr = g.temp, 
            Lip_1 = Lip_1, Lip_0 = Lip_0, dif_1 = dif_1, dif_0 = dif_0, 
            G = G, C = C)
    })[2, ]))
})
Browse[3]> apply(temp_df, 1, function(x) {
  sum(unlist(
    sapply(x[2] + (1 - d), function(g.temp) {
      extrapolation(
        x.train = x[1],
        treat = d,
        g = g,
        g.pr = g.temp,
        Lip_1 = Lip_1,
        Lip_0 = Lip_0,
        dif_1 = dif_1,
        dif_0 = dif_0,
        G = G,
        C = C
      )
    })[2, ]
  ))
})
 [1] -1.0000000 -1.0000000 -1.0000000 -1.0000000 -1.0000000 -1.0000000
 [7] -1.0000000 -1.0000000 -1.0000000 -0.5463118  0.2643316  0.2820207
[13] -1.0000000 -1.0000000 -1.0000000 -1.0000000 -1.0000000 -1.0000000
[19] -1.0000000 -1.0000000 -1.0000000 -1.0000000 -1.0000000 -1.0000000
[25] -1.0000000 -1.0000000 -1.0000000 -1.0000000 -1.0000000 -0.5333738
[31] -1.0000000 -1.0000000 -1.0000000 -1.0000000 -1.0000000 -1.0000000
[37] -1.0000000 -1.0000000 -1.0000000 -0.1553511
Browse[3]> apply(temp_df, 1, function(x) {
  extrapolation(
    x.train = x[1],
    treat = d,
    g = g,
    g.pr = x[2] + (1 - d),
    Lip_1 = Lip_1,
    Lip_0 = Lip_0,
    dif_1 = dif_1,
    dif_0 = dif_0,
    G = G,
    C = C
  )$lower[[1]] 
})
 [1] -1.0000000 -1.0000000 -1.0000000 -1.0000000 -1.0000000 -1.0000000
 [7] -1.0000000 -1.0000000 -1.0000000 -0.5463118  0.2643316  0.2820207
[13] -1.0000000 -1.0000000 -1.0000000 -1.0000000 -1.0000000 -1.0000000
[19] -1.0000000 -1.0000000 -1.0000000 -1.0000000 -1.0000000 -1.0000000
[25] -1.0000000 -1.0000000 -1.0000000 -1.0000000 -1.0000000 -0.5333738
[31] -1.0000000 -1.0000000 -1.0000000 -1.0000000 -1.0000000 -1.0000000
[37] -1.0000000 -1.0000000 -1.0000000 -0.1553511

Minor edits

The output from cat() will be

Cross fitting for fold 1
Cross fitting for fold 2
Cross fitting for fold 3
Cross fitting for fold 4
Cross fitting for fold 5

while the use of print() introduces [1] at the beginning of each line:

[1] "Estimating dif and Lip for d = 1"
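The difference is easy to verify in a fresh R session:

```r
# cat() writes the bare string; print() shows the character vector's
# representation, including the "[1]" index prefix and quotes.
cat("Cross fitting for fold 1\n")
print("Estimating dif and Lip for d = 1")
```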
sou412 commented 4 months ago

@kkawato Please test the proposed change with simulated / real data before merging. Please make sure there are no diffs in the output.

sou412 commented 4 months ago

One thing I'm not quite following is the use of map in here https://github.com/kkawato/rdlearn/blob/70e9ae114bcfd07cb28b30804dcf6c5da9a52d47/R/1-4-1extrapolation.R#L43-L44

If I understand correctly, this function is applied to a scalar value (i.e., x.train is a single number when the function is called in safelearn()).

Do you need to use map here? It seems to be throwing a warning, possibly for this reason. Also, the upper and lower objects become lists, which adds a bit of overhead when parsing the results in safelearn().
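To illustrate the list-wrapping overhead, here is a sketch using base R's lapply(), which behaves like purrr::map() in this respect (the values are made up):

```r
# lapply()/map() always return a list, even for a length-1 input,
# so downstream code has to unwrap the result with [[1]].
wrapped <- lapply(0.5, function(x.train) x.train * 2)
is.list(wrapped)   # TRUE: callers must index into the list
wrapped[[1]]       # the actual numeric value
plain <- 0.5 * 2   # a direct call keeps a plain numeric scalar
```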

sou412 commented 4 months ago

Adding @carolyyzz as a reviewer too as I assume some parts of the code come from @carolyyzz as well.

kkawato commented 3 months ago

@kkawato Please test the proposed change with simulated / real data before merging. Please make sure there are no diffs in the output.

It seems that the output is different... I am trying to detect the reason

sou412 commented 3 months ago

It seems that the output is different... I am trying to detect the reason

Oh that's not good -- sorry for making more work for you.

Do you think you could add (code to generate) a simulated dataset so that we can debug easily? I can look at the diff in this branch while you investigate the diff from the paper on the main branch.
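For debugging, something as small as the following might do (a minimal sketch; the sample size, cutoffs, and outcome model are arbitrary choices of mine, not the paper's simulation design):

```r
# Simulated data for an RD design with multiple cutoffs: a unit in group j
# is treated when its running variable x crosses that group's cutoff.
set.seed(1)
n <- 500
cutoffs <- c(-0.5, 0, 0.5)                    # hypothetical group cutoffs
group <- sample(seq_along(cutoffs), n, replace = TRUE)
x <- runif(n, -1, 1)                          # running variable
d <- as.integer(x >= cutoffs[group])          # treatment indicator
y <- 0.5 * x + 0.3 * d + rnorm(n, sd = 0.1)   # outcome with a jump at the cutoff
sim_data <- data.frame(y = y, x = x, d = d, c = cutoffs[group], group = group)
```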

kkawato commented 2 months ago

I apologize for the delayed response. I replaced the map function with apply in the extrapolation function in 1-4-1extrapolation.R. I also modified the logic in 1-4-0safelearn.R as follows and replicated the result:

data_all[eval_cond, paste0("d", d)] <-
  apply(temp_df, 1, function(x) {
    temp_result <- extrapolation(
      x.train = x[1],
      treat = d,
      g = g,
      g.pr = x[2] + (1 - d),
      Lip_1 = Lip_1,
      Lip_0 = Lip_0,
      dif_1 = dif_1,
      dif_0 = dif_0,
      G = G,
      C = C
    )
    sum(temp_result)
  })

sou412 commented 1 month ago

@kkawato Thanks for the update! It seems now this branch has a conflict. Should I just follow the main branch?

sou412 commented 1 month ago

Reopening the branch (it was closed by accident).

kkawato commented 1 month ago

Yes! Thank you.

sou412 commented 3 days ago

@kkawato Sorry it's been a while -- I resolved the conflict. Please take a look if it's good to merge. Thank you!