kkawato / rdlearn

Safe Policy Learning under Regression Discontinuity Designs with Multiple Cutoffs

Apply minor edits and one refactoring to increase readability. #7

Closed · sou412 closed this 1 day ago

sou412 commented 4 months ago

Refactoring complex apply logic in 1-4-0safelearn.R

In the script, there's a very complex piece of logic: https://github.com/kkawato/rdlearn/blob/70e9ae114bcfd07cb28b30804dcf6c5da9a52d47/R/1-4-0safelearn.R#L88-L105

It is not immediately clear what's being computed here. I propose the following refactoring:

data_all[eval_cond, paste0("d", d)] <-
  apply(temp_df, 1, function(x) {
    extrapolation(
      x.train = x[1],
      treat = d,
      g = g,
      g.pr = x[2] + (1 - d),
      Lip_1 = Lip_1,
      Lip_0 = Lip_0,
      dif_1 = dif_1,
      dif_0 = dif_0,
      G = G,
      C = C
    )$lower[[1]]
  })
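For readers less familiar with the idiom, here is a self-contained toy sketch of the row-wise apply() pattern used above (temp_df, d, and lower_bound are stand-ins I made up, not the package's actual objects):

```r
# apply(m, 1, f) passes each row of m to f as a vector, so x[1] and x[2]
# pick out the row's first and second columns, just as in the refactoring.
temp_df <- cbind(x.train = c(0.1, 0.2, 0.3), g.base = c(1, 1, 2))
d <- 1
lower_bound <- function(x.train, g.pr) x.train - g.pr  # hypothetical bound
res <- apply(temp_df, 1, function(x) {
  lower_bound(x.train = x[1], g.pr = x[2] + (1 - d))
})
res  # one lower bound per row of temp_df
```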

Idea behind the proposed change

Test

I tested the proposed change with the test code in testthat directory.

Browse[3]> 
debug at /workspaces/rdlearn/R/1-4-0safelearn.R#88: data_all[eval_cond, paste0("d", d)] <- apply(temp_df, 1, function(x) {
    sum(unlist(sapply(x[2] + (1 - d), function(g.temp) {
        extrapolation(x.train = x[1], treat = d, g = g, g.pr = g.temp, 
            Lip_1 = Lip_1, Lip_0 = Lip_0, dif_1 = dif_1, dif_0 = dif_0, 
            G = G, C = C)
    })[2, ]))
})
Browse[3]> apply(temp_df, 1, function(x) {
  sum(unlist(
    sapply(x[2] + (1 - d), function(g.temp) {
      extrapolation(
        x.train = x[1],
        treat = d,
        g = g,
        g.pr = g.temp,
        Lip_1 = Lip_1,
        Lip_0 = Lip_0,
        dif_1 = dif_1,
        dif_0 = dif_0,
        G = G,
        C = C
      )
    })[2, ]
  ))
})
 [1] -1.0000000 -1.0000000 -1.0000000 -1.0000000 -1.0000000 -1.0000000
 [7] -1.0000000 -1.0000000 -1.0000000 -0.5463118  0.2643316  0.2820207
[13] -1.0000000 -1.0000000 -1.0000000 -1.0000000 -1.0000000 -1.0000000
[19] -1.0000000 -1.0000000 -1.0000000 -1.0000000 -1.0000000 -1.0000000
[25] -1.0000000 -1.0000000 -1.0000000 -1.0000000 -1.0000000 -0.5333738
[31] -1.0000000 -1.0000000 -1.0000000 -1.0000000 -1.0000000 -1.0000000
[37] -1.0000000 -1.0000000 -1.0000000 -0.1553511
Browse[3]> apply(temp_df, 1, function(x) {
  extrapolation(
    x.train = x[1],
    treat = d,
    g = g,
    g.pr = x[2] + (1 - d),
    Lip_1 = Lip_1,
    Lip_0 = Lip_0,
    dif_1 = dif_1,
    dif_0 = dif_0,
    G = G,
    C = C
  )$lower[[1]] 
})
 [1] -1.0000000 -1.0000000 -1.0000000 -1.0000000 -1.0000000 -1.0000000
 [7] -1.0000000 -1.0000000 -1.0000000 -0.5463118  0.2643316  0.2820207
[13] -1.0000000 -1.0000000 -1.0000000 -1.0000000 -1.0000000 -1.0000000
[19] -1.0000000 -1.0000000 -1.0000000 -1.0000000 -1.0000000 -1.0000000
[25] -1.0000000 -1.0000000 -1.0000000 -1.0000000 -1.0000000 -0.5333738
[31] -1.0000000 -1.0000000 -1.0000000 -1.0000000 -1.0000000 -1.0000000
[37] -1.0000000 -1.0000000 -1.0000000 -0.1553511

Minor edits

The output from cat() will be

Cross fitting for fold 1
Cross fitting for fold 2
Cross fitting for fold 3
Cross fitting for fold 4
Cross fitting for fold 5

while the use of print() introduces [1] at the beginning of each line:

[1] "Estimating dif and Lip for d = 1"
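The difference is easy to verify in a fresh R session:

```r
# cat() writes the bare string; print() shows the character vector's
# representation, including the "[1]" index prefix and quotes.
cat("Cross fitting for fold 1\n")
print("Estimating dif and Lip for d = 1")
```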
sou412 commented 4 months ago

@kkawato Please test the proposed change with simulated / real data before merging. Please make sure there are no diffs in the output.

sou412 commented 4 months ago

One thing I'm not quite following is the use of map in here https://github.com/kkawato/rdlearn/blob/70e9ae114bcfd07cb28b30804dcf6c5da9a52d47/R/1-4-1extrapolation.R#L43-L44

If I understand correctly, this function is applied to a scalar value (i.e., x.train is a single number when the function is called in safelearn()).

Do you need to use map here? It seems to be throwing a warning, possibly for this reason. Also, the upper and lower objects become lists, which adds a bit of overhead when parsing the results in safelearn().
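To illustrate the list-wrapping overhead, here is a sketch using base R's lapply(), which behaves like purrr::map() in this respect (the values are made up):

```r
# lapply()/map() always return a list, even for a length-1 input,
# so downstream code has to unwrap the result with [[1]].
wrapped <- lapply(0.5, function(x.train) x.train * 2)
is.list(wrapped)   # TRUE: callers must index into the list
wrapped[[1]]       # the actual numeric value
plain <- 0.5 * 2   # a direct call keeps a plain numeric scalar
```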

sou412 commented 4 months ago

Adding @carolyyzz as a reviewer too as I assume some parts of the code come from @carolyyzz as well.

kkawato commented 3 months ago

@kkawato Please test the proposed change with simulated / real data before merging. Please make sure there are no diffs in the output.

It seems that the output is different... I am trying to detect the reason

sou412 commented 3 months ago

It seems that the output is different... I am trying to detect the reason

Oh that's not good -- sorry for making more work for you.

Do you think you could add (code to generate) a simulated dataset so that we can debug easily? I can look at the diff in this branch while you investigate the diff from the paper on the main branch.
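For debugging, something as small as the following might do (a minimal sketch; the sample size, cutoffs, and outcome model are arbitrary choices of mine, not the paper's simulation design):

```r
# Simulated data for an RD design with multiple cutoffs: a unit in group j
# is treated when its running variable x crosses that group's cutoff.
set.seed(1)
n <- 500
cutoffs <- c(-0.5, 0, 0.5)                    # hypothetical group cutoffs
group <- sample(seq_along(cutoffs), n, replace = TRUE)
x <- runif(n, -1, 1)                          # running variable
d <- as.integer(x >= cutoffs[group])          # treatment indicator
y <- 0.5 * x + 0.3 * d + rnorm(n, sd = 0.1)   # outcome with a jump at the cutoff
sim_data <- data.frame(y = y, x = x, d = d, c = cutoffs[group], group = group)
```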

kkawato commented 2 months ago

I apologize for the delayed response. I replaced the map function with apply in the extrapolation function in 1-4-1extrapolation.R. I also modified the logic in 1-4-0safelearn.R as follows and replicated the result:

data_all[eval_cond, paste0("d", d)] <-
  apply(temp_df, 1, function(x) {
    temp_result <- extrapolation(
      x.train = x[1],
      treat = d,
      g = g,
      g.pr = x[2] + (1 - d),
      Lip_1 = Lip_1,
      Lip_0 = Lip_0,
      dif_1 = dif_1,
      dif_0 = dif_0,
      G = G,
      C = C
    )
    sum(temp_result)
  })

sou412 commented 1 month ago

@kkawato Thanks for the update! It seems now this branch has a conflict. Should I just follow the main branch?

sou412 commented 1 month ago

Reopening the branch (it was closed by accident).

kkawato commented 1 month ago

Yes! Thank you.

sou412 commented 3 days ago

@kkawato Sorry it's been a while -- I resolved the conflict. Please take a look if it's good to merge. Thank you!