Report p-values for PLS bootstrapping

sem-in-r / seminr

Natural feeling domain-specific language for building structural equation models in R for estimation by covariance-based methods (like LISREL/Lavaan) or partial least squares (like SmartPLS)


Report p-values for PLS bootstrapping #141

Open soumyaray opened 4 years ago

soumyaray commented 4 years ago

[from user email request]

"I have a quick question about getting p-values for the structural paths in the bootstrap summary report. In the example shown in your SEMinR document, only the following items are shown: Original est., bootstrap mean, bootstrap SD, T stat, 2.5% CI, 97.5% CI. So, which is the p-value? Thank you for your explanation in advance."

soumyaray commented 4 years ago

There is some difficulty in pinning down an appropriate p-value, given that the bootstrapped parameters may not follow a perfect t-distribution.

However, if your data does not contain extreme values, we might assume near normality of parameter estimates. If so, we can compute the p-value from the t-value and degrees-of-freedom.

Using the mobi dataset (mobi) example:

mobi_pls <- estimate_pls(
  data = mobi, 
  measurement_model = mobi_mm, 
  structural_model = mobi_sm
)

pls_summary <- summary(mobi_pls)
pls_summary$paths

boot_seminr_model <- bootstrap_model(
  seminr_model = mobi_pls, 
  nboot = 1000, cores = 2, seed = NULL
)

boot_summary <- summary(boot_seminr_model)

# See full summary of all the paths
boot_summary$bootstrapped_paths

# gather paths and t-values
paths <- boot_summary$bootstrapped_paths[, "Original Est."]
tvalues <- boot_summary$bootstrapped_paths[, "T Stat."]

# degrees of freedom will be the number of rows in the data sample
df = nrow(mobi)

# calculate pvalues from tvalues and df; round to 3 decimal places
pvalues <- round( pt(tvalues, df, lower.tail = FALSE), 3)

# make a table of paths, tvalues, pvalues
data.frame(paths, tvalues, pvalues)

JulianGaviriaL commented 3 years ago

Dear @soumyaray,

Thank you very much for the suggested code. How could we obtain p-values for non-normal data?

Thanks in advance

soumyaray commented 3 years ago

Hi @JulianGaviriaL, even non-normal data usually produce fairly normal-looking statistics (e.g., the mean, path estimates, etc.). Take a look at the t-values from your bootstrap to see if they are very obviously non-symmetric:

(using tvalues from above calculation)

plot(hist(tvalues))

If the results look fairly symmetric, then simply apply the pvalues computation above. If not, let me know and I'll offer a stopgap solution.

In the coming months, we will be looking into these issues in greater detail to offer more definitive solutions. (This is why we're keeping this issue open: we're hoping to get as many use cases, questions, and comments as we can, so thank you again for asking!)
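For a path whose bootstrap distribution looks clearly asymmetric, one possible stopgap is a percentile-style p-value computed directly from the bootstrap replicates of that single path, without assuming a t-distribution. This is only a sketch under assumptions: it is not a seminr feature and not a method endorsed in this thread, `percentile_p` and `boot_est` are hypothetical names, and the replicates below are simulated rather than extracted from a bootstrapped model.

```r
# Hypothetical helper: two-sided percentile p-value for H0: path = 0,
# computed from the bootstrap replicates of ONE path coefficient.
percentile_p <- function(boot_est) {
  # Share of replicates on each side of zero
  p_lower <- mean(boot_est <= 0)
  p_upper <- mean(boot_est >= 0)
  # Double the smaller tail and cap the result at 1
  min(1, 2 * min(p_lower, p_upper))
}

set.seed(123)
# Simulated, right-skewed replicates standing in for real bootstrap output
boot_est <- 0.13 + (rchisq(1000, df = 3) - 3) / 20

hist(boot_est)          # visual check of (a)symmetry
percentile_p(boot_est)  # empirical two-sided p-value
```

A p-value obtained this way can only take values in steps of 2 / (number of replicates), so it comes out as exactly 0 when no replicate crosses zero; more replicates give finer resolution.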

JulianGaviriaL commented 3 years ago

Hi @soumyaray,

It's me who must thank you for your time!

Regarding normality, it seems that my dataset is not normally distributed:

data.xlsx

myDATA<-as.data.frame(myDATA)

# Measurement model;
measu_m <- constructs(
  composite("X", multi_items("Brain_prev", 1:2), weights = mode_B),
  composite("Y", multi_items("Brain_curr", 1:2), weights = mode_B),
  composite("M", multi_items("behavior", 1:2), weights = mode_B)
)

# Structural model: 
struct_m <- relationships( 
  paths(from = "X", to = "M"),
  paths(from = "M", to ="Y"), 
  paths(from = "X", to = "Y") 
)

# Model estimation
pls_m <- estimate_pls(data = myDATA, measurement_model = measu_m, structural_model = struct_m)
summary_pls_m <- summary(pls_m)

# Bootstrap the model
boot_pls <- bootstrap_model(pls_m, nboot = 1000, cores = 4, seed = NULL)
boot_summary <- summary(boot_pls)

# See full summary of all the paths
boot_summary$bootstrapped_paths

# gather paths and t-values
paths <- boot_summary$bootstrapped_paths[, "Original Est."]
tvalues <- boot_summary$bootstrapped_paths[, "T Stat."]

plot(hist(tvalues))

Thanks in advance for your comments,

Best regards.

komari6 commented 1 year ago

@Sumidu I have made some modifications to the main code, because it calculates a one-tailed p-value; it needs to be multiplied by 2 to get the two-tailed p-value. Also, df is the degrees of freedom, which equals the number of observations minus the number of parameters that are estimated. For example, if you have a sample of 100 observations and you are estimating two parameters, the degrees of freedom for your model would be 98. On the other hand, if there are 4 independent variables with 4 questions each (16 parameters) plus 1 dependent variable with 4 questions, there are 20 parameters in total. Therefore, the degrees of freedom would be

total observations - number of parameters = total observations - 20 in my case, so the code would be:

mobi_pls <- estimate_pls(
  data = mobi, 
  measurement_model = mobi_mm, 
  structural_model = mobi_sm
)

pls_summary <- summary(mobi_pls)
pls_summary$paths

boot_seminr_model <- bootstrap_model(
  seminr_model = mobi_pls, 
  nboot = 1000, cores = 2, seed = NULL
)

boot_summary <- summary(boot_seminr_model)

# See full summary of all the paths
boot_summary$bootstrapped_paths

# gather paths and t-values
paths <- boot_summary$bootstrapped_paths[, "Original Est."]
tvalues <- boot_summary$bootstrapped_paths[, "T Stat."]

# degrees of freedom will be the number of rows in the data sample - number of parameters
n_params <- 20  # placeholder taken from the example above; replace with your model's number of parameters
df <- nrow(mobi) - n_params

# calculate pvalues from tvalues and df; round to 3 decimal places. multiplied by 2 to get the two-tailed p-value
pvalues <- round( 2*pt(tvalues, df, lower.tail = FALSE), 3)

# make a table of paths, tvalues, pvalues
data.frame(paths, tvalues, pvalues)
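The same calculation can be wrapped in a small helper so that the parameter count is an explicit argument. This is a minimal sketch that takes the df rule proposed in this comment (observations minus estimated parameters) at face value; `two_tailed_p` is a hypothetical name, and the t-values are made up for illustration rather than taken from the mobi model.

```r
# Hypothetical helper: two-tailed p-values from t statistics,
# using df = observations - estimated parameters.
two_tailed_p <- function(tvalues, n_obs, n_params) {
  df <- n_obs - n_params
  stopifnot(df > 0)
  # abs() makes the result independent of the sign of the path
  round(2 * pt(abs(tvalues), df, lower.tail = FALSE), 3)
}

# Made-up t-values for illustration only
tvalues <- c("X -> M" = 2.10, "M -> Y" = -3.40, "X -> Y" = 0.80)

two_tailed_p(tvalues, n_obs = 100, n_params = 20)
```

Passing the named vector through keeps the path labels attached to the resulting p-values.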

hey0wing commented 9 months ago

I am completely new to this field, and I am also looking for p-values to report for the paths. I came across this open issue about the p-value calculation, and I have some questions about the calculations.

In the current version, plot(boot_seminr_model) includes p-stars = TRUE by default. This is strange because only the T Stat. is available in the summary, even though the p-value can easily be calculated from it.

Therefore, I looked into the code for the exact calculation from plot_dot.R line 781,

pvalue <- stats::pt(abs(t_value), nrow(model$data) - 1, lower.tail = FALSE)
# model is the bootstrapped model
# data is the data passed to estimate_pls(data = data)

From my understanding, this is a one-tailed t-test with df = nrow(data) - 1. Meanwhile, I found a post on the SmartPLS forum (link), which said df = bootstrap - 1. This approach also seems to make sense, because the number of bootstraps represents the sample size. In that case, the statistical power is positively correlated with the number of bootstraps.

Please let me know if my thoughts are correct or not, and what is the correct way to calculate df. Many thanks!
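To get a feel for how much the choice of df matters, the sketch below compares two-tailed p-values for one fixed t statistic under the two rules mentioned here, plus the normal approximation as the limiting case. The numbers (`t_value`, `n_obs`, `nboot`) are hypothetical and chosen only for illustration; the sketch does not settle which rule is correct.

```r
t_value <- 2.0   # illustrative t statistic
n_obs   <- 250   # hypothetical sample size
nboot   <- 1000  # hypothetical number of bootstrap resamples

# Two-tailed p-value from a t statistic and degrees of freedom
p_two_tailed <- function(t, df) 2 * pt(abs(t), df, lower.tail = FALSE)

comparison <- data.frame(
  rule = c("nrow(data) - 1", "nboot - 1", "normal approximation"),
  df   = c(n_obs - 1, nboot - 1, Inf),
  p    = c(p_two_tailed(t_value, n_obs - 1),
           p_two_tailed(t_value, nboot - 1),
           2 * pnorm(abs(t_value), lower.tail = FALSE))
)
comparison
```

With df in the hundreds, the t-distribution is already close to normal, so the rules differ only slightly for the same t statistic. Assuming the T Stat. is the original estimate divided by the bootstrap SD, more resamples mainly stabilise that SD as an estimate of the standard error rather than shrinking it.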

singledoggy commented 4 months ago

Thanks for your comments here. We need abs() here; otherwise the p-value of a negative path will be wrong. I just copied the code above, and it took me some time to debug.
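The effect of the missing abs() is easy to reproduce with a single negative t statistic. The values below (`df`, `t_neg`) are hypothetical and serve only to illustrate the point.

```r
df    <- 249   # hypothetical degrees of freedom
t_neg <- -2.5  # t statistic of a negative path

# Without abs(): the upper-tail area of a negative t exceeds 0.5,
# so doubling it gives a "p-value" above 1
p_wrong <- 2 * pt(t_neg, df, lower.tail = FALSE)

# With abs(): same p-value as a positive path of equal magnitude
p_right <- 2 * pt(abs(t_neg), df, lower.tail = FALSE)

c(without_abs = p_wrong, with_abs = p_right)
```

The one-tailed version without abs() has the same problem: a strongly negative path gets a p-value close to 1 instead of close to 0.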