Open s3alfisc opened 2 years ago
Title: Request for Cluster-by-Cluster Information Similar to Stata's Default Output
Body:
I'm currently working on a paper recommending the use of summclust
for cluster analysis in clinical trials. While using your package, I've noticed a discrepancy between the cluster-by-cluster information output in R versus Stata. I would greatly appreciate assistance in aligning the R output with Stata's default presentation.
Using the following R code with summclust
on scrambled_cherish.csv:
# Set a wider display
options(width = 150)
# Read the data
data <- read.csv("scrambled_cherish.csv")
# Create dummy variables
data$treat <- ifelse(data$intervention_factor == "Intervention", 1,
ifelse(data$intervention_factor == "Control", 0, NA))
data$elect <- as.numeric(data$adm_cat == "Elective")
data$female <- as.numeric(data$gender == "Female")
data$cog_dec <- ifelse(data$cognitive_impairment_admission == "1", 1,
ifelse(data$cognitive_impairment_admission == "0", 0, NA))
data$fun_dec <- ifelse(data$hospital_associated_functional_decline == "1", 1,
ifelse(data$hospital_associated_functional_decline == "0", 0, NA))
# Create factor variables
data$hospital <- as.factor(data$hospital)
data$ward <- as.factor(data$ward)
# Load necessary libraries
library(sandwich)
library(summclust)
# Specify the model formula
mf <- formula(
delirium_new ~ treat +
age +
female +
cognitive_impairment_admission +
hospital_associated_functional_decline +
cci_unadjusted_for_age +
elect +
factor(hospital)
)
# Testing clusters
fit <- lm(mf, data = data)
summ <- summclust(fit, params = ~ treat, cluster = ~ ward)
summary(summ)
# Create cluster by cluster information
cluster_by_cluster <- data.frame(
Cluster = seq_along(summ$N_G),
Ng = as.vector(summ$N_G),
Leverage = summ$leverage_g,
Partial_L = summ$partial_leverage[1,],
Beta_no_g = summ$beta_jack[1,]
)
print(cluster_by_cluster, row.names = FALSE)
I get this output:
Cluster Ng Leverage Partial_L Beta_no_g
1 65 1.5615371 0.13941055 -0.030189263
2 37 1.2127338 0.10579421 -0.040323642
3 38 1.0415643 0.09938913 0.021841058
4 39 1.8487431 0.16080947 0.025654521
5 59 1.0121208 0.09115682 -0.001107840
6 65 0.9095753 0.09388891 -0.004918237
7 44 1.7757687 0.16309955 0.005450797
8 60 1.6379568 0.14645136 0.008303581
When running a similar analysis in Stata, I get:
ward_cat | Ng Leverage Partial L. beta no g
-----------+-----------------------------------------------------
1 | 65 1.775769 0.136151 0.005451
2 | 37 1.012121 0.070870 -0.001108
3 | 38 0.909575 0.083975 -0.004918
4 | 39 1.041564 0.114481 0.021841
5 | 59 1.561537 0.144157 -0.030189
6 | 65 1.848743 0.178644 0.025655
7 | 44 1.212734 0.124528 -0.040324
8 | 60 1.637957 0.147194 0.008304
The values are correct, but the order differs. For example, cluster 1's leverage in Stata is 1.775769, while in R it's 1.5615371 (which corresponds to cluster 5 in Stata).
Could you please provide guidance or adjust the summclust
package to allow extraction of cluster-by-cluster information in an order consistent with Stata's output? This would greatly enhance the package's utility for researchers transitioning between R and Stata or comparing results across platforms.
Thank you for your attention to this matter. Your assistance will significantly contribute to the adoption of summclust
in clinical trial analysis.
// Saving output
log using Mudge_output.log, replace text
// Retrieving and preparing data
import delimited "scrambled_cherish.csv"
// Create the dummy variable for treated
gen treat = .
replace treat = 1 if intervention_factor == "Intervention"
replace treat = 0 if intervention_factor == "Control"
// Create the dummy variable for elective admissions
gen elect = (adm_cat == "Elective")
// Female dummy
gen female = (gender == "Female")
// Cognitive decline dummy
gen cog_dec = .
replace cog_dec = 1 if cognitive_impairment_admission == "1"
replace cog_dec = 0 if cognitive_impairment_admission == "0"
// Functional decline dummy
gen fun_dec = .
replace fun_dec = 1 if hospital_associated_functional_d == "1"
replace fun_dec = 0 if hospital_associated_functional_d == "0"
// Hospital categories
encode hospital, generate(hospital_cat)
// Ward categories
encode ward, generate(ward_cat)
// Testing clusters
summclust ///
delirium_new /// primary outcome
treat /// treatment indicator
age ///
female ///
cog_dec ///
fun_dec ///
cci_unadjusted_for_age ///
elect ///
, fevar(hospital_cat) ///
cluster(ward_cat) ///
table addmeans jack rho(0.5) regtable // choice of tests
Hi @SergeyVAlexeev, thanks for raising this - I'm travelling at the moment so haven't really had time to look into it; I'll do so once I'm back home by the end of the week!
For