The report package is a core package in the easystats ecosystem, designed to automate the reporting and interpretation of statistical model results. It generates concise, readable descriptions for many types of models, making it easier to understand and present statistical findings.
Main features
1) Automatic report generation: report can produce human-readable reports for regression models, analysis of variance (ANOVA), t-tests, and more, including statistical parameters and their interpretation.
2) Broad model support: it covers common statistical models such as linear regression, generalized linear models (GLMs), mixed-effects models, and Bayesian models.
3) Strong readability: the generated reports emphasize readability and avoid heavy technical jargon, making them suitable for direct use in reports or manuscripts. Model parameters and their meaning, such as p-values, effect sizes, and confidence intervals, are explained in natural language.
4) Concise output format: calling report() returns a detailed report covering the key model information, including model parameters, significance test results, confidence intervals, and effect sizes.
5) Good interoperability: report can be used together with other easystats packages, such as performance, parameters, and effectsize, to further enhance the interpretation and visualization of results.
6) Multiple output formats: report supports several output formats, including the R console and Markdown reports, making it easy to embed results in different kinds of documents or presentations (a short R Markdown sketch follows this overview).
Common use cases: quickly extracting and explaining regression results for academic papers and reports; automatically generating statistical reports with R Markdown; and giving beginners or users without a statistics background easy-to-understand explanations of statistical results.
By streamlining the presentation of complex model results, report makes producing high-quality, readable statistical reports far more efficient.
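As a quick illustration of the Markdown-oriented output and the R Markdown use case mentioned above, here is a minimal sketch of embedding a report in an R Markdown document; the use of knitr::kable() here is an illustrative assumption, not taken from the original article.
library(report)
# Inside an R Markdown chunk, printing the report object inserts the prose into
# the knitted document; as.data.frame() yields a table suitable for Markdown output.
model <- lm(Sepal.Length ~ Species, data = iris)
r <- report(model)
r                                # natural-language report in the rendered text
knitr::kable(as.data.frame(r))   # tabular version rendered as a Markdown table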
#install.packages("report")
library(report)
The examples below demonstrate what report can do:
model <- lm(Sepal.Length ~ Species, data = iris)
report(model) # print a human-readable report directly
We fitted a linear model (estimated using OLS) to predict Sepal.Length with
Species (formula: Sepal.Length ~ Species). The model explains a statistically
significant and substantial proportion of variance (R2 = 0.62, F(2, 147) =
119.26, p < .001, adj. R2 = 0.61). The model's intercept, corresponding to
Species = setosa, is at 5.01 (95% CI [4.86, 5.15], t(147) = 68.76, p < .001).
Within this model:
- The effect of Species [versicolor] is statistically significant and positive
(beta = 0.93, 95% CI [0.73, 1.13], t(147) = 9.03, p < .001; Std. beta = 1.12,
95% CI [0.88, 1.37])
- The effect of Species [virginica] is statistically significant and positive
(beta = 1.58, 95% CI [1.38, 1.79], t(147) = 15.37, p < .001; Std. beta = 1.91,
95% CI [1.66, 2.16])
Standardized parameters were obtained by fitting the model on a standardized
version of the dataset. 95% Confidence Intervals (CIs) and p-values were
computed using a Wald t-distribution approximation.
General workflow: the package works in two steps. First, a report object is created with the report() function; that object can then be displayed as text (the default output) or in tabular form. The report object can also be passed to as.data.frame() or summary() to obtain more compact, summarized versions of the report, as sketched below.
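A compact sketch of this two-step workflow, using the linear model from the example above (the object name r is illustrative):
model <- lm(Sepal.Length ~ Species, data = iris)
r <- report(model)   # step 1: create the report object
r                    # step 2: full text report (default printing)
summary(r)           # shorter, more compact text version
as.data.frame(r)     # tabular version of the report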
1. The function works on a wide range of models as well as other objects, such as data frames:
report(iris)
The data contains 150 observations of the following 5 variables:
- Sepal.Length: n = 150, Mean = 5.84, SD = 0.83, Median = 5.80, MAD = 1.04,
range: [4.30, 7.90], Skewness = 0.31, Kurtosis = -0.55, 0% missing
- Sepal.Width: n = 150, Mean = 3.06, SD = 0.44, Median = 3.00, MAD = 0.44,
range: [2, 4.40], Skewness = 0.32, Kurtosis = 0.23, 0% missing
- Petal.Length: n = 150, Mean = 3.76, SD = 1.77, Median = 4.35, MAD = 1.85,
range: [1, 6.90], Skewness = -0.27, Kurtosis = -1.40, 0% missing
- Petal.Width: n = 150, Mean = 1.20, SD = 0.76, Median = 1.30, MAD = 1.04,
range: [0.10, 2.50], Skewness = -0.10, Kurtosis = -1.34, 0% missing
- Species: 3 levels, namely setosa (n = 50, 33.33%), versicolor (n = 50,
33.33%) and virginica (n = 50, 33.33%)
2. These reports also fit nicely into tidyverse workflows:
library(tidyverse)
iris %>%
  select(-starts_with("Sepal")) %>%
  group_by(Species) %>%
  report() %>%
  summary()
The data contains 150 observations, grouped by Species, of the following 3
variables:
- setosa (n = 50):
- Petal.Length: Mean = 1.46, SD = 0.17, range: [1, 1.90]
- Petal.Width: Mean = 0.25, SD = 0.11, range: [0.10, 0.60]
- versicolor (n = 50):
- Petal.Length: Mean = 4.26, SD = 0.47, range: [3, 5.10]
- Petal.Width: Mean = 1.33, SD = 0.20, range: [1, 1.80]
- virginica (n = 50):
- Petal.Length: Mean = 5.55, SD = 0.55, range: [4.50, 6.90]
- Petal.Width: Mean = 2.03, SD = 0.27, range: [1.40, 2.50]
3. t-tests and correlations
report() can also be used to automatically format statistical tests, such as t-tests or correlations.
report(t.test(mtcars$mpg ~ mtcars$am))
Effect sizes were labelled following Cohen's (1988) recommendations.
The Welch Two Sample t-test testing the difference of mtcars$mpg by mtcars$am
(mean in group 0 = 17.15, mean in group 1 = 24.39) suggests that the effect is
negative, statistically significant, and large (difference = -7.24, 95% CI
[-11.28, -3.21], t(18.33) = -3.77, p = 0.001; Cohen's d = -1.41, 95% CI [-2.26,
-0.53])
cor.test(iris$Sepal.Length, iris$Sepal.Width) %>%
  report() %>%
  as.data.frame()
Pearson's product-moment correlation
Parameter1 | Parameter2 | r | 95% CI | t(148) | p
-----------------------------------------------------------------------------
iris$Sepal.Length | iris$Sepal.Width | -0.12 | [-0.27, 0.04] | -1.44 | 0.152
Alternative hypothesis: two.sided
4. ANOVA
report() works particularly well for ANOVAs, because the report includes effect sizes and their interpretation:
aov(Sepal.Length ~ Species, data = iris) %>%
  report()
The ANOVA (formula: Sepal.Length ~ Species) suggests that:
- The main effect of Species is statistically significant and large (F(2, 147)
= 119.26, p < .001; Eta2 = 0.62, 95% CI [0.54, 1.00])
Effect sizes were labelled following Field's (2013) recommendations.
5. Generalized linear models (GLMs)
report() is also compatible with GLMs, such as logistic regression:
model <- glm(vs ~ mpg * drat, data = mtcars, family = "binomial")
report(model)
We fitted a logistic model (estimated using ML) to predict vs with mpg and drat
(formula: vs ~ mpg * drat). The model's explanatory power is substantial
(Tjur's R2 = 0.51). The model's intercept, corresponding to mpg = 0 and drat =
0, is at -33.43 (95% CI [-77.90, 3.25], p = 0.083). Within this model:
- The effect of mpg is statistically non-significant and positive (beta = 1.79,
95% CI [-0.10, 4.05], p = 0.066; Std. beta = 3.63, 95% CI [1.36, 7.50])
- The effect of drat is statistically non-significant and positive (beta =
5.96, 95% CI [-3.75, 16.26], p = 0.205; Std. beta = -0.36, 95% CI [-1.96,
0.98])
- The effect of mpg × drat is statistically non-significant and negative (beta
= -0.33, 95% CI [-0.83, 0.15], p = 0.141; Std. beta = -1.07, 95% CI [-2.66,
0.48])
Standardized parameters were obtained by fitting the model on a standardized
version of the dataset. 95% Confidence Intervals (CIs) and p-values were
computed using a Wald z-distribution approximation.
6. Mixed-effects models
library(lme4)
model <- lme4::lmer(Sepal.Length ~ Petal.Length + (1 | Species), data = iris)
report(model)
We fitted a linear mixed model (estimated using REML and nloptwrap optimizer)
to predict Sepal.Length with Petal.Length (formula: Sepal.Length ~
Petal.Length). The model included Species as random effect (formula: ~1 |
Species). The model's total explanatory power is substantial (conditional R2 =
0.97) and the part related to the fixed effects alone (marginal R2) is of 0.66.
The model's intercept, corresponding to Petal.Length = 0, is at 2.50 (95% CI
[1.19, 3.82], t(146) = 3.75, p < .001). Within this model:
- The effect of Petal Length is statistically significant and positive (beta =
0.89, 95% CI [0.76, 1.01], t(146) = 13.93, p < .001; Std. beta = 1.89, 95% CI
[1.63, 2.16])
Standardized parameters were obtained by fitting the model on a standardized
version of the dataset. 95% Confidence Intervals (CIs) and p-values were
computed using a Wald t-distribution approximation.
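The conditional and marginal R2 quoted in this report can also be computed directly with the performance package (one of the easystats packages mentioned earlier); a minimal sketch, assuming the lme4 model defined above:
# performance::r2() returns the conditional R2 (fixed + random effects) and the
# marginal R2 (fixed effects only) for mixed models.
performance::r2(model)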
7. Bayesian models
Bayesian models can also be reported using the new SEXIT framework, which combines clarity, precision, and usefulness.
library(rstanarm)
model <- stan_glm(mpg ~ qsec + wt, data = mtcars)
(Stan sampling progress output for 4 chains of 2,000 iterations each, with 1,000 warmup iterations per chain, is omitted here.)
report(model)
We fitted a Bayesian linear model (estimated using MCMC sampling with 4 chains
of 2000 iterations and a warmup of 1000) to predict mpg with qsec and wt
(formula: mpg ~ qsec + wt). Priors over parameters were all set as normal (mean
= 0.00, SD = 8.43; mean = 0.00, SD = 15.40) distributions. The model's
explanatory power is substantial (R2 = 0.81, 95% CI [0.70, 0.89], adj. R2 =
0.79). The model's intercept, corresponding to qsec = 0 and wt = 0, is at 19.81
(95% CI [9.13, 30.69]). Within this model:
- The effect of qsec (Median = 0.93, 95% CI [0.38, 1.47]) has a 99.92%
probability of being positive (> 0), 98.55% of being significant (> 0.30), and
0.12% of being large (> 1.81). The estimation successfully converged (Rhat =
1.000) and the indices are reliable (ESS = 4179)
- The effect of wt (Median = -5.05, 95% CI [-6.00, -4.06]) has a 100.00%
probability of being negative (< 0), 100.00% of being significant (< -0.30),
and 100.00% of being large (< -1.81). The estimation successfully converged
(Rhat = 1.000) and the indices are reliable (ESS = 3918)
Following the Sequential Effect eXistence and sIgnificance Testing (SEXIT)
framework, we report the median of the posterior distribution and its 95% CI
(Highest Density Interval), along the probability of direction (pd), the
probability of significance and the probability of being large. The thresholds
beyond which the effect is considered as significant (i.e., non-negligible) and
large are |0.30| and |1.81| (corresponding respectively to 0.05 and 0.30 of the
outcome's SD). Convergence and stability of the Bayesian sampling has been
assessed using R-hat, which should be below 1.01 (Vehtari et al., 2019), and
Effective Sample Size (ESS), which should be greater than 1000 (Burkner, 2017).
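The thresholds quoted in this report (|0.30| and |1.81|) can be reproduced from the standard deviation of the outcome, since they are defined as 0.05 and 0.30 of the SD of mpg:
sd(mtcars$mpg)          # roughly 6.03
0.05 * sd(mtcars$mpg)   # roughly 0.30, the "significance" threshold
0.30 * sd(mtcars$mpg)   # roughly 1.81, the "large" threshold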
8. Other types of reports
8.1 For more complex reports, the individual parts of a report can be accessed directly:
model <- lm(Sepal.Length ~ Species, data = iris)
report_model(model)
linear model (estimated using OLS) to predict Sepal.Length with Species (formula: Sepal.Length ~ Species)
report_performance(model)
The model explains a statistically significant and substantial proportion of
variance (R2 = 0.62, F(2, 147) = 119.26, p < .001, adj. R2 = 0.61)
report_statistics(model)
beta = 5.01, 95% CI [4.86, 5.15], t(147) = 68.76, p < .001; Std. beta = -1.01, 95% CI [-1.18, -0.84]
beta = 0.93, 95% CI [0.73, 1.13], t(147) = 9.03, p < .001; Std. beta = 1.12, 95% CI [0.88, 1.37]
beta = 1.58, 95% CI [1.38, 1.79], t(147) = 15.37, p < .001; Std. beta = 1.91, 95% CI [1.66, 2.16]
8.2 Reporting participant details
data <- data.frame(
"Age" = c(22, 23, 54, 21),
"Sex" = c("F", "F", "M", "M")
)
paste(
report_participants(data, spell_n = TRUE),
"were recruited in the study by means of torture and coercion."
)
[1] "Four participants (Mean age = 30.0, SD = 16.0, range: [21, 54]; Sex: 50.0% females, 50.0% males, 0.0% other) were recruited in the study by means of torture and coercion."
8.3 Reporting a descriptive table of the sample
report_sample(iris, by = "Species")
# Descriptive Statistics
Variable | setosa (n=50) | versicolor (n=50) | virginica (n=50) | Total (n=150)
---------------------------------------------------------------------------------------------
Mean Sepal.Length (SD) | 5.01 (0.35) | 5.94 (0.52) | 6.59 (0.64) | 5.84 (0.83)
Mean Sepal.Width (SD) | 3.43 (0.38) | 2.77 (0.31) | 2.97 (0.32) | 3.06 (0.44)
Mean Petal.Length (SD) | 1.46 (0.17) | 4.26 (0.47) | 5.55 (0.55) | 3.76 (1.77)
Mean Petal.Width (SD) | 0.25 (0.11) | 1.33 (0.20) | 2.03 (0.27) | 1.20 (0.76)
References: https://easystats.github.io/report/index.html
https://mp.weixin.qq.com/s/3kVVHSzODwhEYViWyzX5xg