tidymodels / broom

Convert statistical analysis objects from R into tidy format
https://broom.tidymodels.org
Other
1.45k stars 304 forks source link

term column missing from `tidy.ergm` #688

Closed IndrajeetPatil closed 5 years ago

IndrajeetPatil commented 5 years ago

The tidy dataframe for these objects no longer contains the term column. Not 100%, but I think the source of this is rewrite in https://github.com/tidymodels/broom/commit/2e98c1e8ba039b46a4f33df6a38ea91515f88bcf: https://github.com/tidymodels/broom/blob/27c21e19ff8d66fb85775c5072ecf78491b677af/R/ergm-tidiers.R#L65-L72

Here is a reprex:

set.seed(123)
suppressPackageStartupMessages(library(ergm))
data(florentine)

# fit a model where the propensity to form ties between
# families depends on the absolute difference in wealth
gest <- ergm(flomarriage ~ edges + absdiff("wealth"))
#> Starting maximum pseudolikelihood estimation (MPLE):
#> Evaluating the predictor and response matrix.
#> Maximizing the pseudolikelihood.
#> Finished MPLE.
#> Stopping at the initial estimate.
#> Evaluating log-likelihood at the estimate.

# summary
summary(gest)
#> 
#> ==========================
#> Summary of model fit
#> ==========================
#> 
#> Formula:   flomarriage ~ edges + absdiff("wealth")
#> 
#> Iterations:  4 out of 20 
#> 
#> Monte Carlo MLE Results:
#>                 Estimate Std. Error MCMC % z value Pr(>|z|)    
#> edges          -2.302042   0.401906      0  -5.728   <1e-04 ***
#> absdiff.wealth  0.015519   0.006157      0   2.521   0.0117 *  
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#>      Null Deviance: 166.4  on 120  degrees of freedom
#>  Residual Deviance: 102.0  on 118  degrees of freedom
#>  
#> AIC: 106    BIC: 111.5    (Smaller is better.)

# tidy output
broom::tidy(gest)
#> # A tibble: 2 x 4
#>   estimate std.error mcmc.error      p.value
#>      <dbl>     <dbl>      <dbl>        <dbl>
#> 1  -2.30     0.402            0 0.0000000102
#> 2   0.0155   0.00616          0 0.0117

Created on 2019-04-10 by the reprex package (v0.2.1.9000)

Session info ``` r devtools::session_info() #> - Session info ---------------------------------------------------------- #> setting value #> version R version 3.6.0 alpha (2019-03-29 r76300) #> os Windows 10 x64 #> system x86_64, mingw32 #> ui RTerm #> language (EN) #> collate English_United States.1252 #> ctype English_United States.1252 #> tz America/New_York #> date 2019-04-10 #> #> - Packages -------------------------------------------------------------- #> package * version date lib #> assertthat 0.2.1 2019-03-21 [1] #> backports 1.1.4 2019-04-10 [1] #> broom 0.5.2.9001 2019-04-09 [1] #> callr 3.2.0 2019-03-15 [1] #> cli 1.1.0 2019-03-19 [1] #> coda 0.19-2 2018-10-08 [1] #> crayon 1.3.4 2017-09-16 [1] #> DEoptimR 1.0-8 2016-11-19 [1] #> desc 1.2.0 2019-04-03 [1] #> devtools 2.0.2 2019-04-08 [1] #> digest 0.6.18 2018-10-10 [1] #> dplyr 0.8.0.9009 2019-03-15 [1] #> ergm * 3.9.4 2018-08-16 [1] #> evaluate 0.13 2019-02-12 [1] #> fansi 0.4.0 2018-11-05 [1] #> fs 1.2.7 2019-03-19 [1] #> generics 0.0.2 2019-03-05 [1] #> glue 1.3.1 2019-03-12 [1] #> highr 0.8 2019-03-20 [1] #> htmltools 0.3.6 2017-04-28 [1] #> knitr 1.22.8 2019-04-08 [1] #> lattice 0.20-38 2018-11-04 [2] #> lpSolve 5.6.13 2015-09-19 [1] #> magrittr 1.5 2014-11-22 [1] #> MASS 7.3-51.2 2019-03-01 [2] #> Matrix 1.2-17 2019-03-22 [1] #> memoise 1.1.0 2017-04-21 [1] #> network * 1.15 2019-04-02 [1] #> pillar 1.3.1 2018-12-15 [1] #> pkgbuild 1.0.3 2019-03-20 [1] #> pkgconfig 2.0.2 2018-08-16 [1] #> pkgload 1.0.2 2018-10-29 [1] #> prettyunits 1.0.2 2015-07-13 [1] #> processx 3.3.0 2019-03-10 [1] #> ps 1.3.0 2018-12-21 [1] #> purrr 0.3.2 2019-03-15 [1] #> R6 2.4.0 2019-02-14 [1] #> Rcpp 1.0.1 2019-03-17 [1] #> remotes 2.0.4 2019-04-10 [1] #> rlang 0.3.4 2019-04-07 [1] #> rmarkdown 1.12.3 2019-03-25 [1] #> robustbase 0.93-4 2019-03-19 [1] #> rprojroot 1.3-2 2018-01-03 [1] #> sessioninfo 1.1.1 2018-11-05 [1] #> statnet.common 4.2.0 2019-01-08 [1] #> stringi 1.4.3 2019-03-12 [1] #> stringr 1.4.0 2019-02-10 [1] #> testthat 2.0.1 2018-10-13 [1] #> tibble 2.1.1 2019-03-16 [1] #> tidyr 0.8.3.9000 2019-03-07 [1] #> tidyselect 0.2.5 2018-10-11 [1] #> trust 0.1-7 2015-07-04 [1] #> usethis 1.5.0 2019-04-07 [1] #> utf8 1.1.4 2018-05-24 [1] #> vctrs 0.1.0.9002 2019-03-07 [1] #> withr 2.1.2 2018-03-15 [1] #> xfun 0.6 2019-04-02 [1] #> yaml 2.2.0 2018-07-25 [1] #> zeallot 0.1.0 2018-01-28 [1] #> source #> CRAN (R 3.6.0) #> CRAN (R 3.6.0) #> local #> CRAN (R 3.6.0) #> CRAN (R 3.6.0) #> CRAN (R 3.5.1) #> CRAN (R 3.5.1) #> CRAN (R 3.5.0) #> Github (r-lib/desc@c860e7b) #> CRAN (R 3.6.0) #> CRAN (R 3.5.1) #> Github (tidyverse/dplyr@df735d1) #> CRAN (R 3.5.2) #> CRAN (R 3.6.0) #> Github (brodieG/fansi@ab11e9c) #> CRAN (R 3.6.0) #> Github (r-lib/generics@c15ac43) #> CRAN (R 3.6.0) #> CRAN (R 3.6.0) #> CRAN (R 3.5.1) #> Github (yihui/knitr@6103edd) #> CRAN (R 3.6.0) #> CRAN (R 3.5.0) #> CRAN (R 3.5.1) #> CRAN (R 3.6.0) #> CRAN (R 3.6.0) #> CRAN (R 3.5.1) #> CRAN (R 3.6.0) #> CRAN (R 3.6.0) #> CRAN (R 3.6.0) #> CRAN (R 3.5.1) #> CRAN (R 3.6.0) #> CRAN (R 3.5.1) #> CRAN (R 3.6.0) #> CRAN (R 3.6.0) #> CRAN (R 3.6.0) #> CRAN (R 3.6.0) #> CRAN (R 3.6.0) #> CRAN (R 3.6.0) #> CRAN (R 3.5.3) #> Github (rstudio/rmarkdown@503cc5f) #> CRAN (R 3.6.0) #> CRAN (R 3.5.1) #> CRAN (R 3.6.0) #> CRAN (R 3.5.2) #> CRAN (R 3.6.0) #> CRAN (R 3.6.0) #> CRAN (R 3.5.1) #> CRAN (R 3.6.0) #> Github (tidyverse/tidyr@7a51bfd) #> CRAN (R 3.5.1) #> CRAN (R 3.5.2) #> CRAN (R 3.6.0) #> CRAN (R 3.5.1) #> Github (r-lib/vctrs@6b8c98a) #> CRAN (R 3.5.1) #> CRAN (R 3.6.0) #> CRAN (R 3.5.1) #> CRAN (R 3.5.1) #> #> [1] C:/Users/inp099/Documents/R/win-library/3.6 #> [2] C:/Program Files/R/R-3.6.0alpha/library ```
alexpghayes commented 5 years ago

AAAAAAAAAAAAAAAAAAAAAAAAAHHHHHHHHHHHHH WHY. Yeah I definitely broke this.

briatte commented 5 years ago

I believe my last PR fixes the issue, and also adds support for finding coefficient older versions of ergm, thereby fixing potential 'partial match' issues while making broom work with ergm 3.9.4 (which is still the current CRAN version of ergm as of writing).

IndrajeetPatil commented 5 years ago

This tidier no longer works at all; previously at least it returned something.

library(ergm)
#> Loading required package: network
#> network: Classes for Relational Data
#> Version 1.15 created on 2019-04-01.
#> copyright (c) 2005, Carter T. Butts, University of California-Irvine
#>                     Mark S. Handcock, University of California -- Los Angeles
#>                     David R. Hunter, Penn State University
#>                     Martina Morris, University of Washington
#>                     Skye Bender-deMoll, University of Washington
#>  For citation information, type citation("network").
#>  Type help("network-package") to get started.
#> 
#> ergm: version 3.10.4, created on 2019-06-10
#> Copyright (c) 2019, Mark S. Handcock, University of California -- Los Angeles
#>                     David R. Hunter, Penn State University
#>                     Carter T. Butts, University of California -- Irvine
#>                     Steven M. Goodreau, University of Washington
#>                     Pavel N. Krivitsky, University of Wollongong
#>                     Martina Morris, University of Washington
#>                     with contributions from
#>                     Li Wang
#>                     Kirk Li, University of Washington
#>                     Skye Bender-deMoll, University of Washington
#>                     Chad Klumb
#> Based on "statnet" project software (statnet.org).
#> For license and citation information see statnet.org/attribution
#> or type citation("ergm").
#> NOTE: Versions before 3.6.1 had a bug in the implementation of the
#> bd() constriant which distorted the sampled distribution somewhat.
#> In addition, Sampson's Monks datasets had mislabeled vertices. See
#> the NEWS and the documentation for more details.
#> NOTE: Some common term arguments pertaining to vertex attribute
#> and level selection have changed in 3.10.0. See terms help for
#> more details. Use 'options(ergm.term=list(version="3.9.4"))' to
#> use old behavior.
library(broom)

# Using the same example as the ergm package
# Load the Florentine marriage network data
data(florentine)

# Fit a model where the propensity to form ties between
# families depends on the absolute difference in wealth
gest <- ergm(flomarriage ~ edges + absdiff("wealth"))
#> Warning: `set_attrs()` is deprecated as of rlang 0.3.0
#> This warning is displayed once per session.
#> Starting maximum pseudolikelihood estimation (MPLE):
#> Evaluating the predictor and response matrix.
#> Maximizing the pseudolikelihood.
#> Finished MPLE.
#> Stopping at the initial estimate.
#> Evaluating log-likelihood at the estimate.

# Show terms, coefficient estimates and errors
tidy(gest)
#> Error in add_column(., term = names(x[[coefs]]), .before = 1): could not find function "add_column"

Created on 2019-06-25 by the reprex package (v0.3.0)

IndrajeetPatil commented 5 years ago

Did the tests pass? How was a PR that breaks a working tidier merged into the master?!

alexpghayes commented 5 years ago

The tests have been failing for a while and I haven't had time to fix them so I didn't check. A mistake but looks easy to fix. I'll take a look tomorrow.

briatte commented 5 years ago

Hi @IndrajeetPatil and @alexpghayes

PR #721 above fixes the issue, which is my fault, not Alex's. I was misled by partial-match warnings about coefs and coefficients in the summary object.

I've performed some tests on all ergm v3.x versions, and with this PR, broom successfully extracts (via either tidy or glance) exactly the same output regardless of the version of ergm.

Note that the NEWS file of the package reports a change in the summary object (z-tests instead of t-tests from version 3.9.4 onwards), but that does not seem to affect the names of the summary()$coefs object, which always has two columns named z value and Pr(>|z|).

IndrajeetPatil commented 5 years ago

Sorry, my point wasn't to blame anyone. I just wanted to make sure that we have the infrastructure (tests, CI, etc.) in place to catch this. If anything changes with the most used tidiers (lm, glm, clm, etc.), it is immediately noticed by the users. The changes to less used tidiers, therefore, deserve much more scrutiny in the form of tests, etc., and I just wanted to make sure that we had those since I had guessed that you must have made the PR after your local tests passed.

In #710, I had tried to do some cleanup to get all the tests running so that the CI builds can finally succeed, which would make it easier to evaluate if PRs are breaking anything.

briatte commented 5 years ago

Don't apologise, my previous PR did break things :)

I tried running the tests on broom, but there are simply too many of them, with dozens of failing ones. One error that occurs a lot is this one:

test-lavaan.R:14: failure: lavaan tidier arguments
length(not_allowed) == 0 isn't true.
Arguments quick to `tidy.lavaan` must be listed in the argument glossary.

It looks like @alexpghayes is trying to 'export' the tests t a different package: https://github.com/alexpghayes/modeltests

alexpghayes commented 5 years ago

TODO:

github-actions[bot] commented 3 years ago

This issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.