tidymodels / broom

Convert statistical analysis objects from R into tidy format
https://broom.tidymodels.org

augment.rqs: original data in wrong order #1052

Closed ilapros closed 2 years ago

ilapros commented 3 years ago

The problem

I'm having trouble with the augment.rqs function and the order in which the original data are stored in the augmented output. I don't have the time to create a PR with the solution at the moment, sorry.

Reproducible example

suppressMessages(library(dplyr))
suppressMessages(library(quantreg))
library(broom)
library(ggplot2)
data(engel)
rqfit_engel <- rq(foodexp ~ income, data = engel)
rqmultfit_engel <- rq(foodexp ~ income, data = engel, tau = c(0.1,0.25,0.5,0.75,0.9))

ag_fit_engel <- rqfit_engel %>% augment()
ag_multfit_engel <- rqmultfit_engel %>% augment()
### the following should be the same 
ag_fit_engel %>% head()
#> # A tibble: 6 × 5
#>   foodexp income  .resid .fitted  .tau
#>     <dbl>  <dbl>   <dbl>   <dbl> <dbl>
#> 1    256.   420.  -61.0     317.   0.5
#> 2    311.   541.  -73.8     385.   0.5
#> 3    486.   901. -101.      586.   0.5
#> 4    403.   639.  -36.5     439.   0.5
#> 5    496.   751.   -6.55    502.   0.5
#> 6    634.   946.   22.5     611.   0.5
ag_multfit_engel %>% filter(.tau == 0.5) %>% head()
#> # A tibble: 6 × 5
#>   foodexp income .tau   .resid .fitted
#>     <dbl>  <dbl> <chr>   <dbl>   <dbl>
#> 1    486.   901. 0.5    -61.0     317.
#> 2    700.   979. 0.5    -73.8     385.
#> 3    520.   791. 0.5   -101.      586.
#> 4    444.   596. 0.5    -36.5     439.
#> 5    318.   507. 0.5     -6.55    502.
#> 6    476.   896. 0.5     22.5     611.
## the residuals and fitted values are the same 
## but not the original data

## the problem is that augment.rqs orders the residuals and fitted values
## by observation: obs i is repeated length(tau) times, so we have
## rep(obs_num, each = length(tau))
## the original data, instead, are stacked as whole copies, i.e.
## rep(data, times = length(tau))
## indeed the one special observation with income > 4500 is at 
which(ag_multfit_engel$income > 4500)
#> [1]  138  373  608  843 1078
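## the copies of that observation are nrow(engel) = 235 positions apart

## because 235 is a multiple of length(tau) = 5, all five copies fall on
## rows labelled with the same .tau, so the large observation shows up
## in only one facet of the residual plot
ggplot(ag_multfit_engel) + 
  geom_point(aes(x = income, y = .resid)) + 
  facet_wrap(~.tau) 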

Created on 2021-09-18 by the reprex package (v2.0.1)
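
For anyone picking this up, here is a minimal base R sketch of the `each` versus `times` mismatch described above. It is not broom's actual code: `dat`, `taus`, `fitted_mat`, `obs_major`, `mismatched` and `aligned` are all made-up names, and the toy "fitted values" are just `x * tau`. It only illustrates why stacking the data with `times` next to rows ordered with `each` misaligns them, and that repeating each data row `length(taus)` times is one way to line them up again.

```r
# Toy stand-in for the engel data: 4 observations and 2 quantiles (hypothetical)
dat  <- data.frame(id = 1:4, x = c(10, 20, 30, 40))
taus <- c(0.25, 0.75)
n <- nrow(dat)
k <- length(taus)

# Stand-in for an n x k matrix of fitted values: column j belongs to taus[j]
fitted_mat <- outer(dat$x, taus)

# Observation-major rows: obs 1 for every tau, then obs 2, and so on
obs_major <- data.frame(
  obs     = rep(seq_len(n), each = k),
  .tau    = rep(taus, times = n),
  .fitted = as.vector(t(fitted_mat))
)

# Mismatch: whole copies of the data stacked next to observation-major rows
mismatched <- cbind(dat[rep(seq_len(n), times = k), ], obs_major)

# One possible fix: repeat each data row k times so both sides share one order
aligned <- cbind(dat[rep(seq_len(n), each = k), ], obs_major)

all(mismatched$id == mismatched$obs)  # FALSE
all(aligned$id == aligned$obs)        # TRUE
```

The opposite choice (keeping the data stacked with `times` and reordering the residuals and fitted values tau by tau) would work equally well; what matters is that both sides use the same ordering.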

Session info

``` r
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.1.1 (2021-08-10)
#>  os       Ubuntu 20.04.3 LTS
#>  system   x86_64, linux-gnu
#>  ui       X11
#>  language en_GB:en
#>  collate  en_GB.UTF-8
#>  ctype    en_GB.UTF-8
#>  tz       Europe/Rome
#>  date     2021-09-18
#>
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package      * version    date       lib source
#>  assertthat     0.2.1      2019-03-21 [1] CRAN (R 4.1.1)
#>  backports      1.2.1      2020-12-09 [1] CRAN (R 4.1.1)
#>  broom        * 0.7.9.9000 2021-09-18 [1] Github (tidymodels/broom@80a725c)
#>  cli            3.0.1      2021-07-17 [1] CRAN (R 4.1.1)
#>  colorspace     2.0-2      2021-06-24 [1] CRAN (R 4.1.1)
#>  conquer        1.0.2      2020-08-27 [1] CRAN (R 4.1.1)
#>  crayon         1.4.1      2021-02-08 [1] CRAN (R 4.1.1)
#>  curl           4.3.2      2021-06-23 [1] CRAN (R 4.1.1)
#>  DBI            1.1.1      2021-01-15 [1] CRAN (R 4.1.1)
#>  digest         0.6.27     2020-10-24 [1] CRAN (R 4.1.1)
#>  dplyr        * 1.0.7      2021-06-18 [1] CRAN (R 4.1.1)
#>  ellipsis       0.3.2      2021-04-29 [1] CRAN (R 4.1.1)
#>  evaluate       0.14       2019-05-28 [1] CRAN (R 4.1.1)
#>  fansi          0.5.0      2021-05-25 [1] CRAN (R 4.1.1)
#>  farver         2.1.0      2021-02-28 [1] CRAN (R 4.1.1)
#>  fastmap        1.1.0      2021-01-25 [1] CRAN (R 4.1.1)
#>  fs             1.5.0      2020-07-31 [1] CRAN (R 4.1.1)
#>  generics       0.1.0      2020-10-31 [1] CRAN (R 4.1.1)
#>  ggplot2      * 3.3.5      2021-06-25 [1] CRAN (R 4.1.1)
#>  glue           1.4.2      2020-08-27 [1] CRAN (R 4.1.1)
#>  gtable         0.3.0      2019-03-25 [1] CRAN (R 4.1.1)
#>  highr          0.9        2021-04-16 [1] CRAN (R 4.1.1)
#>  htmltools      0.5.2      2021-08-25 [1] CRAN (R 4.1.1)
#>  httr           1.4.2      2020-07-20 [1] CRAN (R 4.1.1)
#>  knitr          1.33       2021-04-24 [1] CRAN (R 4.1.1)
#>  labeling       0.4.2      2020-10-20 [1] CRAN (R 4.1.1)
#>  lattice        0.20-44    2021-05-02 [1] CRAN (R 4.1.1)
#>  lifecycle      1.0.0      2021-02-15 [1] CRAN (R 4.1.1)
#>  magrittr       2.0.1      2020-11-17 [1] CRAN (R 4.1.1)
#>  Matrix         1.3-4      2021-06-01 [1] CRAN (R 4.1.1)
#>  MatrixModels   0.5-0      2021-03-02 [1] CRAN (R 4.1.1)
#>  matrixStats    0.60.1     2021-08-23 [1] CRAN (R 4.1.1)
#>  mime           0.11       2021-06-23 [1] CRAN (R 4.1.1)
#>  munsell        0.5.0      2018-06-12 [1] CRAN (R 4.1.1)
#>  pillar         1.6.2      2021-07-29 [1] CRAN (R 4.1.1)
#>  pkgconfig      2.0.3      2019-09-22 [1] CRAN (R 4.1.1)
#>  purrr          0.3.4      2020-04-17 [1] CRAN (R 4.1.1)
#>  quantreg     * 5.86       2021-06-06 [1] CRAN (R 4.1.1)
#>  R6             2.5.1      2021-08-19 [1] CRAN (R 4.1.1)
#>  Rcpp           1.0.7      2021-07-07 [1] CRAN (R 4.1.1)
#>  reprex         2.0.1      2021-08-05 [1] CRAN (R 4.1.1)
#>  rlang          0.4.11     2021-04-30 [1] CRAN (R 4.1.1)
#>  rmarkdown      2.10       2021-08-06 [1] CRAN (R 4.1.1)
#>  rstudioapi     0.13       2020-11-12 [1] CRAN (R 4.1.1)
#>  scales         1.1.1      2020-05-11 [1] CRAN (R 4.1.1)
#>  sessioninfo    1.1.1      2018-11-05 [1] CRAN (R 4.1.1)
#>  SparseM      * 1.81       2021-02-18 [1] CRAN (R 4.1.1)
#>  stringi        1.7.4      2021-08-25 [1] CRAN (R 4.1.1)
#>  stringr        1.4.0      2019-02-10 [1] CRAN (R 4.1.1)
#>  tibble         3.1.4      2021-08-25 [1] CRAN (R 4.1.1)
#>  tidyr          1.1.3      2021-03-03 [1] CRAN (R 4.1.1)
#>  tidyselect     1.1.1      2021-04-30 [1] CRAN (R 4.1.1)
#>  utf8           1.2.2      2021-07-24 [1] CRAN (R 4.1.1)
#>  vctrs          0.3.8      2021-04-29 [1] CRAN (R 4.1.1)
#>  withr          2.4.2      2021-04-18 [1] CRAN (R 4.1.1)
#>  xfun           0.25       2021-08-06 [1] CRAN (R 4.1.1)
#>  xml2           1.3.2      2020-04-23 [1] CRAN (R 4.1.1)
#>  yaml           2.2.1      2020-02-01 [1] CRAN (R 4.1.1)
#>
#> [1] /home/ilaria/R/x86_64-pc-linux-gnu-library/4.1
#> [2] /usr/local/lib/R/site-library
#> [3] /usr/lib/R/site-library
#> [4] /usr/lib/R/library
```

simonpcouch commented 3 years ago

Hey, thank you for the issue! Just wanted to drop a note that I've seen this and agree that a fix is in order, though am having trouble making the time to address this myself at the moment as well. Would welcome a PR here if others want to give this a go. :-)
