Closed jasminegr closed 5 years ago
What does str() return for each of the dfs?
You could also just try:
library(tidyverse)
dt %>% select(starts_with('ICD')) %>% gather() %>% count(value)
I think this has something to do with the default behaviour of apply and table.
Consider the following example:
dt2 <- dt <- data.frame(
eid = 1:10,
ICD1 = c(0,1,1,0,0,1,1,0,0,0),
ICD2 = c(0,0,1,1,1,0,1,0,0,1),
ICD3 = rep(1,10)
)
dt2$ICD3 <- c(1,0,1,0,0,0,1,0,0,1)
table(dt$ICD3)
#>
#> 1
#> 10
str(table(dt$ICD3))
#> 'table' int [1(1d)] 10
#> - attr(*, "dimnames")=List of 1
#> ..$ : chr "1"
table(dt2$ICD3)
#>
#> 0 1
#> 6 4
str(table(dt2$ICD3))
#> 'table' int [1:2(1d)] 6 4
#> - attr(*, "dimnames")=List of 1
#> ..$ : chr [1:2] "0" "1"
As we can see, table(dt$ICD3) returned a one-dimensional table of length 1 because all observations have the value 1, whereas table(dt2$ICD3) (and the other ICD codes) returned one-dimensional tables of length 2 because both 0 and 1 occur at least once.
I think apply returned a list for dt because the per-column outputs had different lengths, whereas for dt2 every output has length 2, so apply can combine them neatly into a two-dimensional array (a matrix).
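To check this explanation, here is a minimal sketch that reuses the example data above and inspects what apply returns in each case. The names res_dt and res_dt2 are only illustrative.

```r
# Same example data as above
dt2 <- dt <- data.frame(
  eid  = 1:10,
  ICD1 = c(0, 1, 1, 0, 0, 1, 1, 0, 0, 0),
  ICD2 = c(0, 0, 1, 1, 1, 0, 1, 0, 0, 1),
  ICD3 = rep(1, 10)
)
dt2$ICD3 <- c(1, 0, 1, 0, 0, 0, 1, 0, 0, 1)

# Tabulate every ICD column (drop the eid column)
res_dt  <- apply(dt[, -1], 2, table)
res_dt2 <- apply(dt2[, -1], 2, table)

# dt: ICD3 yields a length-1 table, so the results cannot be bound together
is.list(res_dt)
lengths(res_dt)

# dt2: every column yields a length-2 table, so apply returns a matrix
is.matrix(res_dt2)
dim(res_dt2)
```

If the lengths differ across columns, apply falls back to a list; if they all match, it simplifies the result to a matrix.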
One workaround is to first convert each ICD-10 code variable to a factor with explicit levels, so that table always reports both 0 and 1:
apply(dt[,2:dim(dt)[2]], 2, function(x) table(factor(x, levels = c(0,1))))
#> ICD1 ICD2 ICD3
#> 0 6 5 0
#> 1 4 5 10
apply(dt2[,2:dim(dt2)[2]], 2, function(x) table(factor(x, levels = c(0,1))))
#> ICD1 ICD2 ICD3
#> 0 6 5 6
#> 1 4 5 4
Or the tidyverse solution as suggested by Dan:
library(tidyverse)
count_event <- function(dt){
dt %>% select(starts_with('ICD')) %>% gather %>% count(key, value) %>% spread(key, n, fill = 0)
}
count_event(dt)
#> # A tibble: 2 x 4
#> value ICD1 ICD2 ICD3
#> <dbl> <dbl> <dbl> <dbl>
#> 1 0 6 5 0
#> 2 1 4 5 10
count_event(dt2)
#> # A tibble: 2 x 4
#> value ICD1 ICD2 ICD3
#> <dbl> <dbl> <dbl> <dbl>
#> 1 0 6 5 6
#> 2 1 4 5 4
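In recent versions of tidyr, gather() and spread() are superseded by pivot_longer() and pivot_wider(). A sketch of the same idea with the newer functions might look like the following; it assumes tidyr 1.0.0 or later, and count_event_pivot is a hypothetical name, not part of the original solution.

```r
library(dplyr)
library(tidyr)

# Example data where ICD3 contains only 1s
dt <- data.frame(
  eid  = 1:10,
  ICD1 = c(0, 1, 1, 0, 0, 1, 1, 0, 0, 0),
  ICD2 = c(0, 0, 1, 1, 1, 0, 1, 0, 0, 1),
  ICD3 = rep(1, 10)
)

# Hypothetical helper: count 0s and 1s per ICD column, one column per code
count_event_pivot <- function(dt) {
  dt %>%
    select(starts_with("ICD")) %>%
    pivot_longer(everything(), names_to = "key", values_to = "value") %>%
    count(key, value) %>%
    # Fill combinations that never occur (e.g. ICD3 == 0) with a count of 0
    pivot_wider(names_from = key, values_from = n, values_fill = list(n = 0L))
}

res <- count_event_pivot(dt)
res
```

The result should have one row per value (0 and 1) and one column per ICD code, as with the gather/spread version.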
Hope this helps
Another solution, using only base R:
events <- as.data.frame(apply(dt[,2:dim(dt)[2]] , 2, function(X){c(sum(X==0),sum(X==1))}))
One question: what format do you want the result in, given the different potential inputs? Do you just want to add up the 0s and 1s?
sapply(dt[2:4], function(x) c(sum(x == 0), sum(x == 1)))
#>      ICD1 ICD2 ICD3
#> [1,]    6    5    0
#> [2,]    4    5   10
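If you want the result shape to be guaranteed regardless of the input, a possible variant uses vapply, which raises an error unless every column returns exactly the declared type and length. This is only a sketch: count_01 and the row labels no_event and event are illustrative names, and it assumes the columns contain no NA values.

```r
dt <- data.frame(
  eid  = 1:10,
  ICD1 = c(0, 1, 1, 0, 0, 1, 1, 0, 0, 0),
  ICD2 = c(0, 0, 1, 1, 1, 0, 1, 0, 0, 1),
  ICD3 = rep(1, 10)
)

# Hypothetical helper: count 0s and 1s in one column (assumes no NAs)
count_01 <- function(x) c(no_event = sum(x == 0), event = sum(x == 1))

# FUN.VALUE declares that each column must return two integers,
# so the result is always a 2-row integer matrix
events <- vapply(dt[-1], count_01, FUN.VALUE = c(no_event = 0L, event = 0L))
events
```

Because the counts are computed explicitly, a column with only 1s still yields a 0 in the no_event row instead of a shorter result.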
Hi,
I had this issue last week.
I wanted to get the counts for events (1) and no-events (0) from a dataset. In the first data table, the apply function returns a list (not what I want), whereas the second data table gives us the counts of events and no-events for each ICD code (what I want).
Does anyone know why?