hesselberthlab / scrunchy

R toolkit for the analysis of single-cell functional heterogeneity
https://scrunchy.hesselberthlab.org/

Behavior of tidy_logcounts #36

Closed: jayhesselberth closed this issue 5 years ago

jayhesselberth commented 5 years ago

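tidy_logcounts returns misleading output when the selected features come from two different experiments. The cause is the experiment column: in the third call below it takes two different values (haircut and rnaseq), so every cell appears once per experiment (500 rows instead of 250) and each feature column is NA in the rows that belong to the other experiment.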

library(scrunchy)
library(dplyr, warn.conflicts = FALSE)

tidy_logcounts(fsce_small["Uracil_45", , ])
#> # A tibble: 250 x 3
#>    experiment cell_id          Uracil_45
#>    <chr>      <chr>                <dbl>
#>  1 haircut    TGCGGGTTCAACCATG     2.26 
#>  2 haircut    TGGTTCCTCACTTACT     0.880
#>  3 haircut    CATATTCGTATGCTTG     0.705
#>  4 haircut    TCACAAGTCTCCAACC     0.417
#>  5 haircut    GGAATAATCTCGTTTA     1.38 
#>  6 haircut    CTTTGCGAGCGCTCCA     1.06 
#>  7 haircut    GTGCAGCAGCGTGAAC     1.41 
#>  8 haircut    AGCGGTCCATCTACGA     2.11 
#>  9 haircut    GGATGTTTCCAGTATG     1.67 
#> 10 haircut    GTACTCCGTTACGCGC     0.477
#> # ... with 240 more rows

tidy_logcounts(fsce_small["IL7R", , ])
#> # A tibble: 250 x 3
#>    experiment cell_id           IL7R
#>    <chr>      <chr>            <dbl>
#>  1 rnaseq     TGCGGGTTCAACCATG  3.29
#>  2 rnaseq     TGGTTCCTCACTTACT  0   
#>  3 rnaseq     CATATTCGTATGCTTG  0   
#>  4 rnaseq     TCACAAGTCTCCAACC  0   
#>  5 rnaseq     GGAATAATCTCGTTTA  2.36
#>  6 rnaseq     CTTTGCGAGCGCTCCA  3.49
#>  7 rnaseq     GTGCAGCAGCGTGAAC  0   
#>  8 rnaseq     AGCGGTCCATCTACGA  0   
#>  9 rnaseq     GGATGTTTCCAGTATG  0   
#> 10 rnaseq     GTACTCCGTTACGCGC  0   
#> # ... with 240 more rows

tidy_logcounts(fsce_small[c("Uracil_45", "IL7R"), , ])
#> # A tibble: 500 x 4
#>    experiment cell_id           IL7R Uracil_45
#>    <chr>      <chr>            <dbl>     <dbl>
#>  1 rnaseq     TGCGGGTTCAACCATG  3.29        NA
#>  2 rnaseq     TGGTTCCTCACTTACT  0           NA
#>  3 rnaseq     CATATTCGTATGCTTG  0           NA
#>  4 rnaseq     TCACAAGTCTCCAACC  0           NA
#>  5 rnaseq     GGAATAATCTCGTTTA  2.36        NA
#>  6 rnaseq     CTTTGCGAGCGCTCCA  3.49        NA
#>  7 rnaseq     GTGCAGCAGCGTGAAC  0           NA
#>  8 rnaseq     AGCGGTCCATCTACGA  0           NA
#>  9 rnaseq     GGATGTTTCCAGTATG  0           NA
#> 10 rnaseq     GTACTCCGTTACGCGC  0           NA
#> # ... with 490 more rows

# workaround: drop the experiment column from each result and join on cell_id to get the correct values
left_join(
  tidy_logcounts(fsce_small["Uracil_45", , ]) %>% select(-experiment),
  tidy_logcounts(fsce_small["IL7R", , ]) %>% select(-experiment)
)
#> Joining, by = "cell_id"
#> # A tibble: 250 x 3
#>    cell_id          Uracil_45  IL7R
#>    <chr>                <dbl> <dbl>
#>  1 TGCGGGTTCAACCATG     2.26   3.29
#>  2 TGGTTCCTCACTTACT     0.880  0   
#>  3 CATATTCGTATGCTTG     0.705  0   
#>  4 TCACAAGTCTCCAACC     0.417  0   
#>  5 GGAATAATCTCGTTTA     1.38   2.36
#>  6 CTTTGCGAGCGCTCCA     1.06   3.49
#>  7 GTGCAGCAGCGTGAAC     1.41   0   
#>  8 AGCGGTCCATCTACGA     2.11   0   
#>  9 GGATGTTTCCAGTATG     1.67   0   
#> 10 GTACTCCGTTACGCGC     0.477  0   
#> # ... with 240 more rows

Created on 2019-01-03 by the reprex package (v0.2.1)
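
The NA pattern above can be reproduced without scrunchy. The following is a minimal sketch, not the package's implementation: it assumes (without having checked the internals of tidy_logcounts) that the function reshapes long data to wide while keeping experiment as an identifying column. The toy tibble, the names long, feature, logcount, cell_a and cell_b, and the numeric values are all made up for illustration, and tidyr::pivot_wider stands in for whatever reshaping the package actually does.

```r
library(dplyr, warn.conflicts = FALSE)
library(tidyr)

# Hypothetical long-format data: one row per (experiment, cell, feature).
# These values are illustrative only; they are not taken from fsce_small.
long <- tibble(
  experiment = c("haircut", "haircut", "rnaseq", "rnaseq"),
  cell_id    = c("cell_a", "cell_b", "cell_a", "cell_b"),
  feature    = c("Uracil_45", "Uracil_45", "IL7R", "IL7R"),
  logcount   = c(2.26, 0.88, 3.29, 0)
)

# With `experiment` kept, rows are identified by (experiment, cell_id).
# Each cell therefore appears twice, and each feature is NA in the rows
# that belong to the other experiment.
wide_with_experiment <- pivot_wider(
  long,
  names_from  = feature,
  values_from = logcount
)

# With `experiment` dropped first, rows are identified by cell_id alone,
# so both features land on the same row for each cell.
wide_by_cell <- long %>%
  select(-experiment) %>%
  pivot_wider(names_from = feature, values_from = logcount)

print(wide_with_experiment)
print(wide_by_cell)
```

If that assumption holds, one possible fix would be to exclude experiment from the identifying columns whenever the selection spans more than one experiment, which is what the left_join workaround in the reprex does by hand.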