pachterlab / sleuth

Differential analysis of RNA-Seq
http://pachterlab.github.io/sleuth
GNU General Public License v3.0

Improve obs_to_matrix with data.table #210

Open andrewrech opened 5 years ago

andrewrech commented 5 years ago

Replace reshape2::dcast with data.table::dcast for a speed gain, and to avoid an issue where long sample names cause a vector-return error in reshape2 but not in data.table:

Using reshape2:

obs_counts <- dcast(obj$obs_norm, target_id ~ sample, value.var = value_name)
Aggregation function missing: defaulting to length

Error during wrapup: dims [product 868328] do not match the length of object [41674225]

But there is no error when using data.table::dcast.
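A minimal, self-contained sketch of the data.table replacement. The toy obs_norm table below stands in for obj$obs_norm: the column names target_id, sample, and est_counts follow sleuth's long format, but the data themselves are invented for illustration.

```r
library(data.table)

# Toy stand-in for obj$obs_norm: one row per (target_id, sample) pair
obs_norm <- data.table(
  target_id  = rep(c("tx1", "tx2"), each = 3),
  sample     = rep(c("s1", "s2", "s3"), times = 2),
  est_counts = c(10, 20, 30, 40, 50, 60)
)

value_name <- "est_counts"

# Pivot long to wide: one row per target_id, one column per sample.
# With a data.table input, dcast dispatches to data.table::dcast,
# which avoids the reshape2 error described above.
obs_counts <- dcast(obs_norm, target_id ~ sample, value.var = value_name)
print(obs_counts)
```

Because the formula target_id ~ sample produces exactly one value per cell here, no aggregation function is needed; reshape2's "defaulting to length" warning above indicates duplicate (target_id, sample) pairs or a mis-specified value.var in the real data.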

obj_norm to reproduce

Linux i-094bb338984186ac0 4.15.0-1031-aws #33-Ubuntu SMP Fri Dec 7 09:32:27 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

R version 3.5.2 (2018-12-20)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.1 LTS

Matrix products: default
BLAS/LAPACK: /opt/intel/compilers_and_libraries_2018.0.128/linux/mkl/lib/intel64_lin/libmkl_rt.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] sleuth_0.30.0     data.table_1.12.0

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.0            bindr_0.1.1           magrittr_1.5.0.9000
 [4] tidyselect_0.2.5.9000 munsell_0.5.0.9000    colorspace_1.4-0
 [7] R6_2.3.0              rlang_0.3.1.9000      dplyr_0.7.8
[10] tools_3.5.2           parallel_3.5.2        grid_3.5.2
[13] rhdf5_2.24.0          gtable_0.2.0          lazyeval_0.2.1.9000
[16] assertthat_0.2.0.9000 tibble_2.0.1          crayon_1.3.4
[19] bindrcpp_0.2.2        purrr_0.3.0           Rhdf5lib_1.2.1
[22] ggplot2_3.1.0.9000    glue_1.3.0.9000       compiler_3.5.2
[25] pillar_1.3.1          scales_1.0.0.9000     pkgconfig_2.0.2
andrewrech commented 5 years ago

Additionally, fixed the use of data.table::dcast on a data.frame (this function does not implicitly convert a data.frame to a data.table).

warrenmcg commented 5 years ago

Hi @andrewrech,

Thank you for your work here! Two suggestions:

  1. A cleaner way to handle issue #135 is to add an #' @importFrom utils head statement to the documentation for sleuth_gene_table (between lines 1092 and 1093 of sleuth.R), as well as for sleuth_to_matrix (between lines 32 and 33 of matrix.R). Once you've made the change, open an R session in the sleuth directory and run devtools::document() to update the NAMESPACE file.
  2. There are three calls to reshape2 in the plot_transcript_heatmap function in plots.R. If you convert those to data.table::dcast and then add a line after the if statements to convert tab_df to a data.frame, we can remove the dependency on reshape2 altogether (just delete it from the "depends" list in the DESCRIPTION file). You should also run devtools::document() to make sure references to reshape2 are gone from the NAMESPACE file.

In both cases, you can run devtools::check() to see if there are any other issues.
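For reference, the roxygen change in suggestion 1 might look like the sketch below. The function signature and body are placeholders, not sleuth's actual code; only the #' @importFrom line is the change being suggested.

```r
#' (existing roxygen documentation for sleuth_gene_table ...)
#'
#' @importFrom utils head
sleuth_gene_table <- function(obj, test, test_type = "lrt") {
  # existing function body (elided); placeholder only
  invisible(NULL)
}
```

After adding the tag, devtools::document() regenerates NAMESPACE with a corresponding importFrom(utils, head) entry, so head is resolved without attaching all of utils.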

warrenmcg commented 5 years ago

Also, I've switched to the devel branch for merging, as it allows us to combine this with other changes for the next stable release of sleuth.

andrewrech commented 5 years ago

@warrenmcg Thank you for your help

Regarding "then add a line after the if statements to convert tab_df to a data.frame": data.table::dcast returns a data frame if given a data frame, so there is no need to convert back:

data.table::dcast(test_df, time ~ variable, fun=mean) %>% str
'data.frame':   12 obs. of  2 variables:
 $ time  : num  0 2 4 6 8 10 12 14 16 18 ...
 $ weight: num  41.1 49.2 60 74.3 91.2 ...
andrewrech commented 5 years ago

@warrenmcg see my note above

warrenmcg commented 5 years ago

Hi @andrewrech, you are correct. Sorry about that! We will be reviewing everything next week to merge and hopefully have a new release soon! Thanks again for your help and effort! 👍