Change how `pbmc_small` is stored and generated

mojaveazure commented 8 months ago

Change pbmc_small from a binary Rda file to an R script. This change:

- provides a record of how pbmc_small is generated
- provides the raw MTX for pbmc_small
- ensures that pbmc_small is always up-to-date

This change also adds a v5 assay to pbmc_small called RNA5 alongside the existing v3 RNA assay for testing and demonstration purposes

Despite moving to an R script, pbmc_small will continue to be bundled and distributed as a binary Rda file; R CMD build will resave the R script to an Rda file and remove the R script for package distribution

The biggest drawback is that devtools::load_all(); data("pbmc_small") no longer works, due to differences between devtools::load_all() and R CMD build. To get around this, I've provided an internal .PBMCsmall() function that loads pbmc_small from the R script

# re-build `pbmc_small` and save in global environment
.PBMCsmall()

Because this re-builds pbmc_small on every call, one can pass mode = "resave" to save pbmc_small as an Rda file for reuse with load()

.PBMCsmall(mode = "resave")
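
To illustrate the difference between rebuilding and resaving, here is a minimal base-R sketch. It is not the code from this PR: `build_small()` and `small_data()` are hypothetical stand-ins, and the toy matrix only plays the role of pbmc_small.

```r
# Hypothetical stand-in for the data-generating script; the real pbmc_small
# script is not reproduced here
build_small <- function() {
  set.seed(42)
  matrix(
    rpois(n = 12, lambda = 2),
    nrow = 3,
    dimnames = list(paste0("gene", 1:3), paste0("cell", 1:4))
  )
}

# Hypothetical helper with a mode switch, loosely modelled on the behaviour
# described above: "build" regenerates the object, "resave" also writes Rda
small_data <- function(
  mode = c("build", "resave"),
  file = file.path(tempdir(), "small.rda")
) {
  mode <- match.arg(mode)
  small <- build_small()
  if (mode == "resave") {
    save(small, file = file)
  }
  invisible(small)
}

# Regenerated from the script on every call
rebuilt <- small_data()

# Written once to an Rda file, then reused with load()
small_data(mode = "resave")
cached <- new.env()
load(file.path(tempdir(), "small.rda"), envir = cached)
```

In this sketch the object restored with load() matches the rebuilt one because the generating function is seeded; a script with unseeded randomness would not give that guarantee.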

There are also a couple of other changes to enable this functionality, namely:

- caching SeuratObject version at load-time
- minor update in backwards compatibility compliance checking
- new helper function to find R package version without using utils::packageVersion()
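
One way a version lookup can avoid utils::packageVersion() is to read the Version field of the installed DESCRIPTION file directly. The sketch below assumes that approach; `pkg_version_sketch()`, `cache_version()` and `.version_cache` are hypothetical names, and the helper actually added in this PR may work differently.

```r
# Hypothetical helper (name and approach are assumptions, not the PR's code):
# read the Version field straight from an installed package's DESCRIPTION
pkg_version_sketch <- function(pkg) {
  desc <- system.file("DESCRIPTION", package = pkg)
  if (!nzchar(desc)) {
    stop("Package '", pkg, "' is not installed", call. = FALSE)
  }
  version <- read.dcf(file = desc, fields = "Version")[1L, 1L]
  package_version(unname(version))
}

# Hypothetical cache that could be filled once, for example from .onLoad(),
# and read afterwards without touching the file system again
.version_cache <- new.env(parent = emptyenv())
cache_version <- function(pkg) {
  assign(pkg, pkg_version_sketch(pkg), envir = .version_cache)
  invisible(get(pkg, envir = .version_cache))
}

# "stats" is used here only because it ships with every R installation
cache_version("stats")
```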

dcollins15 commented 8 months ago

I finally got a chance to test this out properly — I think this is a great change!! 🙌

It appears that this will require us to add a call to data(pbmc_small) before it can be accessed in Seurat's tests. I suppose the correct place would just be inside seurat/tests/testthat.R
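
A minimal sketch of such a call, assuming it sits near the top of tests/testthat.R before the tests run. The requireNamespace() guard and the `has_seurat_object` variable are additions for illustration only.

```r
# Assumption: this runs before the test files are executed.
# The guard only keeps the snippet harmless where SeuratObject is missing.
has_seurat_object <- requireNamespace("SeuratObject", quietly = TRUE)
if (has_seurat_object) {
  # data() with an explicit package loads the bundled dataset into the
  # calling environment
  data("pbmc_small", package = "SeuratObject")
}
```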

On a related note, any ideas why this update would cause some of the tests under seurat/tests/testthat/test_differential_expression.R to start failing?

── Failed tests ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Failure (test_differential_expression.R:169:3): latent.vars works
results[1, "p_val"] not equal to 2.130202e-16.
1/1 mismatches
[1] 2.66e-16 - 2.13e-16 == 5.34e-17

Failure (test_differential_expression.R:170:3): latent.vars works
results[1, "avg_logFC"] not equal to -3.102866.
1/1 mismatches
[1] -2.57 - -3.1 == 0.531

Failure (test_differential_expression.R:171:3): latent.vars works
results[1, "pct.1"] not equal to 0.417.
1/1 mismatches
[1] 0.306 - 0.417 == -0.111

Failure (test_differential_expression.R:173:3): latent.vars works
results[1, "p_val_adj"] not equal to 4.899466e-14.
1/1 mismatches
[1] 6.13e-14 - 4.9e-14 == 1.23e-14

Failure (test_differential_expression.R:174:3): latent.vars works
rownames(x = results)[1] not equal to "LYZ".
1/1 mismatches
x[1]: "CST3"
y[1]: "LYZ"

Failure (test_differential_expression.R:183:3): group.by works
nrow(x = results) not equal to 190.
1/1 mismatches
[1] 206 - 190 == 16

Failure (test_differential_expression.R:185:3): group.by works
results[1, "p_val"] not equal to 0.02870319.
1/1 mismatches
[1] 0.00757 - 0.0287 == -0.0211

Failure (test_differential_expression.R:186:3): group.by works
results[1, "avg_logFC"] not equal to 0.8473584.
1/1 mismatches
[1] 6.48 - 0.847 == 5.63

Failure (test_differential_expression.R:187:3): group.by works
results[1, "pct.1"] not equal to 0.455.
1/1 mismatches
[1] 0.171 - 0.455 == -0.284

Failure (test_differential_expression.R:188:3): group.by works
results[1, "pct.2"] not equal to 0.194.
1/1 mismatches
[1] 0 - 0.194 == -0.194

Failure (test_differential_expression.R:190:3): group.by works
rownames(x = results)[1] not equal to "NOSIP".
1/1 mismatches
x[1]: "TMEM40"
y[1]: "NOSIP"

Failure (test_differential_expression.R:199:3): subset.ident works
nrow(x = results) not equal to 183.
1/1 mismatches
[1] 180 - 183 == -3

Failure (test_differential_expression.R:201:3): subset.ident works
results[1, "p_val"] not equal to 0.0129372.
1/1 mismatches
[1] 0.00889 - 0.0129 == -0.00404

Failure (test_differential_expression.R:202:3): subset.ident works
results[1, "avg_logFC"] not equal to 1.912603.
1/1 mismatches
[1] -1.18 - 1.91 == -3.09

Failure (test_differential_expression.R:203:3): subset.ident works
results[1, "pct.1"] not equal to 0.5.
1/1 mismatches
[1] 0.238 - 0.5 == -0.262

Failure (test_differential_expression.R:204:3): subset.ident works
results[1, "pct.2"] not equal to 0.125.
1/1 mismatches
[1] 0.667 - 0.125 == 0.542

Failure (test_differential_expression.R:206:3): subset.ident works
rownames(x = results)[1] not equal to "TSPO".
1/1 mismatches
x[1]: "LYZ"
y[1]: "TSPO"

Error (test_differential_expression.R:472:3): (code run outside of `test_that()`)
<testthat_abort_reporter/rlang_error/error/condition>
Error in `stop_reporter(c("Maximum number of failures exceeded; quitting at end of file.", 
    i = "Increase this number with (e.g.) {.run testthat::set_max_fails(Inf)}"))`: Maximum number of failures exceeded; quitting at end of file.
ℹ Increase this number with (e.g.) testthat::set_max_fails(Inf)
Backtrace:
     ▆
  1. └─testthat::context("FindConservedMarkers") at test_differential_expression.R:472:3
  2.   └─testthat:::context_start(desc)
  3.     └─get_reporter()$.start_context(desc)
  4.       └─self$end_context(self$.context)
  5.         └─testthat:::o_apply(self$reporters, "end_context", context)
  6.           └─base::lapply(objects, f)
  7.             └─testthat (local) FUN(X[[i]], ...)
  8.               └─x$end_context(...)
  9.                 └─testthat:::stop_reporter(...)
 10.                   └─cli::cli_abort(message, class = "testthat_abort_reporter", error_call = NULL)
 11.                     └─rlang::abort(...)

[ FAIL 18 | WARN 0 | SKIP 0 | PASS 271 ]
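
Shifted expected values like these could mean that the regenerated object differs from the previously shipped one. One way to check is to compare the two count matrices element by element. The sketch below uses toy matrices as stand-ins; `old` and `new` are hypothetical and do not contain pbmc_small data.

```r
# Toy stand-ins for the counts of the previously shipped and regenerated data
old <- matrix(
  c(1, 0, 3, 2, 5, 0),
  nrow = 2,
  dimnames = list(c("LYZ", "CST3"), paste0("cell", 1:3))
)
new <- old
new["LYZ", "cell2"] <- 4

# Same features and cells in the same order?
same_shape <- identical(dim(old), dim(new)) &&
  identical(dimnames(old), dimnames(new))

# Row and column indices of every entry that changed
changed <- which(old != new, arr.ind = TRUE)

# all.equal() gives a summary that is easier to read in a log
summary_diff <- all.equal(old, new)
```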

dcollins15 commented 8 months ago

I took another look at these tests - getting a new set of errors:

Error (test_differential_expression.R:9:1): (code run outside of `test_that()`)
Error in `NormalizeData.default(pbmc_small, normalization.method = "CLR")`: CLR normalization is only supported for dense and dgCMatrix
Backtrace:
    ▆
 1. ├─base::suppressWarnings(NormalizeData(pbmc_small, normalization.method = "CLR")) at test_differential_expression.R:9:1
 2. │ └─base::withCallingHandlers(...)
 3. ├─Seurat::NormalizeData(pbmc_small, normalization.method = "CLR")
 4. └─Seurat:::NormalizeData.default(pbmc_small, normalization.method = "CLR") at seurat/R/generics.R:395:3

Error (test_integratedata.R:3:1): (code run outside of `test_that()`)
Error: We are unable to convert Seurat objects less than version 2.X to version 3.X
Please use devtools::install_version to install Seurat v2.3.4 and update your object to a 2.X object
Backtrace:
    ▆
 1. ├─base::suppressWarnings(UpdateSeuratObject(pbmc_small)) at test_integratedata.R:3:1
 2. │ └─base::withCallingHandlers(...)
 3. └─SeuratObject::UpdateSeuratObject(pbmc_small)

Error (test_integration.R:3:1): (code run outside of `test_that()`)
Error: We are unable to convert Seurat objects less than version 2.X to version 3.X
Please use devtools::install_version to install Seurat v2.3.4 and update your object to a 2.X object
Backtrace:
    ▆
 1. ├─base::suppressWarnings(UpdateSeuratObject(pbmc_small)) at test_integration.R:3:1
 2. │ └─base::withCallingHandlers(...)
 3. └─SeuratObject::UpdateSeuratObject(pbmc_small)

Error (test_integration5.R:14:1): (code run outside of `test_that()`)
Error in `UseMethod(generic = "LayerData", object = object)`: no applicable method for 'LayerData' applied to an object of class "NULL"
Backtrace:
    ▆
 1. ├─base::suppressWarnings(...) at test_integration5.R:14:1
 2. │ └─base::withCallingHandlers(...)
 3. ├─SeuratObject::CreateAssay5Object(...)
 4. │ └─SeuratObject::CheckLayersName(matrix.list = counts, layers.type = "counts")
 5. └─SeuratObject::LayerData(test.data, assay = "RNA", layer = "counts")

Error (test_transferdata.R:3:1): (code run outside of `test_that()`)
Error: We are unable to convert Seurat objects less than version 2.X to version 3.X
Please use devtools::install_version to install Seurat v2.3.4 and update your object to a 2.X object
Backtrace:
    ▆
 1. ├─base::suppressWarnings(UpdateSeuratObject(pbmc_small)) at test_transferdata.R:3:1
 2. │ └─base::withCallingHandlers(...)
 3. └─SeuratObject::UpdateSeuratObject(pbmc_small)

Error (test_visualization.R:7:1): (code run outside of `test_that()`)
Error in `UseMethod(generic = "Embeddings", object = object)`: no applicable method for 'Embeddings' applied to an object of class "NULL"
Backtrace:
    ▆
 1. └─Seurat::CollapseEmbeddingOutliers(...) at test_visualization.R:7:1
 2.   └─SeuratObject::Embeddings(object = object[[reduction]]) at seurat/R/visualization.R:5248:3

[ FAIL 6 | WARN 1 | SKIP 6 | PASS 223 ]

I tried installing from a fresh R env to try to get back to the initial behavior, but I haven't been able to repro the initial errors I posted 😖
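
The two "no applicable method ... applied to an object of class "NULL"" errors are what S3 dispatch reports when a generic receives NULL. That would be consistent with pbmc_small, or a component pulled out of it, not being available when the test file runs, though the logs alone do not confirm this. The base-R sketch below reproduces the message with a hypothetical generic `embeddings_sketch()` and a plain list in place of a Seurat object.

```r
# Hypothetical S3 generic with a matrix method and no default method
embeddings_sketch <- function(object, ...) UseMethod("embeddings_sketch")
embeddings_sketch.matrix <- function(object, ...) object

# A plain list standing in for an object that holds reductions
obj <- list(pca = matrix(0, nrow = 2, ncol = 2))

# Extracting a missing element returns NULL, so dispatch finds no method
msg <- tryCatch(
  embeddings_sketch(obj[["tsne"]]),
  error = function(e) conditionMessage(e)
)

# The existing element dispatches normally
pca <- embeddings_sketch(obj[["pca"]])
```

A quick check along the same lines in the failing session would be to inspect class(pbmc_small) before the first failing call.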

satijalab / seurat-object

Change how `pbmc_small` is stored and generated #188