rnabioco / djvdj

An R package to analyze single-cell V(D)J data
https://rnabioco.github.io/djvdj

move external data sets to ExperimentHub #114

Open jayhesselberth opened 1 year ago

jayhesselberth commented 1 year ago

CRAN and Bioconductor checks will fail if the package downloads external data, so we can't call download.file anywhere in the package code, README, or vignettes.

Need to:

  1. Generate a smaller data set (a sample of the data being downloaded) that can be included in the package and used for the README / vignettes
  2. Put larger data sets in an ExperimentHub
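
For item 2, a rough sketch of how a vignette might pull a larger data set from ExperimentHub instead of calling download.file. The helper name `load_hub_data` and the search term "djvdj" are hypothetical; no such resource exists on the hub yet, so this only shows the general shape of the lookup.

```r
# Hypothetical helper: fetch the first ExperimentHub record matching `term`.
# Assumes the ExperimentHub and AnnotationHub packages are installed.
load_hub_data <- function(term) {
  if (!requireNamespace("ExperimentHub", quietly = TRUE)) {
    stop("The ExperimentHub package is required")
  }

  eh  <- ExperimentHub::ExperimentHub()
  res <- AnnotationHub::query(eh, term)

  if (length(res) == 0) {
    stop("No ExperimentHub records match '", term, "'")
  }

  # Downloads the resource on first use, then reads it from the local cache
  res[[1]]
}
```

Usage would look like `splen_so <- load_hub_data("djvdj")` once the data have actually been submitted.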

sheridar commented 1 year ago

I'll downsample the current vignette data so we can include it in the package

jayhesselberth commented 1 year ago

this seems to work:

```r
# Sample 1,000 cells from the splen_so object
library(Seurat)

download.file(
  "https://djvdj-data.s3.us-west-2.amazonaws.com/splenocytes.zip",
  "splenocytes.zip",
  quiet = TRUE
)

unzip("splenocytes.zip", overwrite = FALSE)

# Load Seurat object
load("splenocytes/splen_so.rda")

# Fix the seed so the same cells are sampled each time
set.seed(42)

# Subset by sampled cell barcodes, see
# https://github.com/satijalab/seurat/issues/3108#issuecomment-685975338
splen_so_tiny <- splen_so[, sample(colnames(splen_so), size = 1000, replace = FALSE)]

# xz provides better compression than the bzip2 default
usethis::use_data(splen_so_tiny, compress = "xz")
```

We would then need to take these cell barcodes and filter the 10x files to match.
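
A minimal sketch of that filtering step, using a toy data frame in place of a real filtered_contig_annotations.csv. The toy barcodes, the `keep_barcodes` vector (standing in for `colnames(splen_so_tiny)`), and the output path are all assumptions for illustration. If the Seurat cell names carry a sample prefix or suffix, it would need to be stripped before matching.

```r
# Toy stand-in for a 10x contig annotation table; real files have many more columns
contigs <- data.frame(
  barcode = c("AAAC-1", "AAAC-1", "CCCG-1", "GGGT-1"),
  chain   = c("IGH", "IGK", "TRB", "IGH"),
  stringsAsFactors = FALSE
)

# Stand-in for colnames(splen_so_tiny)
keep_barcodes <- c("AAAC-1", "GGGT-1")

# Keep only contigs from the sampled cells
contigs_tiny <- contigs[contigs$barcode %in% keep_barcodes, , drop = FALSE]

# Write the reduced table (temporary location used here for the example)
out_file <- file.path(tempdir(), "filtered_contig_annotations.csv")
write.csv(contigs_tiny, out_file, row.names = FALSE, quote = FALSE)
```

The same `%in%` filter should apply to any other per-barcode 10x file that ships with the package.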

jayhesselberth commented 1 year ago

You should shoot for 5 MB or less of packaged data. splen_so_tiny above is ~1.8 MB.
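
One way to check a saved object against that budget, sketched with a small stand-in object and a temporary file rather than the real data/splen_so_tiny.rda. The 5 MB limit comes from the comment above; the object and path are placeholders.

```r
# Stand-in object; in practice this would be splen_so_tiny
set.seed(42)
obj <- data.frame(x = rnorm(1000))

# Save with xz compression, as in the snippet above
rda <- file.path(tempdir(), "splen_so_tiny.rda")
save(obj, file = rda, compress = "xz")

# File size in MB
size_mb <- file.size(rda) / 1024^2

if (size_mb > 5) {
  warning("Packaged data exceeds 5 MB: ", round(size_mb, 2), " MB")
}

# Reports the size and compression type of each .rda file
rda_info <- tools::checkRdaFiles(rda)
```

Pointing `tools::checkRdaFiles()` at the package's data/ directory would cover every packaged data set at once.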