thelovelab / tximport

Transcript quantification import for modular pipelines

Enabling reading from cloud storage #20

Closed. tomsing1 closed this issue 6 years ago

tomsing1 commented 6 years ago

It is great that the tximport function allows users to supply their own importer function.

I keep most of my files on AWS S3. The aws.s3 package provides great functionality to access S3 objects, e.g. via the s3read_using function.

So I tried to read my Salmon output files directly from AWS, using the following importer:

importer = aws.s3::s3read_using(FUN = readr::read_tsv)

Unfortunately, the tximport function refuses to run unless the file.exists function returns TRUE for all paths, so S3 paths are rejected.

https://github.com/mikelove/tximport/blob/9b9d3e6e0d0843cfec2048e5dbd0a8ddca189a71/R/tximport.R#L145
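(As a quick illustration of why the check fails: file.exists only inspects the local filesystem, so it reports FALSE for an S3 URI even when the object exists in the bucket. The path below is just a placeholder.)

file.exists("s3://PATH/TO/quant.sf")
#> [1] FALSE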

Perhaps the file.exists check could be made optional?

Thanks a lot for making this great package available! Thomas

mikelove commented 6 years ago

Absolutely. I'll add an existence.optional argument to devel soon. I have a batch of changes to make and was hoping to get to it this week or next.

mikelove commented 6 years ago

Added an existenceOptional argument (default FALSE) in version 1.7.3:

https://github.com/mikelove/tximport/commit/f80fcaac7411ae590688c237088072a313772668

tomsing1 commented 6 years ago

Awesome! Thanks a lot for including this option so quickly.

For the record, and in case it is useful to others, the following minimal example works. (Now that the existenceOptional option is available, I realized that my original example was incomplete.)

library(aws.s3)
library(tximport)

quant.files <- "s3://PATH/TO/quant.sf"

# importer that reads a quantification file directly from S3
import_function <- function(x) {
  aws.s3::s3read_using(object = x, FUN = readr::read_tsv)
}

# example without summarization to the gene level
txi <- tximport(
  files = quant.files,
  type = "salmon",
  txIn = TRUE,
  txOut = TRUE,
  importer = import_function,
  existenceOptional = TRUE)
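If summarization to the gene level is needed instead, the same importer should also work when a tx2gene mapping is supplied (an untested sketch; tx2gene is assumed to be a two-column transcript-to-gene data frame matching the transcriptome used for quantification):

# sketch: gene-level summarization with the same S3 importer,
# assuming tx2gene holds transcript IDs in column 1 and the
# corresponding gene IDs in column 2
txi.genes <- tximport(
  files = quant.files,
  type = "salmon",
  tx2gene = tx2gene,
  importer = import_function,
  existenceOptional = TRUE)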