Closed: jorainer closed this issue 9 months ago
The easiest solution would be a helper function that writes sample data to an existing MsBackendSql database, plus a function that loads an MsExperiment from such an (extended) database.
... I don't get why we would have sample data as part of MsBackendSql.
Totally agree. I'm just playing a bit with that idea.
Q: Why am I doing that?
A: I now have some of our data sets as SQL databases, which is nice, but I also always need a separate file (e.g. an xls sheet) with the sample annotations. I then have to load that file in addition to connecting to the database to get my MsExperiment for the data analysis. I would find it much more convenient if I could get everything from the database (this would not be an on-disk sample annotation solution; it would just read a database table and put it into the @sampleData slot of the MsExperiment). The advantage would be to have all information for an experiment in a single place.
Let me play a bit with that and then we can have a dev call to discuss.
Here is a description/use case that we could discuss in a dev call:
Currently, readMsExperiment is designed to create an MsExperiment with MS data read/imported from mzML files, i.e. using a MsBackendMzR backend. Creating an MsExperiment with another backend is not easily possible.
What I would suggest is to make readMsExperiment a method that dispatches on the parameter spectraBackend. The default implementation would use MsBackendMzR, but this would allow implementations for other backends (such as MsBackendSql) too:

    #' Generic should go to ProtGenerics
    setGeneric("readMsExperiment", function(spectraBackend, ...)
        standardGeneric("readMsExperiment"))

    #' These are the "default" implementations and should go to MsExperiment.
    setMethod("readMsExperiment", "missing",
              function(spectraBackend, spectraFiles = character(),
                       sampleData = data.frame(), ...) {
                  MsExperiment::readMsExperiment(spectraFiles = spectraFiles,
                                                 sampleData = sampleData)
              })

    setMethod("readMsExperiment", "character", function(spectraBackend, ...) {
        MsExperiment::readMsExperiment(spectraFiles = spectraBackend, ...)
    })

    setMethod("readMsExperiment", "MsBackend",
              function(spectraBackend, spectraFiles = character(),
                       sampleData = data.frame(), ...) {
                  MsExperiment::readMsExperiment(spectraFiles = spectraFiles,
                                                 sampleData = sampleData)
              })

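
To make the dispatch idea concrete, here is a minimal, self-contained sketch that uses only the methods package. The ToyBackend classes and the readToy generic are hypothetical stand-ins (they are not part of Spectra, MsExperiment or ProtGenerics); the sketch only illustrates which method would be selected for a missing argument, a character vector, and a backend subclass with or without its own method.

```r
library(methods)

## Hypothetical class hierarchy standing in for MsBackend and its subclasses.
setClass("ToyBackend", representation("VIRTUAL"))
setClass("ToyBackendMzR", contains = "ToyBackend")
setClass("ToyBackendSql", contains = "ToyBackend")

## Hypothetical generic dispatching on its first argument only.
setGeneric("readToy", function(spectraBackend, ...)
    standardGeneric("readToy"))

## No backend supplied: fall back to the default behaviour.
setMethod("readToy", "missing", function(spectraBackend, ...)
    "default (no backend given)")

## File names passed as first argument.
setMethod("readToy", "character", function(spectraBackend, ...)
    "file names given")

## Fallback for any backend without a dedicated method.
setMethod("readToy", "ToyBackend", function(spectraBackend, ...)
    "generic backend")

## Dedicated method, e.g. defined in the package providing the backend.
setMethod("readToy", "ToyBackendSql", function(spectraBackend, ...)
    "SQL backend")

res_missing <- readToy()
res_char <- readToy("a.mzML")
res_mzr <- readToy(new("ToyBackendMzR"))
res_sql <- readToy(new("ToyBackendSql"))
```

The design choice this illustrates is that a package defining a new backend class can add its own method without touching the default implementations.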
There could then be specific implementations for certain MsBackend classes (defined in the respective R package) that would simplify reading MS experiment data. An implementation for MsBackendSql is shown below. Different use cases for that function are shown further below.

    setMethod("readMsExperiment", "MsBackendSql",
              function(spectraBackend, spectraFiles = character(),
                       sampleData = data.frame(), ...) {
                  ## Initialize the backend - should throw an error if not
                  ## all required information is provided.
                  be <- backendInitialize(spectraBackend, ...)
                  map <- matrix(nrow = 0, ncol = 2)
                  if (!(is.data.frame(sampleData) ||
                        inherits(sampleData, "DataFrame")))
                      stop("'sampleData' is expected to be a 'data.frame' ",
                           "or 'DataFrame'")
                  if (length(spectraFiles) || nrow(sampleData)) {
                      ## Link samples to spectra using provided spectraFiles
                      ## and dataOrigin from the database.
                      if (length(spectraFiles) != nrow(sampleData))
                          stop("If provided, length of 'spectraFiles' needs to ",
                               "match the number of rows of 'sampleData'.")
                      map <- findMatches(basename(spectraFiles),
                                         basename(dataOrigin(be)))
                      map <- cbind(from(map), to(map))
                  } else {
                      ## No sample data provided: try to read it from the
                      ## database.
                      con <- dbconn(be)
                      if (inherits(be, "MsBackendOfflineSql"))
                          on.exit(dbDisconnect(con))
                      if (.db_contains_sample_data(con)) {
                          sampleData <- dbGetQuery(con, "select * from sample_data")
                          map <- unname(as.matrix(dbGetQuery(
                              con, "select * from sample_to_msms_spectrum")))
                      }
                  }
                  res <- MsExperiment::MsExperiment()
                  res@spectra <- Spectra(be)
                  res@sampleData <- as(sampleData, "DataFrame")
                  if (nrow(map) > 0) {
                      res@sampleDataLinks[["spectra"]] <- map
                      mcols(res@sampleDataLinks)["spectra", "subsetBy"] <- 1L
                  } else
                      warning("Could not derive mapping between samples and ",
                              "spectra. Please use 'linkSampleData' to establish ",
                              "that mapping.")
                  res
              })

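
The linking step in the method above (matching the file names in spectraFiles against the dataOrigin of each spectrum) can be illustrated with a small base R sketch. This is not the S4Vectors findMatches() call itself but a rough base R counterpart, run on made-up toy vectors; the file paths are assumptions for illustration only.

```r
## Toy input: one entry per sample (row of sampleData) ...
spectra_files <- c("/raw/MM8.mzML", "/raw/MM14.mzML")
## ... and one entry per spectrum, as dataOrigin() would return it.
data_origin <- c(rep("/db/import/MM8.mzML", 3),
                 rep("/db/import/MM14.mzML", 2))

## Logical matrix: rows are samples, columns are spectra; TRUE where the
## base file names agree (directories are ignored, as in the method above).
hits <- outer(basename(spectra_files), basename(data_origin), "==")

## Two-column index matrix: column 1 is the sample (row of sampleData),
## column 2 the index of the matching spectrum.
map <- which(hits, arr.ind = TRUE)
map <- unname(map[order(map[, 1], map[, 2]), , drop = FALSE])
map
```

Each row of map links one sample to one spectrum, which is the shape the method stores in the sampleDataLinks slot. Spectra whose file name matches no sample simply do not appear in map.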
Preparing data for the use cases (MsBackendSql database):

    library(Spectra)
    library(MsExperiment)
    library(MsBackendSql)
    library(RSQLite)

    mm8_file <- system.file("microtofq", "MM8.mzML", package = "msdata")
    mm14_file <- system.file("microtofq", "MM14.mzML", package = "msdata")

    ## Sample annotations, one row per data file
    sd <- data.frame(file = basename(c(mm8_file, mm14_file)),
                     sample_name = c("MM8", "MM14"),
                     batch = c("2021-11-12", "2021-12-11"),
                     injection_index = c(2L, 5L),
                     sample_source = c("plasma", "serum"))

    ## Now store the data in an MsBackendSql database
    mm_sqlite <- tempfile()
    createMsBackendSqlDatabase(dbConnect(SQLite(), mm_sqlite),
                               c(mm8_file, mm14_file), blob = TRUE)

The standard way to import MS data from e.g. the mzML files would be:

    ## Import from raw data files
    mse <- readMsExperiment(spectraBackend = MsBackendMzR(),
                            spectraFiles = c(mm8_file, mm14_file),
                            sampleData = sd)
    mse

    Object of class MsExperiment
     Spectra: MS1 (310)
     Experiment data: 2 sample(s)
     Sample data links:
      - spectra: 2 sample(s) to 310 element(s).

    ## The same
    mse <- readMsExperiment(c(mm8_file, mm14_file), sampleData = sd)
    mse

    Object of class MsExperiment
     Spectra: MS1 (310)
     Experiment data: 2 sample(s)
     Sample data links:
      - spectra: 2 sample(s) to 310 element(s).

The implementation for MsBackendSql simplifies the use of this type of backend (at the very bottom is an example of how this needs to be done at present, i.e. without the proposed changes).

    ## "Read" an MsExperiment with data from that backend. `spectraFiles` is
    ## used to define the mapping between samples and spectra (using `dataOrigin`).
    ## Additional parameters are passed to the backendInitialize method of
    ## MsBackendOfflineSql
    mse <- readMsExperiment(MsBackendOfflineSql(), sampleData = sd,
                            spectraFiles = c(mm8_file, mm14_file),
                            drv = SQLite(), dbname = mm_sqlite)
    mse

    Object of class MsExperiment
     Spectra: MS1 (310)
     Experiment data: 2 sample(s)
     Sample data links:
      - spectra: 2 sample(s) to 310 element(s).

What this would enable in addition is storing sample annotations directly in the MsBackendSql database. Storing sample annotations together with the raw MS data has the advantage that all information for one experiment is bundled together (-> data integrity!). For self-contained storage modes (such as an SQLite database file or any other SQL database) this has the clear advantage that a whole experiment could be shared as a single file.
My proposal would be to store sample annotations in the same database (but in separate database tables). This would not interfere with the standard use of the backend.
Below we write the sample annotations to an existing MsBackendSql database. This needs to be done only once, ideally right after the database was created with the createMsBackendSqlDatabase function above.

    be <- backendInitialize(MsBackendOfflineSql(), dbname = mm_sqlite,
                            drv = SQLite())
    be$file <- basename(dataOrigin(be))
    writeSampleData(be, sampleData = sd, colname = "file", spectraVariable = "file")

As a side effect, these databases could also be used by other tools, since they are simple, plain SQL databases.

    dbListTables(dbconn(be))

    [1] "msms_spectrum"           "msms_spectrum_peak_blob" "sample_data"
    [4] "sample_to_msms_spectrum"

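
To illustrate the idea of keeping sample annotations in separate tables next to the spectra, here is a rough sketch using only DBI and RSQLite on an in-memory toy database. It is not the proposed writeSampleData implementation: the toy msms_spectrum table, the column names sample_id_ and spectrum_id_, and the has_sample_data check are assumptions made for illustration.

```r
library(DBI)
library(RSQLite)

con <- dbConnect(SQLite(), ":memory:")

## Toy stand-in for the spectra table (assumed columns, 5 spectra, 2 files).
spectra <- data.frame(
    spectrum_id_ = 1:5,
    dataOrigin = c(rep("MM8.mzML", 3), rep("MM14.mzML", 2)))
dbWriteTable(con, "msms_spectrum", spectra)

## Sample annotations go into their own table ...
sample_data <- data.frame(
    sample_id_ = 1:2,
    file = c("MM8.mzML", "MM14.mzML"),
    sample_name = c("MM8", "MM14"))
dbWriteTable(con, "sample_data", sample_data)

## ... and so does the sample-to-spectrum link (one row per spectrum).
link <- data.frame(
    sample_id_ = match(spectra$dataOrigin, sample_data$file),
    spectrum_id_ = spectra$spectrum_id_)
dbWriteTable(con, "sample_to_msms_spectrum", link)

## A reader could check for the extra tables before trying to use them.
has_sample_data <- all(c("sample_data", "sample_to_msms_spectrum") %in%
                       dbListTables(con))

## Any SQL-capable tool can now join samples and spectra.
res <- dbGetQuery(con, paste(
    "select s.sample_name, count(*) as n_spectra",
    "from sample_data s",
    "join sample_to_msms_spectrum l on s.sample_id_ = l.sample_id_",
    "group by s.sample_id_, s.sample_name",
    "order by s.sample_id_"))

dbDisconnect(con)
```

Because the extra tables are separate from the spectra tables, code that only reads the spectra is unaffected by their presence.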
The implementation of the readMsExperiment method for MsBackendSql could then also retrieve the sample annotations from the database, if present.

    mse <- readMsExperiment(MsBackendOfflineSql(), dbname = mm_sqlite,
                            drv = SQLite())
    mse

    Object of class MsExperiment
     Spectra: MS1 (310)
     Experiment data: 2 sample(s)
     Sample data links:
      - spectra: 2 sample(s) to 310 element(s).

As a comparison, creating an MsExperiment with an MsBackendSql backend at present (i.e. without the proposed changes) would be way less user friendly (and also come with a higher chance of errors):

    library(MsExperiment)
    mse <- MsExperiment()

    ## Add Spectra
    be <- backendInitialize(MsBackendOfflineSql(), drv = SQLite(),
                            dbname = mm_sqlite)
    sps <- Spectra(be)
    sps$file <- basename(dataOrigin(sps))
    spectra(mse) <- sps

    ## Add samples
    sampleData(mse) <- as(sd, "DataFrame")

    ## Link samples to spectra
    mse <- linkSampleData(mse, with = "sampleData.file = spectra.file")
    mse

Happy to discuss :)
Closing this issue as this was implemented in MsExperiment.
Maybe also store sample annotations in the database, such that an MsExperiment could be loaded directly from the database.
The question is, however, a) whether that should go in here or into the MsExperiment package, and b) whether only sample annotations should be stored or also the linkage to the spectra.
Maybe implement an additional MsExperimentOfflineSql class that extends MsBackendSql but in addition provides the sample annotations? Or is that overengineering?