Open samgregoire opened 4 months ago
(ping @leopoldguyot who raised the same issue to us)
When implementing this, I had in mind that we have to impose a "runCol" column in the colData, but rereading the documentation, I understand the confusion. In the param description:
runCol: For the multi-set case, a numeric(1) or character(1) pointing to the column of assayData (and colData, is set) that contains the runs/batches. Make sure that the column name in both tables are identical and syntactically valid (if you supply a character) or have the same index (if you supply a numeric). Note that characters are converted to syntactically valid names using make.names
Indeed, the doc mentions here that runCol must be the same across the two tables, but if the argument runCol is something other than "runCol", then the two tables (in the current implementation) cannot share the same columns...
Later in the Details section:
Multi-set case: the colData must contain a column named quantCols that provides the names of the columns in assayData with the quantitative values for each sample, and a column named runCol that provides the MS runs/batches in which each sample has been acquired. The entries in colData[["runCol"]] are matched against the entries provided by assayData[[runCol]].
The first sentence mentions runCol as a variable (so it can be any character) while the second sentence indicates that the column named "runCol" is matched, so here the name is imposed.
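To make the two readings concrete, here is a minimal base R sketch on made-up toy tables. The column name FileID, the Intensity column and all values are assumptions for illustration only, and the code mirrors the lookups described in the quoted documentation rather than the package internals.

```r
# Toy tables; every name and value here is an illustrative assumption.
assayData <- data.frame(FileID = c("run1", "run1", "run2"),
                        Intensity = c(10, 20, 30),
                        stringsAsFactors = FALSE)
colData <- data.frame(runCol = c("run1", "run2"),
                      quantCols = c("Intensity", "Intensity"),
                      stringsAsFactors = FALSE)
runCol <- "FileID"

# Reading from the Details section: the colData column is literally named
# "runCol", while the assayData column is the one named by the argument.
matched <- colData[["runCol"]] %in% assayData[[runCol]]

# Reading from the parameter description: the same name in both tables.
# colData has no "FileID" column here, so the shared-name lookup fails.
sharedName <- runCol %in% colnames(colData)
```

With these toy tables both readings cannot hold at once unless the argument itself is "runCol", which is the contradiction described above.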
I agree this is super confusing and must be solved indeed. 2 solutions:
1. Change .checkRunCol() so as to allow that runCol points to a column name that is shared across the two tables. For instance, if runCol == "FileID", then readQFeatures() should expect that assayData and colData both contain a column named FileID.
2. [...] "runCol"
After discussion with Léopold and now your issue, you have convinced me to go for 1., but @lgatto what's your opinion?
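A rough sketch of what option 1 could look like, in base R. The function checkRunColSketch() is hypothetical and is not the actual QFeatures:::.checkRunCol(); it only handles the character case of runCol (the numeric index case from the documentation is left out), and the toy tables are assumptions.

```r
# Hypothetical helper illustrating option 1; NOT the QFeatures internals.
checkRunColSketch <- function(assayData, colData, runCol) {
    stopifnot(is.character(runCol), length(runCol) == 1L)
    # The name held by runCol must exist in both tables.
    if (!runCol %in% colnames(assayData))
        stop("'assayData' must contain a column named '", runCol, "'.")
    if (!runCol %in% colnames(colData))
        stop("'colData' must contain a column named '", runCol, "'.")
    # Report runs present in assayData but not annotated in colData.
    missingRuns <- setdiff(unique(assayData[[runCol]]), colData[[runCol]])
    if (length(missingRuns) > 0L)
        warning("Runs without sample annotation: ",
                paste(missingRuns, collapse = ", "))
    invisible(TRUE)
}

# Toy tables sharing a column named FileID (illustrative values).
assayData <- data.frame(FileID = c("run1", "run2"), Intensity = c(1, 2),
                        stringsAsFactors = FALSE)
colData <- data.frame(FileID = c("run1", "run2"), quantCols = "Intensity",
                      stringsAsFactors = FALSE)
ok <- checkRunColSketch(assayData, colData, runCol = "FileID")
```

The design choice here is that the value of runCol, not the literal string "runCol", drives every lookup, so both tables are free to use any shared column name.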
Hello,
I have an issue / question regarding the runCol argument from the readQFeatures() function. My sample annotations are stored in a data.frame called meta. The column containing the name of the runs from which the samples come from is called fileID. I thought the purpose of the runCol argument was to identify the column where the names of the runs are stored. However, when I set runCol = "fileID", I get this error saying that the 'colData' must contain a column called 'runCol'.
If I change the column name in my sample annotations from "fileID" to "runCol", everything works and my colData are exactly as I want them to be.
Since there is a runCol argument, shouldn't I be free to name my "run" column whatever I want in the metadata as long as I indicate it through the said runCol argument? With the current implementation of the readQFeatures() function, it seems that I need the column to be imperatively named runCol.
From what I found, the QFeatures:::.checkRunCol() function is responsible for the error because if (!"runCol" %in% colnames(colData)) induces a stop.
I think that removing the quotation marks in the condition inducing the stop and using colData[[runCol]] instead of colData$runCol would solve this issue, unless of course that's how runCol is supposed to behave.
PS. The QFeatures:::.formatColData() function also uses the colData$runCol notation.
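The difference between the two notations can be seen on a plain data.frame. This is a small base R sketch with a made-up meta table (the sample column and the values are assumptions); it does not call QFeatures.

```r
# Made-up sample annotations; only the fileID name comes from the issue.
meta <- data.frame(fileID = c("run1", "run2"), sample = c("A", "B"),
                   stringsAsFactors = FALSE)
runCol <- "fileID"

# `$` takes the name literally: it looks for a column called "runCol".
literal <- meta$runCol
# `[[` evaluates the variable: it looks for the column named by runCol.
byValue <- meta[[runCol]]

# Same distinction in the existence check.
quotedCheck <- "runCol" %in% colnames(meta)
unquotedCheck <- runCol %in% colnames(meta)
```

Here literal is NULL and quotedCheck is FALSE, whereas byValue holds the run names and unquotedCheck is TRUE.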