Closed DarioS closed 2 years ago
Hi Dario, @DarioS
Thank you for reporting. Can you provide a reproducible example?
Best, Marcel
Please run load(url("https://www.maths.usyd.edu.au/u/dario/measurements.RData"))
and then the code above. The test file is 157 KB.
Hi Dario, @DarioS
Thanks for providing some data to work with. As shown below, there are replicates in the data:
anyReplicated(measurements)
#' NanoString
#' TRUE
This means that you will have (Features X N) more columns in the data because of those replicates. Although there are a lot of missing values, every column in the data has some information:
table(vapply(dataTable, function(x) all(is.na(x)), logical(1L)))
#' FALSE
#' 4418
To avoid this, it's best to remove or resolve replicates before converting to wideFormat.
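For anyone following along, resolving replicates first might look roughly like the sketch below. It is only an illustration under stated assumptions: the toy matrix, the patient IDs (P1, P2) and the sample names (s1 to s3) are made up and are not the data from this issue. mergeReplicates averages replicates by default, which may or may not be the right way to resolve them for a given assay.

```r
library(MultiAssayExperiment)

# Hypothetical toy assay: 2 features x 3 samples.
mat <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2,
              dimnames = list(c("geneA", "geneB"), c("s1", "s2", "s3")))

# Hypothetical patients: P1 was measured twice (s1, s2), P2 once (s3).
patients <- DataFrame(sex = c("F", "M"), row.names = c("P1", "P2"))
smap <- DataFrame(assay = factor(rep("NanoString", 3)),
                  primary = c("P1", "P1", "P2"),
                  colname = c("s1", "s2", "s3"))

mae <- MultiAssayExperiment(ExperimentList(list(NanoString = mat)),
                            colData = patients, sampleMap = smap)

# Check for replicates before reshaping.
anyReplicated(mae)

# Resolve replicates (default: mean of the replicate columns), then reshape.
merged <- mergeReplicates(mae)
anyReplicated(merged)
wide <- wideFormat(merged)
```

If averaging is not appropriate, the replicate columns could instead be dropped by subsetting the object before calling wideFormat.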
Best, Marcel
Yes, but note that the missing values are created by the MultiAssayExperiment package and are not in the input data.
> any(is.na(measurements[["Nanostring"]])) # None missing in input to wideFormat function.
FALSE
The definition of wideFormat is
wideFormat: A function to return a wide DataFrame where each row represents an observation.
So, from my perspective as an end user, this should not fail in the way that it does.
colData: Each row maps to zero or more observations in each experiment in the ExperimentList.
So, an observation means each sample and not each patient. Yet, wideFormat fails if technical replicates are present.
The data is reshaped so I would not expect to see the same shape as in the original.
I will update the documentation to make that clearer.
Each row in the wide format corresponds to the ID in the colData rows (patient).
It doesn't fail when technical replicates are present. They are actually handled as properly as possible (by adding more sets of columns).
Note. FWIW, a sample to me is a measurement rather than an observation.
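To make the "more sets of columns" point concrete, here is a base R sketch of the general mechanism. It is not the package's implementation, and the toy data and the "_rep" column naming are hypothetical. With one row per patient, each extra replicate needs its own set of feature columns, and patients without that replicate get NA there even though the input has no missing values.

```r
# Hypothetical long-format data: P1 has two technical replicates (s1, s2),
# P2 has a single sample (s3). No value is missing in the input.
long <- data.frame(
  patient = c("P1", "P1", "P1", "P1", "P2", "P2"),
  sample  = c("s1", "s1", "s2", "s2", "s3", "s3"),
  feature = c("geneA", "geneB", "geneA", "geneB", "geneA", "geneB"),
  value   = c(1, 2, 3, 4, 5, 6)
)

# Number the samples within each patient (1 = first sample, 2 = second, ...).
sample_index <- match(long$sample, unique(long$sample))
long$replicate <- ave(sample_index, long$patient,
                      FUN = function(i) match(i, unique(i)))

# One column per feature/replicate combination (hypothetical naming scheme).
long$column <- paste(long$feature, long$replicate, sep = "_rep")

# One row per patient; combinations that were never measured become NA.
wide <- tapply(long$value, list(long$patient, long$column), FUN = mean)

anyNA(long$value)  # the input has no missing values
wide               # P2 has NA in the second set of columns
```

In this toy case the wide table has two sets of feature columns, and P2 is NA in the second set because it has no second replicate.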
Re-opening for documentation changes
It's introducing NAs and concatenating feature IDs with sample IDs to create a huge number of new non-existent features.
The first set of columns seems to be O.K., but the next set has column names composed of the feature IDs and sample IDs pasted together.