kasperdanielhansen opened this issue 8 years ago
There have been multiple requests for the ability to remove probes before running the various normalization routines, for example based on detection P values. Whether this should be done by removing rows from the object entirely or by allowing NAs in the object is unclear to me at present. One argument against NAs is that (IMO) they add fragility: everything downstream has to be able to deal with NAs, which implies a different number of observations for each CpG. Conclusion: I think I'll make it easier to remove rows, including removing rows based on detectionP.
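To make the trade-off concrete, here is a small base-R sketch on made-up numbers (plain matrices, not minfi objects). Masking failed detection P values with NA leaves each CpG with a different number of observations, while dropping the rows keeps a complete matrix. The 0.01 cut-off is an illustrative assumption.

```r
# Toy CpG x sample beta values and detection p-values (made-up numbers)
beta <- matrix(c(0.10, 0.80, 0.55,
                 0.12, 0.78, 0.60,
                 0.09, 0.82, 0.58),
               nrow = 3,
               dimnames = list(c("cg01", "cg02", "cg03"), c("S1", "S2", "S3")))
detP <- matrix(0.001, nrow = 3, ncol = 3, dimnames = dimnames(beta))
detP["cg02", "S2"] <- 0.05   # cg02 fails in sample S2
detP["cg03", "S3"] <- 0.20   # cg03 fails in sample S3
threshold <- 0.01            # illustrative cut-off (assumption)

# Option 1: mask failed values with NA; each CpG now has its own n
betaNA <- beta
betaNA[detP > threshold] <- NA
nObs <- rowSums(!is.na(betaNA))   # cg01 = 3, cg02 = 2, cg03 = 2
rowMeans(betaNA, na.rm = TRUE)    # every downstream step must handle NA

# Option 2: drop any CpG that failed in at least one sample
keep <- rowSums(detP > threshold) == 0
betaDrop <- beta[keep, , drop = FALSE]   # only cg01 remains, with no NAs
```

Option 2 is the "remove rows" route; it loses cg02 and cg03 entirely, which is the price paid for never having to handle NAs downstream.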
We added subsetByLoci(). We still need to check that removal based on detectionP() is easy.
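A minimal sketch of removing loci based on detection P values, assuming detectionP() returns a CpG-by-sample matrix of p-values and that subsetByLoci() accepts an excludeLoci argument (check ?subsetByLoci in your minfi version). failedLoci() is a hypothetical helper, not part of minfi; the runnable part uses a toy matrix so it works without any data.

```r
# Hypothetical helper (not part of minfi): names of loci whose detection
# p-value exceeds `threshold` in more than `maxFailFrac` of the samples.
# Loci whose p-values are NA are not flagged by this sketch (which() drops NA).
failedLoci <- function(detP, threshold = 0.01, maxFailFrac = 0) {
  failFrac <- rowMeans(detP > threshold)
  rownames(detP)[which(failFrac > maxFailFrac)]
}

# Toy detection p-value matrix: 3 loci x 4 samples
detP <- matrix(0.001, nrow = 3, ncol = 4,
               dimnames = list(c("cg01", "cg02", "cg03"), paste0("S", 1:4)))
detP["cg02", "S1"] <- 0.05                     # fails in 1 of 4 samples
detP["cg03", c("S1", "S2", "S3")] <- 0.5       # fails in 3 of 4 samples

strict  <- failedLoci(detP)                    # any failure at all
lenient <- failedLoci(detP, maxFailFrac = 0.5) # failures in > 50% of samples

# With minfi (not run here; assumed usage, check the help pages):
# detP    <- detectionP(rgSet)
# rgSet.f <- subsetByLoci(rgSet, excludeLoci = failedLoci(detP))
```

Here strict flags cg02 and cg03, while lenient flags only cg03.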
I'm going to bump this request. Subsetting the samplesheet before creating an RGset seems the most straightforward approach, but I get a failure: "anyDuplicated(!basenames) is not TRUE"
This doesn't make sense, since the full samplesheet reads in fine.
Subsetting the RGset afterwards just seems like needless extra work. Granted, what do I know; I'm too much of a novice to contribute to the solution myself. Thank you for your time.
I don't understand this report at all. Could you please post how you subset the samplesheet, as well as its first couple of lines (the output of head(samplesheet))?
I "delete rows" in LibreOffice Calc and then resave the file. I don't think the issue is in the resaving.
The first seven lines are still the samplesheet header; below them is the standard sample table (Sample name, well, etc.). I only removed rows from that table.
Reading this file in produces the duplicate-basenames error, although I don't understand why.
Alternatively, I would just remove the samples from the RGChannelSet object, but I can't get the subsetting to work the way I'd like (selecting from RGset@colData@rownames using a vector of the IDs I want to keep).
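One way to see which rows collide is to rebuild the basenames yourself and list every duplicated row. This sketch assumes the sheet has been read into a data.frame with Slide and Array columns holding the Sentrix ID and position (those column names are an assumption here), and dupBasenameRows() is a hypothetical helper. Blank rows left behind by a spreadsheet export would all paste to "NA_NA" and collide, which is one possible cause worth ruling out, not a confirmed diagnosis.

```r
# Hypothetical helper: return the sample-sheet rows whose basenames collide.
dupBasenameRows <- function(sheet, slideCol = "Slide", arrayCol = "Array") {
  basenames <- paste(sheet[[slideCol]], sheet[[arrayCol]], sep = "_")
  # Flag every member of a duplicated group, not just the later occurrences
  isDup <- duplicated(basenames) | duplicated(basenames, fromLast = TRUE)
  sheet[isDup, , drop = FALSE]
}

# Toy sheet: two real samples plus two empty rows
sheet <- data.frame(Sample_Name = c("A", "B", NA, NA),
                    Slide       = c(2001, 2001, NA, NA),
                    Array       = c("R01C01", "R02C01", NA, NA))
dups <- dupBasenameRows(sheet)   # rows 3 and 4, both "NA_NA"
```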
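A sketch of subsetting samples by ID, assuming the RGChannelSet supports matrix-style x[, j] indexing as SummarizedExperiment-based objects do, and that its colnames() are the sample IDs (for such objects colnames(x) is the same as rownames(colData(x)), so there is no need to reach into slots with @). keepSamples() is a hypothetical helper; the demo uses a plain matrix so it runs without minfi.

```r
# Hypothetical helper: keep only the samples (columns) whose names are in `ids`.
keepSamples <- function(x, ids) {
  notFound <- setdiff(ids, colnames(x))
  if (length(notFound) > 0) {
    warning("IDs not found: ", paste(notFound, collapse = ", "))
  }
  x[, colnames(x) %in% ids, drop = FALSE]
}

# Demo on a matrix; with minfi the call would be keepSamples(RGset, myIds)
m <- matrix(1:6, nrow = 2,
            dimnames = list(c("p1", "p2"), c("S1", "S2", "S3")))
sub <- keepSamples(m, c("S3", "S1"))
```

The result keeps the object's original column order (S1, S3) rather than the order of ids; x[, ids] gives the ids order instead, but fails if any ID is missing.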
However, everything I find online points towards subsetting after the preprocessing has been done, which seems technically incorrect.
Thank you very much for the help on this, I greatly appreciate it.
From: Maarten van Iterson mviterson@gmail.com
Add a function argument na.rm = FALSE/TRUE to detectionP, to be passed to colMedians and colMads, so that detectionP can handle NAs in the Red and Green intensity matrices of an rgSet. With na.rm = TRUE, some detection P values will be NA wherever the probe-level intensities were NA, but this is what we want. For example, we use this for probe-level filtering steps, e.g. on the minimum number of beads required.
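A base-R sketch of why na.rm matters for the per-sample background summaries. It uses apply() with median() and mad() as stand-ins for matrixStats::colMedians() and colMads() (both accept na.rm) on a toy negative-control matrix; it illustrates the proposal and is not the detectionP implementation. Without na.rm, a single masked probe makes that sample's whole background estimate NA.

```r
# Toy negative-control intensities: rows = control probes, columns = samples.
# NA marks a probe-level value that was masked (e.g. too few beads).
neg <- matrix(c(100, 120,  NA, 110,
                200, 210, 190, 205),
              ncol = 2, dimnames = list(NULL, c("S1", "S2")))

# Per-sample background location and scale, with na.rm passed through
bgSummary <- function(x, na.rm = FALSE) {
  list(median = apply(x, 2, median, na.rm = na.rm),
       mad    = apply(x, 2, mad,    na.rm = na.rm))
}

bgNoRm <- bgSummary(neg)               # S1 summaries are NA
bgRm   <- bgSummary(neg, na.rm = TRUE) # S1 uses the 3 remaining probes
```

With na.rm = TRUE, sample S1 gets a median of 110 and a MAD of 14.826, so only the masked probe itself would end up with an NA P value.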