Open fedeserral opened 3 months ago
@fedeserral You have misunderstood the meaning of thr. "thr" is the maximum number of missing values allowed in at least one condition. For example, when thr = 0, a protein is kept only if at least one condition has no missing values; when thr = 1, at least one condition must have no more than one missing value. The other conditions may contain more missing values, so the number of valid values need not be the same across conditions. The data example below should make "thr" clearer.
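# Assumes the DEP2 package, which provides the functions used below; assay() comes from SummarizedExperiment
library(DEP2)
library(SummarizedExperiment)
set.seed(2)
# 15 proteins x 12 samples filled with random 0/1 values
df = data.frame(matrix(sample(c(1,0),12*15, replace = TRUE), nrow = 15,ncol = 12))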
samples = apply(expand.grid(letters[1:3], 1:4), 1, function(x) paste0(x[1],"_",x[2]))
samples = sort(samples)
colnames(df) = samples
df$name = df$ID = LETTERS[1:15]
expDesign = get_exdesign_parse(samples)
se = make_se(df, columns = 1:12, expdesign = expDesign)
plot_missval(se, cluster_columns = FALSE)
filt0 = filter_se(se, thr = 0) # keep proteins where a, b, or c has 0 missing values
plot_missval(filt0, cluster_columns = FALSE)
filt1 = filter_se(se, thr = 1) # keep proteins where a, b, or c has no more than 1 missing value
plot_missval(filt1, cluster_columns = FALSE)
filt2 = filter_se(se, thr = 2) # keep proteins where a, b, or c has no more than 2 missing values
plot_missval(filt2, cluster_columns = FALSE)
assay(se)
assay(filt2)
assay(filt1)
assay(filt0)
This kind of filter is common in proteomics data analysis, especially in AP-MS, where preys may have no intensity in one condition (for example, the control group).
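The rule described above can also be sketched in base R, without any package. This is only an illustration of the "at most thr missing values in at least one condition" logic, not the actual implementation of filter_se(); the toy matrix and the helper keep_rows() are hypothetical and exist only for this example.

```r
# Toy intensity matrix: 4 proteins x 6 samples, two conditions (a, b) with
# 3 replicates each. NA marks a missing value. All values are made up.
mat <- matrix(
  c( 1,  2,  3,  4,  5,  6,   # P1: complete in a and in b
     1, NA,  3, NA, NA,  6,   # P2: 1 missing in a, 2 missing in b
    NA, NA,  3,  4,  5,  6,   # P3: 2 missing in a, complete in b
    NA, NA,  3, NA, NA, NA),  # P4: 2 missing in a, 3 missing in b
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("P", 1:4),
                  c("a_1", "a_2", "a_3", "b_1", "b_2", "b_3"))
)

# Condition label of each sample: strip the replicate suffix
condition <- sub("_[0-9]+$", "", colnames(mat))

# Hypothetical helper (not part of any package): which rows pass for a given thr?
keep_rows <- function(mat, condition, thr) {
  # Matrix of missing-value counts: one row per protein, one column per condition
  n_miss <- sapply(unique(condition), function(cond) {
    rowSums(is.na(mat[, condition == cond, drop = FALSE]))
  })
  # A protein passes if its most complete condition has at most `thr` missing values
  apply(n_miss, 1, min) <= thr
}

kept0 <- rownames(mat)[keep_rows(mat, condition, thr = 0)]  # "P1" "P3"
kept1 <- rownames(mat)[keep_rows(mat, condition, thr = 1)]  # "P1" "P2" "P3"
kept2 <- rownames(mat)[keep_rows(mat, condition, thr = 2)]  # "P1" "P2" "P3" "P4"
```

Note that P3 passes with thr = 0 even though it has two missing values in condition a, because condition b is complete. This is why the conditions do not end up with the same number of valid values after filtering.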
Hi! I have a question. I don't completely understand how the function filter_se() works. Specifically, I don't understand the argument 'thr', despite the explanation given in the source code: "The dataset is filtered for proteins that have a maximum of 'thr' missing values in at least one condition."
Can you be more specific, please?
If I run it with thr = 0, I should see the same number of proteins for all my conditions, right? That is not what I see. Thanks!