knime-mpicbg / HCS-Tools

BSD 3-Clause "New" or "Revised" License

Outlier removal: why use the 85th percentile? #109

Open imagejan opened 6 years ago

imagejan commented 6 years ago

The node documentation for Outlier Removal says:

https://github.com/knime-mpicbg/HCS-Tools/blob/230d5aa19f0dc4109bcf7e2b4e8917cab157ea7c/de.mpicbg.knime.hcs.base/src/de/mpicbg/knime/hcs/base/nodes/preproc/OutlierRemovalFactory.xml#L40-L42

which is in line with the source code:

https://github.com/knime-mpicbg/HCS-Tools/blob/230d5aa19f0dc4109bcf7e2b4e8917cab157ea7c/de.mpicbg.knime.hcs.base/src/de/mpicbg/knime/hcs/base/nodes/preproc/OutlierRemoval.java#L139-L140

But why use the 85th percentile for the upper limit (instead of the 75th, the upper quartile) while using the 25th percentile (the lower quartile) for the lower limit?
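To make the asymmetry concrete, here is a minimal R sketch of percentile-based outlier limits. It assumes, without having checked the linked Java code, that the limits are built like the standard boxplot rule: take a lower and an upper percentile, use their difference as the spread, and extend each percentile outward by `k` times that spread. The helper name `percentile_fences` and the default `k = 1.5` are hypothetical.

```r
# Hypothetical helper: outlier limits built from a lower and an upper percentile.
# With lower_p = 0.25 and upper_p = 0.75 this is the usual 1.5 * IQR boxplot rule.
percentile_fences <- function(x, lower_p = 0.25, upper_p = 0.75, k = 1.5) {
  q <- quantile(x, probs = c(lower_p, upper_p), names = FALSE, na.rm = TRUE)
  spread <- q[2] - q[1]  # inter-percentile range used as the spread
  c(lower = q[1] - k * spread, upper = q[2] + k * spread)
}

x <- 1:100
percentile_fences(x)                  # quartile-based limits (25 / 75)
percentile_fences(x, upper_p = 0.85)  # the 25 / 85 combination in question
```

Under this assumption, moving the upper percentile from 75 to 85 loosens both limits for `1:100` (upper from 149.5 to 174.25, lower from -48.5 to -63.35), because the larger spread also pushes the lower limit down.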

Apparently, someone else had this question as well :smile: :

https://github.com/knime-mpicbg/HCS-Tools/blob/230d5aa19f0dc4109bcf7e2b4e8917cab157ea7c/de.mpicbg.knime.hcs.base/src/de/mpicbg/knime/hcs/base/nodes/preproc/OutlierFilterModel.java#L96

@Meyenhofer any comments on this?

niederle commented 6 years ago

True, it looks like a typo. It seems I had already started implementing a new version of this node's NodeModel (some years ago...) and was wondering about it back then, too.

imagejan commented 6 years ago

For others stumbling upon this: you can easily get standard boxplot outlier removal (i.e., dropping values more than 1.5 times the interquartile range beyond the box) using an R Snippet node with the following code (without grouping, though):

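```r
# Read the column to filter from the KNIME input table
x <- knime.in$"myColumn"
# Keep only values that boxplot.stats() does not flag as outliers
# (more than 1.5 * IQR beyond the hinges; boxplot.stats uses Tukey's hinges)
result <- x[!x %in% boxplot.stats(x)$out]
knime.out <- data.frame(result)
```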

See also: https://stackoverflow.com/a/4937343/1919049
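If grouping is needed, one option is to apply the same `boxplot.stats()` rule separately within each group. The sketch below assumes a numeric value column and a grouping column whose names you pass in; the helper name `remove_boxplot_outliers_by_group` and the column name `"myGroup"` are hypothetical. Inside an R Snippet it would be called as `knime.out <- remove_boxplot_outliers_by_group(knime.in, "myColumn", "myGroup")`.

```r
# Hypothetical helper: boxplot outlier removal computed separately per group.
# Rows with a missing group value, or a missing value, are kept unchanged.
remove_boxplot_outliers_by_group <- function(df, value_col, group_col) {
  keep <- rep(TRUE, nrow(df))
  # split() groups the row indices by the grouping column
  for (idx in split(seq_len(nrow(df)), df[[group_col]])) {
    v <- df[[value_col]][idx]
    # flag values outside the 1.5 * IQR whiskers of this group only
    keep[idx] <- !(v %in% boxplot.stats(v)$out)
  }
  df[keep, , drop = FALSE]
}

# Toy table: 100 is an outlier within group "a"; group "b" has none
demo <- data.frame(
  group = rep(c("a", "b"), each = 6),
  value = c(1, 2, 3, 4, 5, 100, 10:15)
)
remove_boxplot_outliers_by_group(demo, "value", "group")
```

Unlike the snippet above, this returns whole rows (all columns), which is usually what you want to pass on to downstream KNIME nodes.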

fmeyenhofer commented 6 years ago

Given that the box should contain 50% of all samples, 85 instead of 75 must have been a typo.
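The 50% figure is simply the difference of the two percentile levels: for a continuous distribution with quantile function $Q$, the fraction of values between the $p$-th and $q$-th percentiles is $q - p$ (approximately so for a finite sample). A 25/85 box would therefore cover 60% of the data rather than 50%:

$$
P\bigl(Q(0.25) \le X \le Q(0.75)\bigr) = 0.75 - 0.25 = 0.50,
\qquad
P\bigl(Q(0.25) \le X \le Q(0.85)\bigr) = 0.85 - 0.25 = 0.60
$$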