robust z-score

tischi commented 4 years ago

@imagirom @Steffen-Wolf @metavibor

Another score that is often used in such assays is the robust z-score:

( median( infected ) - median( not_infected ) ) / median_absolute_deviation( not_infected )
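A minimal Python sketch of the formula above, using only the standard library. It uses the unscaled MAD exactly as written in the formula. The function names and the per-cell intensities are made up for illustration, and this is not the batchlib implementation.

```python
from statistics import median


def median_absolute_deviation(values):
    """Unscaled MAD: the median of the absolute deviations from the median."""
    center = median(values)
    return median([abs(v - center) for v in values])


def robust_z_score(infected, not_infected):
    """(median(infected) - median(not_infected)) / MAD(not_infected)."""
    return (median(infected) - median(not_infected)) / median_absolute_deviation(not_infected)


# Hypothetical per-cell intensities for a single well.
infected = [12.0, 15.0, 14.0, 13.0, 40.0]
not_infected = [5.0, 6.0, 4.0, 5.5, 4.5]
print(robust_z_score(infected, not_infected))  # 18.0
```

Here the medians are 14 and 5 and the MAD of the non-infected cells is 0.5, so the score is (14 - 5) / 0.5 = 18.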

This has several advantages:

It will not be affected by offsets (background), because both numerator and denominator are differences.
It will not be affected by a multiplicative factor (amount of serum, microscope settings, etc.), because it is a ratio.
It is robust to outlier cells, because everything is median-based.
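The three properties can be checked on a toy example. The intensities below are hypothetical, and the score function simply restates the formula. Note that the scale invariance assumes a positive factor; a negative one would flip the sign of the score.

```python
from statistics import median


def robust_z_score(infected, not_infected):
    # Difference of medians divided by the (unscaled) MAD of the non-infected cells.
    center = median(not_infected)
    mad = median([abs(v - center) for v in not_infected])
    return (median(infected) - center) / mad


infected = [12.0, 15.0, 14.0, 13.0, 40.0]
not_infected = [5.0, 6.0, 4.0, 5.5, 4.5]

base = robust_z_score(infected, not_infected)
# Additive background offset applied to every cell.
shifted = robust_z_score([v + 100.0 for v in infected], [v + 100.0 for v in not_infected])
# Positive multiplicative factor (e.g. more serum, different microscope settings).
scaled = robust_z_score([3.0 * v for v in infected], [3.0 * v for v in not_infected])
# One extreme outlier cell in each group.
outliers = robust_z_score([12.0, 15.0, 14.0, 13.0, 1e6], [5.0, 1e6, 4.0, 5.5, 4.5])

print(base, shifted, scaled, outliers)  # 18.0 18.0 18.0 18.0
```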

It has, however, one disadvantage:

The score strongly depends on the MAD (median absolute deviation, i.e. the variation in intensity) of the non-infected cells. That is, if for some reason all non-infected cells have very similar intensities, we will get very high scores.

The above-mentioned disadvantage is why I generally don't like it very much: sometimes one can in fact almost divide by zero, and the score explodes. I don't find this behaviour biologically very meaningful, but otherwise the score does have a lot of advantages.
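A small sketch with made-up numbers illustrates this disadvantage. The difference of medians is held at 14 - 5 = 9 while the non-infected cells become more and more similar, so their MAD (0.5 * spread here) shrinks and the score grows roughly as 18 / spread. With a spread of exactly 0 the MAD is 0, and plain Python raises ZeroDivisionError (NumPy float division would instead return inf with a warning).

```python
from statistics import median


def robust_z_score(infected, not_infected):
    center = median(not_infected)
    mad = median([abs(v - center) for v in not_infected])
    return (median(infected) - center) / mad  # ZeroDivisionError if mad == 0


infected = [12.0, 15.0, 14.0, 13.0, 40.0]
offsets = [-1.0, -0.5, 0.0, 0.5, 1.0]

# Same signal every time; only the spread of the non-infected cells shrinks.
for spread in [1.0, 0.1, 0.01, 0.001]:
    not_infected = [5.0 + spread * o for o in offsets]
    print(f"spread={spread}: score={robust_z_score(infected, not_infected):.0f}")
```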

metavibor commented 4 years ago

Sounds good, why not include this as another parameter that is computed?

wolny commented 4 years ago

It also wouldn't hurt to include it as another feature for the positive/negative sample classifier.

imagirom commented 4 years ago

Added in https://github.com/hci-unihd/batchlib/commit/373ad1c971a2eea4b64ba008ae9129f80072d93f, under the column names robust_z_score_sums and robust_z_score_means in the image and well tables.
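For illustration only, a hypothetical sketch of how per-well values for the two columns could be derived. It assumes, without having checked the commit, that robust_z_score_sums is computed from per-cell summed intensities and robust_z_score_means from per-cell mean intensities. The record layout, well IDs and values are invented. A per-image table could be built the same way by grouping on an image ID instead.

```python
from collections import defaultdict
from statistics import median


def robust_z_score(infected, not_infected):
    center = median(not_infected)
    mad = median([abs(v - center) for v in not_infected])
    return (median(infected) - center) / mad


# Hypothetical per-cell records: (well, is_infected, intensity_sum, intensity_mean).
cells = [
    ("A01", True, 1200.0, 12.0), ("A01", True, 1500.0, 10.0), ("A01", True, 1400.0, 14.0),
    ("A01", False, 500.0, 5.0), ("A01", False, 600.0, 6.0), ("A01", False, 400.0, 4.0),
    ("B01", True, 800.0, 9.0), ("B01", True, 900.0, 8.0), ("B01", True, 1000.0, 11.0),
    ("B01", False, 400.0, 4.0), ("B01", False, 500.0, 5.0), ("B01", False, 420.0, 4.5),
]

# Group the cells of each well together.
by_well = defaultdict(list)
for well, is_infected, intensity_sum, intensity_mean in cells:
    by_well[well].append((is_infected, intensity_sum, intensity_mean))

# One row per well, one score per per-cell feature.
table = {}
for well, rows in sorted(by_well.items()):
    table[well] = {}
    for column, idx in (("robust_z_score_sums", 1), ("robust_z_score_means", 2)):
        pos = [r[idx] for r in rows if r[0]]
        neg = [r[idx] for r in rows if not r[0]]
        table[well][column] = robust_z_score(pos, neg)

for well, scores in table.items():
    print(well, scores)
```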

hci-unihd / antibodies-analysis-issues

robust z-score #39