Add additional quality metrics

tlnagy commented 8 years ago

Evaluating model performance solely on AUROC might not be ideal: AUROC isn't well known outside the biostats/ML community, and our simulated data (like real-life examples) are skewed toward negatives.
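To make the skew argument concrete, here is a small Julia sketch (not Crispulator code; the scores are made up for illustration) that compares AUROC, computed as the fraction of correctly ordered positive/negative pairs, with average precision. Replicating every negative ten times keeps the score distributions identical, so AUROC does not move, while average precision drops sharply. It assumes no ties between a positive and a negative score.

```julia
# Toy comparison (illustrative data, not Crispulator output): AUROC is unchanged
# when negatives are replicated, average precision drops.
# Assumes no ties between a positive and a negative score.

# AUROC = fraction of (positive, negative) pairs where the positive scores higher
auroc(pos, neg) = count(p > n for p in pos for n in neg) / (length(pos) * length(neg))

# average precision = mean of the precision at the rank of each positive
function average_precision(pos, neg)
    scores = vcat(pos, neg)
    labels = vcat(trues(length(pos)), falses(length(neg)))
    tp, total = 0, 0.0
    for (rank, i) in enumerate(sortperm(scores, rev=true))
        if labels[i]
            tp += 1
            total += tp / rank
        end
    end
    return total / length(pos)
end

pos = [0.9, 0.6]
neg = [0.8, 0.3]
neg10 = repeat(neg, 10)   # same negative scores, 10x as many negatives

println(auroc(pos, neg), " ", auroc(pos, neg10))   # 0.75 0.75
println(average_precision(pos, neg))               # ≈ 0.833
println(average_precision(pos, neg10))             # ≈ 0.583
```

With ten times as many negatives, the second positive now sits behind ten negatives instead of one, so its precision falls from 2/3 to 1/6; the pairwise ordering that AUROC measures is unchanged.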

- [x] add support for a simple Venn-diagram-like method, i.e. the % of the top 25 positive/negative hits that are correct
- [x] AUPRC: precision-recall might be better than AUROC since the data are skewed toward negatives
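The Venn-style check in the first item could look something like the hypothetical `topn_overlap` below (not a Crispulator function). It is a sketch assuming that higher scores mean stronger positive hits and that negative hits are ranked by negating the scores; the issue uses n = 25, while the toy data here use n = 2.

```julia
# Hypothetical helper (not Crispulator's API): fraction of the n top-ranked
# genes that are true hits, i.e. the overlap of the "called" and "true" sets.
function topn_overlap(scores::AbstractVector{<:Real}, is_hit::AbstractVector{Bool}, n::Integer)
    1 <= n <= length(scores) || throw(ArgumentError("n must be between 1 and length(scores)"))
    top = partialsortperm(scores, 1:n, rev=true)   # indices of the n highest scores
    return count(is_hit[top]) / n
end

scores = [3.0, -2.5, 1.2, 0.1, -0.4, 2.2]
is_pos = [true, false, true, false, false, false]   # true positive hits (toy data)
is_neg = [false, true, false, false, true, false]   # true negative hits (toy data)

topn_overlap(scores, is_pos, 2)    # top 2 by score: genes 1 and 6, one is a hit → 0.5
topn_overlap(-scores, is_neg, 2)   # top 2 by -score: genes 2 and 5, both hits → 1.0
```

`partialsortperm` avoids sorting the whole score vector when only the top n indices are needed.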

martinkampmann commented 8 years ago

Hm, I hadn't thought about precision-recall that way; that sounds interesting to try. Between the two, though, I think the more intuitive "venn" overlap may be the higher priority.

tlnagy commented 8 years ago

Relevant literature on PR curves:

- The Relationship Between Precision-Recall and ROC Curves: http://pages.cs.wisc.edu/~jdavis/davisgoadrichcamera2.pdf
- Area Under the Precision-Recall Curve: Point Estimates and Confidence Intervals: http://pages.cs.wisc.edu/~boyd/aucpr_final.pdf

tlnagy commented 8 years ago

The second paper discusses several estimators of the AUPRC and identifies the following point estimators as the most robust:

- Lower trapezoid
- Average precision
- Interpolated median
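A minimal Julia sketch of the lower trapezoid estimator, as we read it from the second paper: rank examples by score, record the highest and lowest precision seen at each distinct recall value, and sum trapezoids that pair the lowest precision at one recall with the highest precision at the next. This is not Crispulator's `auprc`; it assumes untied scores, and filling the gap between recall 0 and the first observed recall with a flat step is our own choice rather than something taken from the paper.

```julia
# Sketch of the lower trapezoid AUPRC estimator, following our reading of the
# paper; assumes no tied scores, `labels[i] == true` marks a positive.
function lower_trapezoid(scores::AbstractVector{<:Real}, labels::AbstractVector{Bool})
    order = sortperm(scores, rev=true)   # walk thresholds from high to low
    tp, fp = 0, 0
    pmax = Float64[]   # pmax[k]: precision right after the k-th positive (recall k/m)
    pmin = Float64[]   # pmin[k]: lowest precision while recall is still k/m
    for i in order
        if labels[i]
            tp += 1
            push!(pmax, tp / (tp + fp))
            push!(pmin, tp / (tp + fp))
        else
            fp += 1
            # each extra negative lowers precision at the current recall
            tp > 0 && (pmin[end] = tp / (tp + fp))
        end
    end
    m = length(pmax)   # number of positives = number of distinct recall values
    m > 0 || throw(ArgumentError("need at least one positive"))
    # assumption: flat step at pmax[1] from recall 0 up to the first recall 1/m
    area = pmax[1] / m
    for k in 1:m-1
        # trapezoid of width 1/m: lowest precision at recall k/m on the left,
        # highest precision at recall (k+1)/m on the right
        area += (pmin[k] + pmax[k+1]) / 2 / m
    end
    return area
end

lower_trapezoid([0.9, 0.8, 0.7, 0.6], [true, false, true, false])   # 19/24 ≈ 0.792
lower_trapezoid([0.9, 0.8, 0.7, 0.6], [true, true, false, false])   # perfect ranking → 1.0
```

Because recall only changes when a positive is passed, the m positives give exactly m distinct recall values, so every trapezoid has width 1/m.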

I decided to implement the first one and hacked up a version of their Figure 3:

```julia
include("load.jl")   # loads the Crispulator code, including `auprc`
using Distributions
using Gadfly

frac_pos = 0.1   # π, the fraction of positives (class :a)
# (negative X, positive Y) score distributions for each test case
test_dists = [
    [Normal(0, 1), Normal(1, 1)],
    [Beta(2, 5), Beta(5, 2)],
    [Uniform(0, 1), Uniform(0.5, 1.5)]
]
# true recall and precision at thresholds xs (names avoid clashing with Base.precision)
true_recall(xs, Y) = 1 .- cdf.(Y, xs)
true_precision(xs, π, X, Y) =
    π .* true_recall(xs, Y) ./ (π .* true_recall(xs, Y) .+ (1 - π) .* (1 .- cdf.(X, xs)))

xs = range(-10, 10, length=1000)
titles = ["binormal", "bibeta", "offset uniform"]

plots = Plot[]
for (title, dists) in zip(titles, test_dists)
    X, Y = dists          # X: negatives (class :b), Y: positives (class :a)
    classes = [:b, :a]

    layers = []
    # true PR curve
    push!(layers, layer(x=true_recall(xs, Y), y=true_precision(xs, frac_pos, X, Y),
                        Geom.line, Theme(line_width=2pt)))
    # 10 sampled PR curves (gray), 500 examples each
    for i in 1:10
        cat = rand(Categorical([1 - frac_pos, frac_pos]), 500)   # 1 = negative, 2 = positive
        scores = map(rand, dists[cat])
        _auprc, p, r = auprc(scores, classes[cat], Set([:a]))
        push!(layers, layer(x=r, y=p, Geom.line,
                            Theme(default_color=colorant"#cccccc", highlight_width=0pt)))
    end
    push!(plots, plot(layers..., Coord.cartesian(fixed=true),
                      Guide.ylabel("precision"), Guide.xlabel("recall"),
                      Guide.title(title), Guide.yticks(ticks=[0.0, 0.5, 1.0])))
end
draw(SVG(30cm, 10cm), hstack(plots))
```
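The true precision and recall functions in the snippet come from the class-conditional score distributions. With negatives $X$ (CDF $F_X$), positives $Y$ (CDF $F_Y$), a fraction $\pi$ of positives, and a threshold $t$:

$$
R(t) = 1 - F_Y(t), \qquad \mathrm{FPR}(t) = 1 - F_X(t), \qquad P(t) = \frac{\pi R(t)}{\pi R(t) + (1-\pi)\,\mathrm{FPR}(t)}, \qquad \mathrm{AUPRC} = \int_0^1 P \, dR
$$

The offset uniform case ($X \sim U(0,1)$, $Y \sim U(0.5, 1.5)$) has a closed form, which gives a fixed target when checking estimator bias. For $t \in [1, 1.5]$ we have $\mathrm{FPR} = 0$, so $P = 1$ for $R \in [0, 0.5]$; for $t \in [0.5, 1]$ we have $R = 1.5 - t$ and $\mathrm{FPR} = 1 - t = R - 0.5$. With $\pi = 0.1$ the denominator simplifies to $r - 0.45$:

$$
\mathrm{AUPRC} = \frac{1}{2} + \int_{1/2}^{1} \frac{\pi r}{\pi r + (1-\pi)(r - 1/2)}\,dr
= \frac{1}{2} + 0.1\int_{1/2}^{1} \frac{r}{r - 0.45}\,dr
= \frac{1}{2} + 0.1\left(\frac{1}{2} + 0.45\ln 11\right) \approx 0.658
$$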

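[Figure `test_prc`: sampled PR curves (gray) and true PR curves for the binormal, bibeta, and offset uniform cases; recall on the x-axis, precision on the y-axis]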

This shows my sampled PR curves (gray) and the true PR curve (blue). It doesn't test my AUPRC implementation itself; next, I'm going to measure the bias of my lower trapezoid estimator, as in Figure 4 of the paper.
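The bias measurement described above could be sketched as follows: simulate the offset uniform case, whose exact AUPRC is known, average an estimator over many simulated data sets, and subtract the truth. This is not the code from the issue. The function names, sample size, number of replicates, and seed are illustrative assumptions, and average precision stands in for the lower trapezoid estimator only to keep the block self-contained; any function mapping scores and labels to an estimate can be passed instead.

```julia
using Random

# Illustrative bias check for an AUPRC estimator on the offset uniform case:
# negatives ~ U(0, 1), positives ~ U(0.5, 1.5), a fraction frac_pos positive.

# True AUPRC: precision is exactly 1 for recall in [0, 0.5]; for recall r in
# [0.5, 1] the false positive rate is r - 0.5. Midpoint rule on that piece.
function true_auprc_offset_uniform(frac_pos; steps=100_000)
    h = 0.5 / steps
    area = 0.5
    for k in 1:steps
        r = 0.5 + (k - 0.5) * h
        area += h * frac_pos * r / (frac_pos * r + (1 - frac_pos) * (r - 0.5))
    end
    return area
end

# Stand-in estimator (average precision); any (scores, labels) -> estimate works.
function avg_precision(scores, labels)
    tp, total = 0, 0.0
    for (rank, i) in enumerate(sortperm(scores, rev=true))
        labels[i] || continue
        tp += 1
        total += tp / rank
    end
    return total / tp
end

# Mean estimate minus the true AUPRC over `reps` simulated data sets of size n.
function estimator_bias(estimator, frac_pos, n, reps, rng::AbstractRNG)
    total = 0.0
    for _ in 1:reps
        labels = rand(rng, n) .< frac_pos          # each example positive w.p. frac_pos
        scores = rand(rng, n) .+ 0.5 .* labels     # positives shifted to U(0.5, 1.5)
        total += estimator(scores, labels)
    end
    return total / reps - true_auprc_offset_uniform(frac_pos)
end

true_auprc_offset_uniform(0.1)                                     # ≈ 0.658
estimator_bias(avg_precision, 0.1, 500, 200, MersenneTwister(2016))
```

Repeating the last call over a grid of sample sizes and skews would give bias curves comparable in spirit to the paper's Figure 4.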

tlnagy/Crispulator.jl, issue #25