Closed tlnagy closed 8 years ago
Hm, I haven't thought about precision-recall that way - that sounds interesting to try. However between the two the more intuitive "venn" overlap I think may be the higher priority.
Relevant literature on PR curves:
The second paper discusses several estimators for the AUPRC and identify the following point estimators as being the most robust
I decided to implement the first one and hacked up a version of their Figure 3:
include("load.jl")
using Distributions
using Gadfly
π = 0.1
test_dists = Array[
[Normal(0, 1), Normal(1,1)],
[Beta(2, 5), Beta(5, 2)],
[Uniform(0, 1), Uniform(0.5, 1.5)]
]
# true precision, recall functions
recall(xs, Y) = 1-cdf(Y, xs)
precision(xs, π, X, Y) = π*recall(xs, Y)./(π*recall(xs, Y) + (1-π)*(1-cdf(X, xs)))
xs = linspace(-10, 10, 1000)
names = ["binormal", "bibeta", "offset uniform"]
plots = Plot[]
for (name, dists) in zip(names, test_dists)
X = dists[1]
Y = dists[2]
classes = [:b, :a]
layers = []
push!(layers, layer(x=recall(xs, Y), y=precision(xs, π, X, Y),
Geom.line, Theme(line_width=2pt)))
for i in 1:10
cat = rand(Categorical([1-π, π]), 500)
scores = map(rand, dists[cat])
_auprc, p, r = auprc(scores, classes[cat], Set([:a]))
push!(layers, layer(x=r, y=p,Geom.line,
Theme(default_color=colorant"#cccccc", highlight_width=0pt)))
end
push!(plots, plot(layers..., Coord.cartesian(fixed=true),
Guide.ylabel("precision"), Guide.xlabel("recall"),
Guide.title(name), Guide.yticks(ticks=[0.0, 0.5, 1.0])))
end
draw(SVG(30cm, 10cm), hstack(plots))
this shows my sampled PR curves (gray) and the true PR curve (blue). This doesn't test my AUPRC implementation, but I'm going to measure the bias in my implementation of the lower trapezoid estimator like in Figure 4 from the paper
Evaluating model performance solely based on AUROCs might not be ideal because they aren't well known outside of the biostats/ML community and our simulated data (like real life examples) are skewed towards having more negatives.