snorkel-team / snorkel

A system for quickly generating training data with weak supervision
https://snorkel.org
Apache License 2.0
5.81k stars 857 forks source link

Faster symmetry breaking #1502

Closed brahmaneya closed 5 years ago

brahmaneya commented 5 years ago

Description of proposed changes

Use the Munkres algorithm instead of brute force search to find the optimal permutation for breaking column symmetry. Our objective for symmetry breaking is to maximize the total probability of LFs being accurate (summed over LFs). We optimize this over combinations of permutations of columns corresponding to labels with the same prior probability. This is equivalent to optimizing the trace of the product of the summed probabilities matrix with the permutation matrix, which can be done in O(n^3) time.

Fixes # (issue)

1486

Test plan

Modified existing symmetry breaking test.

Checklist

Need help on these? Just ask!

codecov[bot] commented 5 years ago

Codecov Report

Merging #1502 into master will increase coverage by <.01%. The diff coverage is 100%.

@@            Coverage Diff             @@
##           master    #1502      +/-   ##
==========================================
+ Coverage    97.6%   97.61%   +<.01%     
==========================================
  Files          55       55              
  Lines        2049     2056       +7     
  Branches      335      339       +4     
==========================================
+ Hits         2000     2007       +7     
  Misses         22       22              
  Partials       27       27
Impacted Files Coverage Δ
snorkel/labeling/model/label_model.py 95.92% <100%> (+0.09%) :arrow_up: