talolard closed this pull request 3 years ago
Merging #1629 (d579832) into master (ed77718) will decrease coverage by 0.94%. The diff coverage is 82.14%.
```
@@            Coverage Diff             @@
##           master    #1629      +/-   ##
==========================================
- Coverage   97.21%   96.26%   -0.95%
==========================================
  Files          68       72       +4
  Lines        2151     2276     +125
  Branches      345      358      +13
==========================================
+ Hits         2091     2191     +100
- Misses         31       52      +21
- Partials       29       33       +4
```
| Impacted Files | Coverage Δ | |
|---|---|---|
| ...abel_model/sparse_example_eventlist_label_model.py | 47.82% <47.82%> (ø) | |
| ...parse_label_model/sparse_event_pair_label_model.py | 61.53% <61.53%> (ø) | |
| snorkel/labeling/model/label_model.py | 94.58% <89.18%> (-0.97%) | :arrow_down: |
| ...odel/sparse_label_model/base_sparse_label_model.py | 91.30% <91.30%> (ø) | |
| ...l/sparse_label_model/sparse_label_model_helpers.py | 100.00% <100.00%> (ø) | |
I think the coverage tool isn't picking up on some of the tests. The static methods in sparse_example_eventlist_label_model.py and sparse_event_pair_label_model.py get tested explicitly in the two tests I marked with @pytest.mark.complex.
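For reference, a marker-gated test of that kind looks roughly like this (a minimal sketch; the test name and body are illustrative, not the PR's actual tests). Tests carrying a custom marker are often skipped in the default run, which would explain why the coverage tool misses them:

```python
import pytest

@pytest.mark.complex  # custom marker; such tests are typically selected via `pytest -m complex`
def test_sparse_static_methods():
    # Illustrative placeholder for the static-method tests mentioned above.
    # In the real suite this would compare sparse-path output to the dense path.
    assert 1 + 1 == 2
```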
This pull request is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 7 days.
Disclaimer
This PR isn't done. It does what it's supposed to do and has tests, but the style and code cleanliness might not be there yet.
I'm not totally confident this implementation is a good fit. I'd appreciate it if someone could take a look and let me know whether I'm on track before I polish it. Maybe @bhancock8, who replied to the original issue?
Description of proposed changes
Adds support for training and inference with sparse matrices.
This PR adds a few convenience functions to help the user work with sparse matrix representations of L_ind or of the objective matrix O (does either have a formal name?).
I expect most users will call train_model_from_sparse_event_cooccurence, which takes a list of tuples representing L_ind indices and a value (always 1), populates a sparse matrix, and runs training.
train_model_from_sparse_event_cooccurence calls train_model_from_known_objective, which takes a dense numpy representation of O and trains on it. When I use Snorkel I call this function and calculate O elsewhere; it's faster.
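The tuple-to-matrix step described above can be sketched roughly like this (a hedged illustration, assuming scipy.sparse; the helper name `sparse_event_tuples_to_L_ind` and its body are mine, not the PR's actual code):

```python
import numpy as np
from scipy.sparse import coo_matrix

def sparse_event_tuples_to_L_ind(tuples, n_rows, n_cols):
    """Turn [(row, col), ...] index pairs (value always 1) into a sparse CSR matrix."""
    rows, cols = zip(*tuples)
    data = np.ones(len(tuples))
    return coo_matrix((data, (rows, cols)), shape=(n_rows, n_cols)).tocsr()

# Example: 3 data points, 4 indicator columns.
L_ind = sparse_event_tuples_to_L_ind([(0, 1), (1, 2), (2, 1), (2, 3)], 3, 4)

# A dense objective matrix O could then be computed as the normalized
# co-occurrence of indicators before handing it to the dense training path.
O = (L_ind.T @ L_ind).toarray() / L_ind.shape[0]
```

Computing O once like this, outside the model, is presumably what makes the dense-O entry point faster in practice.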
Internally, there is some refactoring in LabelModel to support train_model_from_known_objective: constants are set differently, and the tree and clique data calculations are moved around a little.
Related issue(s)
Fixes #1625
Test plan
I wrote tests in test_sparse_data_helpers. The tests create an L matrix in standard format and then compare the output of normal Snorkel to sparse Snorkel.
Checklist
Need help on these? Just ask!
Run `tox -e complex` and/or `tox -e spark` if appropriate.