Closed Wirg closed 4 years ago
Merging #1572 into master will increase coverage by
0.06%
. The diff coverage is92.30%
.
@@ Coverage Diff @@
## master #1572 +/- ##
==========================================
+ Coverage 97.13% 97.19% +0.06%
==========================================
Files 56 68 +12
Lines 2091 2137 +46
Branches 342 343 +1
==========================================
+ Hits 2031 2077 +46
Misses 31 31
Partials 29 29
Impacted Files | Coverage Δ | |
---|---|---|
...el/classification/training/loggers/checkpointer.py | 96.90% <ø> (ø) |
|
snorkel/labeling/lf/core.py | 100.00% <ø> (ø) |
|
snorkel/preprocess/core.py | 100.00% <ø> (ø) |
|
snorkel/slicing/sf/nlp.py | 100.00% <ø> (ø) |
|
snorkel/classification/training/trainer.py | 89.83% <50.00%> (ø) |
|
snorkel/labeling/model/label_model.py | 95.54% <50.00%> (ø) |
|
snorkel/labeling/lf/nlp.py | 100.00% <100.00%> (ø) |
|
snorkel/map/core.py | 100.00% <100.00%> (ø) |
|
snorkel/preprocess/nlp.py | 86.66% <100.00%> (ø) |
|
snorkel/slicing/utils.py | 94.80% <100.00%> (+0.13%) |
:arrow_up: |
... and 14 more |
@henryre I just noticed it won't work in the expected way.
The snorkel pattern is to return the x_mapped so the cache will change the data point.
def test_decorator_mapper_memoized_use_memoize_key(self) -> None:
square_hit_tracker = SquareHitTracker()
@lambda_mapper(memoize=True, memoize_key=lambda x: x.num)
def square(x: DataPoint) -> DataPoint:
x.num_squared = square_hit_tracker(x.num)
return x
x8 = self._get_x()
x8_mapped = square(x8)
assert x8_mapped is not None
self.assertEqual(x8_mapped.num_squared, 64)
self.assertEqual(square_hit_tracker.n_hits, 1)
x8_with_another_text = self._get_x(text="Henry is still having fun")
x8_with_another_text_mapped = square(x8_with_another_text)
assert x8_with_another_text_mapped is not None
self.assertEqual(x8_with_another_text_mapped.num_squared, 64)
self.assertEqual(square_hit_tracker.n_hits, 1)
# This should fail :/
self.assertEqual(x8_with_another_text_mapped, x8_mapped)
Hi @Wirg, thanks for putting this up! Based on the example you put up, the expected behavior would be square(x8_with_another_text) == x8_mapped
since the hashing function was (intentionally) "poorly chosen" in the test. Are you saying self.assertEqual(x8_with_another_text_mapped, x8_mapped)
will trigger an AssertionError in the current implementation?
Hi @henryre ,
I hope you're going well.
Small bump on this PR.
Hi @henryre ,
Another bump for this pr.
What do you want me to do ?
Do you have some change ? Do you want to give up on this feature ?
Hi @Wirg, sorry for the delay here and thanks for the reminder! Taking a look today!
@henryre
I finally changed the test. I am not fully satisfied. If tomorrow someone changes :
get_hashable
to support pandas dataframememoize_key
not to be usedThe test won't fail.
@Wirg good thinking! You could add an additional field in the test called not_used
and have different values for the two data points
@henryre so I added a not_used
int.
EDIT : nevermind. I rebased and those were fixed. Waiting for your review.
I am still encountering new test failures. I fixed F541 (f-string used with no parameters). I am facing a typing failure due to
torch.nn.Linear
usage insnorkel/slicing/utils
. I am not sure what are my steps on this ? Is this already fixed and I should rebase ?
@henryre small bump
I don't know what I should be doing regarding codecov. Am I expected to add more tests. If yes, where ?
@Wirg looks like codecov was being a bit temperamental, will go ahead and merge in. Thanks for your hard work here!
Description of proposed changes
Feature
Currently, there is no way to decide the key to be memoized when using preprocessor(memoize=True). This leads to 2 issues :
Result
Implementation
Add a
memoize_key : Optional[HashingFunction]
to theBaseMapper
, if provided and notNone
, it will be used instead ofget_hashable
to define the hash of the input.memoize_key
has been made accessible to the different functions providingmemoize
api.Related issue(s)
https://github.com/snorkel-team/snorkel/issues/1561
Test plan
Checklist
Need help on these? Just ask!
tox -e complex
and/ortox -e spark
if appropriate.@henryre , I will be waiting for your review. I ran
tox -e doc
, but it did not produce any change and I have a bunch ofWARNING: toctree contains reference to nonexisting document
, is it normal ? By the way, how would you like to discuss my use case further ?