Closed CarloLepelaars closed 1 year ago
Feel free to make a PR. It might make sense to have the actual reference to the torch object be `self._device` if that makes the `__repr__` look better.
Cool! I made PR #34, which solves this and adds some tests for `SentenceEncoder`. Defining `self._device` additionally does not seem necessary, as the `__repr__` output is already clean.
As for testing text embedders, it looks like `Sense2VecEncoder` is a lot harder to test since it depends on loading a file from disk. Any ideas on how to test that, or do you think it's not necessary?
There's no harm in adding 'self.path' I think. Feel free to make that PR as well!
Cool! Created #35 for the Sense2Vec attribute fix.
The `device` argument in `SentenceEncoder` is not defined as an attribute. This leads to bugs when using it with sklearn. I encountered attribute errors when trying to print out a `Pipeline` representation that has `SentenceEncoder` as a component.

Should be easy to fix by just adding `self.device` in `SentenceEncoder.__init__`. We can consider adding tests for text encoders so we can catch these errors beforehand.

The scikit-learn development docs make it clear that every argument should be defined as an attribute.
Error message: `AttributeError: 'SentenceEncoder' object has no attribute 'device'`.

Reproduction: Python 3.8 with `embetter = "^0.2.2"`.
Fix: Add `self.device` on `SentenceEncoder`.
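
For anyone landing here later, a rough sketch of why the missing attribute breaks the representation. This is not embetter's or scikit-learn's actual code: `get_params` and `repr_like` below are simplified stand-ins for what `BaseEstimator` does (read the `__init__` signature, then look up each argument as an attribute of the same name), and `BrokenEncoder` / `FixedEncoder` are hypothetical classes made up for illustration.

```python
import inspect


def get_params(estimator):
    """Simplified stand-in for scikit-learn's parameter introspection (assumption)."""
    signature = inspect.signature(type(estimator).__init__)
    names = [name for name in signature.parameters if name != "self"]
    # Each constructor argument is looked up as an attribute with the same name.
    return {name: getattr(estimator, name) for name in names}


def repr_like(estimator):
    """Build a repr-style string from the introspected parameters."""
    args = ", ".join(f"{key}={value!r}" for key, value in get_params(estimator).items())
    return f"{type(estimator).__name__}({args})"


class BrokenEncoder:
    """Hypothetical encoder that uses `device` but never stores it."""

    def __init__(self, name="some-model", device=None):
        self.name = name
        # `device` is consumed here, but no `self.device` is ever set.
        self._target = device or "cpu"


class FixedEncoder:
    """Same hypothetical encoder with the argument stored as an attribute."""

    def __init__(self, name="some-model", device=None):
        self.name = name
        self.device = device  # stored unchanged under the argument's own name
        self._target = device or "cpu"


if __name__ == "__main__":
    print(repr_like(FixedEncoder(device="cpu")))
    try:
        repr_like(BrokenEncoder(device="cpu"))
    except AttributeError as err:
        print("AttributeError:", err)
```

The same `get_params`-style loop could double as a cheap regression test for the text encoders: instantiate each one and check that every `__init__` argument is readable as an attribute.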