Open nipunsadvilkar opened 2 years ago
Merging #114 (e07808a) into master (5905f13) will decrease coverage by 0.08%. The diff coverage is 50.00%.
@@ Coverage Diff @@
## master #114 +/- ##
==========================================
- Coverage 98.43% 98.35% -0.09%
==========================================
Files 38 39 +1
Lines 1150 1153 +3
==========================================
+ Hits 1132 1134 +2
- Misses 18 19 +1
Flag | Coverage Δ | |
---|---|---|
unittests | 98.35% <50.00%> (-0.09%) | :arrow_down: |
Flags with carried forward coverage won't be shown.
Impacted Files | Coverage Δ | |
---|---|---|
pysbd/utils.py | 73.33% <42.85%> (-2.53%) | :arrow_down: |
pysbd/about.py | 100.00% <100.00%> (ø) | |
pysbd/__init__.py | 100.00% <0.00%> (ø) | |
Are you still working on this? Otherwise I could have a look.
Hey @davidberenstein1957, sure, you can take a look at it.
I'm not sure what the best approach would be, though. I want to keep pysbd lightweight, but supporting pysbd with spaCy v3 requires Language.factory, which would make me add spaCy as a dependency.
Let me know if you happen to work on the recommendations suggested by @rmitsch above.
Here is an option that updates the factory method without making spaCy a hard requirement of pysbd.

    from typing import Any

    import pysbd

    try:
        from spacy.language import Language
        langfac = Language.factory
    except ImportError:
        # spaCy is not installed: fall back to a no-op decorator factory
        # so that PySBDFactory remains a usable class.
        def langfac(*args: Any, **kwargs: Any):
            def decorator(obj: Any) -> Any:
                return obj
            return decorator


    @langfac(name="pysbd", default_config={"language": "en"})
    class PySBDFactory:
        """pysbd as a spacy component through entrypoints"""

        def __init__(self, nlp, name, language="en"):
            self.nlp = nlp
            self.name = name
            self.seg = pysbd.Segmenter(language=language, clean=False,
                                       char_span=True)

        def __call__(self, doc):
            sents_char_spans = self.seg.segment(doc.text_with_ws)
            # Despite the name, these are character offsets of sentence
            # starts, compared against token.idx below.
            start_token_ids = {sent.start for sent in sents_char_spans}
            for token in doc:
                token.is_sent_start = token.idx in start_token_ids
            return doc
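With an optional-import decorator like this, it is worth checking what the decorated class becomes when the import fails. Below is a minimal sketch of that check, not pysbd's actual code. It assumes a deliberately missing, hypothetical module `not_a_real_dependency` in place of spaCy, plus a hypothetical `DemoComponent` class, and uses a pass-through fallback that returns the decorated object unchanged.

```python
from typing import Any

try:
    # Hypothetical module name, chosen so that the import always fails here.
    from not_a_real_dependency.language import Language
    langfac = Language.factory
except ImportError:
    def langfac(*args: Any, **kwargs: Any):
        """Fallback: accept the same arguments, but decorate nothing."""
        def decorator(obj: Any) -> Any:
            # Return the class untouched so it stays importable and callable.
            return obj
        return decorator


@langfac(name="demo_component", default_config={"language": "en"})
class DemoComponent:
    """Hypothetical stand-in for a pipeline component."""

    def __init__(self, nlp: Any, name: str, language: str = "en"):
        self.nlp = nlp
        self.name = name
        self.language = language

    def __call__(self, doc: Any) -> Any:
        return doc


# Without the optional dependency, the class can still be used directly.
component = DemoComponent(nlp=None, name="demo_component")
```

The point of the pass-through design is that the fallback only drops the registration step. A fallback that returns a wrapper function instead would replace the class itself, so instantiating it without the optional dependency would no longer produce a component.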
PySBD component using Language.factory