if len(technical_counts) == 0:
    return pd.Series()
order = sorted(
    list(technical_counts.keys()), key=TermExtraction.word_length, reverse=True
)
if not have_single_word:
    order = list(filter(lambda s: TermExtraction.word_length(s) > 1, order))
technical_counts = technical_counts[order]
df = pd.DataFrame(
    {
        "xlogx_score": technical_counts.reset_index()
        .apply(
            lambda s: math.log(TermExtraction.word_length(s["index"])) * s[0],
            axis=1,
        )
        .values,
        "times_subset": 0,
        "times_superset": 0,
    },
    index=technical_counts.index,
)
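The call to pd.DataFrame() can fail if technical_counts is empty after the reassignment technical_counts = technical_counts[order]. This happens, for example, when have_single_word is False and every candidate term is a single word, so the filter leaves order empty. It can be avoided with a second check for an empty Series, e.g.: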
technical_counts = technical_counts[order]
if len(technical_counts) == 0:
    return pd.Series()
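To illustrate the proposed guard in isolation, here is a minimal sketch that does not depend on pyate or spaCy. The names word_length and xlogx_scores are hypothetical stand-ins (word_length approximates TermExtraction.word_length by counting space-separated words), and the function returns only the score column rather than the full DataFrame built in combo_basic.py.

```python
import math

import pandas as pd


def word_length(term: str) -> int:
    # Hypothetical stand-in for TermExtraction.word_length:
    # counts space-separated words in a candidate term.
    return len(term.split(" "))


def xlogx_scores(technical_counts: pd.Series, have_single_word: bool = False) -> pd.Series:
    # First guard (already present in the original code): no candidates at all.
    if len(technical_counts) == 0:
        return pd.Series(dtype="float64")

    order = sorted(technical_counts.keys(), key=word_length, reverse=True)
    if not have_single_word:
        # Dropping single-word terms can leave `order` empty.
        order = [term for term in order if word_length(term) > 1]

    technical_counts = technical_counts[order]

    # Second guard (the proposed fix): the filter may have removed every term.
    if len(technical_counts) == 0:
        return pd.Series(dtype="float64")

    # Score each remaining term as log(word count) * frequency.
    return pd.Series(
        [math.log(word_length(term)) * count for term, count in technical_counts.items()],
        index=technical_counts.index,
    )


# Only single-word candidates: the second guard returns an empty Series.
only_single_words = pd.Series({"sentence": 1, "short": 1})
empty_result = xlogx_scores(only_single_words, have_single_word=False)

# A multi-word candidate survives the filter and is scored.
mixed = pd.Series({"term extraction": 2, "short": 1})
scored = xlogx_scores(mixed, have_single_word=False)
```

Passing dtype="float64" to the empty pd.Series() is an addition in this sketch, not part of the proposed fix. It is there because some pandas versions warn when an empty Series is created without an explicit dtype.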
Minimal working example:
import spacy
from pyate.term_extraction_pipeline import TermExtractionPipeline
nlp = spacy.load("en")
nlp.add_pipe(TermExtractionPipeline())
text = "This sentence is short."
nlp(text)
The affected code is in combo_basic.py.