Crash on short input; len(technical_counts) == 0 after filtering

In combo_basic.py:

    if len(technical_counts) == 0:
        return pd.Series()

    order = sorted(
        list(technical_counts.keys()), key=TermExtraction.word_length, reverse=True
    )

    if not have_single_word:
        order = list(filter(lambda s: TermExtraction.word_length(s) > 1, order))

    technical_counts = technical_counts[order]

    df = pd.DataFrame(
        {
            "xlogx_score": technical_counts.reset_index()
            .apply(
                lambda s: math.log(TermExtraction.word_length(s["index"])) * s[0],
                axis=1,
            )
            .values,
            "times_subset": 0,
            "times_superset": 0,
        },
        index=technical_counts.index,
    )

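The pd.DataFrame() call can fail when technical_counts is empty after technical_counts = technical_counts[order]. This happens when have_single_word is False and every candidate term is a single word, so the filter removes everything from order. A second check for an empty Series avoids the crash, e.g.: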

    technical_counts = technical_counts[order]

    if len(technical_counts) == 0:
        return pd.Series()
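To make the guard easier to check in isolation, here is a minimal, self-contained sketch of the scoring step with both empty checks in place. It is not pyate's actual code. word_length and xlogx_scores are hypothetical stand-ins, and the score (log of the word count times the frequency) is computed with a list comprehension rather than DataFrame.apply, so it does not depend on how apply treats an empty frame.

```python
import math

import pandas as pd


def word_length(term: str) -> int:
    # Hypothetical stand-in for TermExtraction.word_length:
    # the number of whitespace-separated words in a candidate term.
    return len(term.split())


def xlogx_scores(technical_counts: pd.Series, have_single_word: bool) -> pd.Series:
    """Simplified sketch of the combo_basic scoring step (not pyate's code)."""
    # First check: nothing was extracted at all.
    if len(technical_counts) == 0:
        return pd.Series(dtype=float)

    # Longest terms first, as in the original snippet.
    order = sorted(technical_counts.index, key=word_length, reverse=True)
    if not have_single_word:
        # Drop single-word terms; this can empty the list entirely.
        order = [term for term in order if word_length(term) > 1]

    technical_counts = technical_counts.loc[order]

    # Second check: the filter above may have removed every candidate.
    if len(technical_counts) == 0:
        return pd.Series(dtype=float)

    # log(word count) * frequency for each remaining term.
    return pd.Series(
        [
            math.log(word_length(term)) * count
            for term, count in zip(technical_counts.index, technical_counts.values)
        ],
        index=technical_counts.index,
        name="xlogx_score",
    )
```

With have_single_word=False, a counts Series made up only of single-word terms now returns an empty Series instead of reaching the DataFrame construction. The explicit dtype=float is there because a bare pd.Series() emits a warning about the default dtype of empty Series on pandas 1.x.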

Minimal reproducible example:

    import spacy
    from pyate.term_extraction_pipeline import TermExtractionPipeline

    nlp = spacy.load("en")  # spaCy 2.x shortcut model name
    nlp.add_pipe(TermExtractionPipeline())  # spaCy 2.x add_pipe API

    text = "This sentence is short."
    nlp(text)  # crashes inside combo_basic

kevinlu1248/pyate, issue #1