stephbuon / posextract

Grammatical information extraction methods designed for the analysis of historical and contemporary textual corpora.
MIT License

posextract

posextract offers grammatical information extraction methods designed for the analysis of historical and contemporary textual corpora. It traverses the syntactic dependency relations between parts of speech and returns sequences of words that share a grammatical relationship. See [our article]() for more. You can also install posextract from PyPI with pip.
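To make the idea of traversing dependency relations concrete, here is a minimal sketch that pulls a subject-verb-object triple out of a hand-annotated parse. It is not posextract's implementation: the `Token` class, the `svo_triples` function, and the dependency labels (`nsubj`, `dobj`, `aux`) are assumptions chosen for illustration, and a real pipeline would obtain the parse from a dependency parser rather than typing it in.

```python
from dataclasses import dataclass


@dataclass
class Token:
    text: str
    dep: str   # dependency label relating this token to its head
    head: int  # index of the head token (the root points to itself)


def svo_triples(tokens):
    """Return (subject, verb, object) for each root token with both a subject and an object child."""
    triples = []
    for i, tok in enumerate(tokens):
        if tok.dep != "ROOT":
            continue
        # Direct dependents of the root, excluding the root itself.
        children = [t for j, t in enumerate(tokens) if t.head == i and j != i]
        subjects = [t.text for t in children if t.dep == "nsubj"]
        objects = [t.text for t in children if t.dep == "dobj"]
        for s in subjects:
            for o in objects:
                triples.append((s, tok.text, o))
    return triples


# Hand-annotated parse of "Landlords may exercise oppression." (labels are assumed).
sentence = [
    Token("Landlords", "nsubj", 2),
    Token("may", "aux", 2),
    Token("exercise", "ROOT", 2),
    Token("oppression", "dobj", 2),
    Token(".", "punct", 2),
]

print(svo_triples(sentence))
```

The auxiliary "may" is attached to the verb but is neither a subject nor an object, so this sketch leaves it out of the triple, which matches the shape of the output shown in the examples below.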

Usage

Required Parameters:

Optional Parameters:

Examples

Interactive:

Extract grammatical triples.

from posextract import grammatical_triples

triples = grammatical_triples.extract(['Landlords may exercise oppression.', 'The soldiers were ill.'])

for triple in triples:
    print(triple)

# Output: Landlords exercise oppression, soldiers were ill

Extract grammatical triples using non-default options:

from posextract import grammatical_triples
from posextract.util import TripleExtractorOptions

sent = ['Landlords may exercise oppression.']  # example input
triples = grammatical_triples.extract(sent, TripleExtractorOptions(prep_phrase=True))

Or extract adjectives and the nouns they modify.

from posextract import adj_noun_pairs

adj_noun = adj_noun_pairs.extract()

Or extract subjects and their verbs.

from posextract import subj_verb_pairs

subj_verb = subj_verb_pairs.extract()
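For intuition about what an adjective-noun or subject-verb pair is in dependency terms, the sketch below reads both kinds of pair off a hand-annotated parse. It does not call posextract: the example sentence, the `pairs_by_relation` helper, and the labels `amod` and `nsubj` are assumptions made for illustration.

```python
# Each token: (text, dependency label, index of its head token).
# Hand-annotated parse of "The weary soldiers marched" (labels are assumed).
parse = [
    ("The", "det", 2),
    ("weary", "amod", 2),
    ("soldiers", "nsubj", 3),
    ("marched", "ROOT", 3),
]


def pairs_by_relation(tokens, relation):
    """Return (dependent, head) text pairs for tokens attached by `relation`."""
    return [(text, tokens[head][0]) for text, dep, head in tokens if dep == relation]


adj_noun = pairs_by_relation(parse, "amod")    # adjective and the noun it modifies
subj_verb = pairs_by_relation(parse, "nsubj")  # subject and its verb

print(adj_noun)
print(subj_verb)
```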

Over CLI:

posextract can extract grammatical triples from text:

python -m posextract.extract_triples "Landlords may exercise oppression." output.csv

# Output: Landlords exercise oppression

posextract can extract SVO/SVA relationships separately, or it can combine the adjective into a single SVO triple:

python -m posextract.extract_triples "The soldiers were terminally ill." output.csv

# Output: soldiers were terminally, soldiers were ill

python -m posextract.extract_triples "The soldiers were terminally ill." output.csv --post-combine-adj

# Output: soldiers were terminally ill

If the input is a .csv file, name the text and ID columns:

python -m posextract.extract_triples --data_column sentence --id_column sentence_id input.csv output.csv
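If you do not yet have an input file, a small one matching the column names in the command above can be written with Python's standard `csv` module. This is a sketch under assumptions: the `write_input_csv` helper is hypothetical, and the presence of a header row and the column order are guesses based on the `--data_column` and `--id_column` arguments, not documented requirements.

```python
import csv

# Column names mirror the --id_column and --data_column values used above.
rows = [
    {"sentence_id": 1, "sentence": "Landlords may exercise oppression."},
    {"sentence_id": 2, "sentence": "The soldiers were ill."},
]


def write_input_csv(path, rows):
    """Write rows to `path` with a header row (assumed to be what the CLI expects)."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["sentence_id", "sentence"])
        writer.writeheader()
        writer.writerows(rows)


write_input_csv("input.csv", rows)
```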

For More Information...

... see our Wiki: