PipelineIE is an information extraction pipeline, built primarily on spaCy, that extracts structured information from free text. It can run either a general-purpose pipeline or a domain-specific one, such as a pipeline for the biomedical domain.
The pipeline currently extracts information as triplets and consists of four stages: Coreference Resolution (Stanford CoreNLP / neuralcoref) >> Sentence Simplification, which decomposes complex sentences into simple sentences >> Entity Linking (spaCy / ScispaCy / custom spaCy model) >> Triplet Extraction (currently a Subject - Verb - Object rule using textaCy).
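To make the order of the stages concrete, here is a minimal, dependency-free sketch of how such a chain could fit together. It is not PipelineIE's implementation: every function, the verb list, and the knowledge-base identifiers are hypothetical toy stand-ins for what neuralcoref / CoreNLP, the sentence simplifier, the spaCy / ScispaCy models, and textaCy do in the real pipeline.

```python
import re
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]

# Hypothetical toy resources; a real pipeline uses a parser and a knowledge base.
VERBS = {"inhibits", "reduces"}
KNOWLEDGE_BASE: Dict[str, str] = {"Aspirin": "DRUG:0001", "COX-1": "GENE:0002"}


def split_sentences(text: str) -> List[str]:
    """Split on periods followed by optional whitespace and drop empty fragments."""
    return [s.strip() for s in re.split(r"\.\s*", text) if s.strip()]


def resolve_coreferences(sentences: List[str]) -> List[str]:
    """Toy coreference: replace a leading 'It' with the first word of the previous sentence."""
    resolved: List[str] = []
    for sentence in sentences:
        words = sentence.split()
        if words[0] == "It" and resolved:
            words[0] = resolved[-1].split()[0]
        resolved.append(" ".join(words))
    return resolved


def simplify(sentence: str) -> List[str]:
    """Toy simplification: split 'S V A and B' into 'S V A' and 'S V B'."""
    words = sentence.split()
    verb_index = next((i for i, w in enumerate(words) if w in VERBS), None)
    if verb_index is None:
        return [sentence]
    head = " ".join(words[: verb_index + 1])
    objects = " ".join(words[verb_index + 1:]).split(" and ")
    return [f"{head} {obj}" for obj in objects]


def link_entities(sentence: str) -> Dict[str, str]:
    """Toy entity linking: look up each word in the hypothetical knowledge base."""
    return {w: KNOWLEDGE_BASE[w] for w in sentence.split() if w in KNOWLEDGE_BASE}


def extract_triples(sentence: str) -> List[Triple]:
    """Toy subject-verb-object rule: split the sentence at the first known verb."""
    words = sentence.split()
    for i, word in enumerate(words):
        if word in VERBS and 0 < i < len(words) - 1:
            return [(" ".join(words[:i]), word, " ".join(words[i + 1:]))]
    return []


def run_pipeline(text: str) -> List[Tuple[Triple, Dict[str, str]]]:
    """Chain the stages: coreference >> simplification >> entity linking >> triplets."""
    results: List[Tuple[Triple, Dict[str, str]]] = []
    for sentence in resolve_coreferences(split_sentences(text)):
        for simple in simplify(sentence):
            entities = link_entities(simple)
            for triple in extract_triples(simple):
                results.append((triple, entities))
    return results
```

Calling run_pipeline("Aspirin inhibits COX-1. It reduces fever and pain.") on this toy input yields three triplets: (Aspirin, inhibits, COX-1), (Aspirin, reduces, fever) and (Aspirin, reduces, pain). The pronoun is resolved before the compound object is split, which is why coreference resolution and sentence simplification come before triplet extraction.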
How does it help? / What problem does it solve?
Install neuralcoref from source as described below (steps taken from its GitHub repository):
python -m venv .env
source .env/bin/activate
git clone https://github.com/huggingface/neuralcoref.git
cd neuralcoref
pip install -r requirements.txt
pip install -e .
Optional: download and unzip CoreNLP 4.2.0 if you want to use CoreNLP for coreference resolution.
Install PipelineIE
git clone https://github.com/vj1494/PipelineIE.git
cd PipelineIE
pip install -r requirements.txt
pip install -e .
Biomedical Pipeline
from pipeline_ie.pipeline_ie import PipelineIE
text = "Co-culture of NK cells with transfected EC enhanced E-selectin, IL-8, and NF-kappaB-dependent promoter activity."
# Biomedical PipelineIE
# The default biomedical pipeline uses the ScispaCy en_core_sci_lg model.
# The same model is used for neuralcoref, entity linking, and triplet extraction.
# pipeline="default" uses the spaCy en model.
# Sentence simplification is enabled by default; pass sentence_simplify=False to disable it.
pie = PipelineIE(text, pipeline="biomedical")
# Returns a dataframe
df = pie.pipeline_triplet()
Please refer to the example for additional usage.
Sentence Simplification - (https://github.com/freyamehta99/Sentence-Simplification)