This PR integrates spaCy's NLP capabilities into Langflow through a comprehensive set of components, enabling text processing and analysis workflows.
🎯 Core Components
Language Model Management
SpacyModel
Base component for spaCy language models
Supports 20+ languages, including English, German, French, and Spanish
Automatic model download and initialization
Multiple model sizes (sm, md, lg) per language
Configurable entity merging
Pipeline component management
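A minimal sketch of the load-or-download behaviour listed above, written against spaCy's public API (`spacy.load`, `spacy.blank`, `spacy.cli.download`, the built-in `merge_entities` component). The helper names `model_name` and `load_pipeline`, the `core_web`/`core_news` naming rule, and the fallback to a blank pipeline are assumptions for illustration, not the component's actual code.

```python
import importlib

import spacy
from spacy.language import Language

# Assumption: spaCy's trained English and Chinese pipelines use the "web"
# genre and most other languages use "news"; check the spaCy models
# directory for the exact package name of each language.
WEB_GENRE_LANGS = {"en", "zh"}


def model_name(lang: str, size: str = "sm") -> str:
    """Hypothetical helper: build a name like 'en_core_web_sm'."""
    if size not in {"sm", "md", "lg"}:
        raise ValueError(f"unsupported model size: {size}")
    genre = "web" if lang in WEB_GENRE_LANGS else "news"
    return f"{lang}_core_{genre}_{size}"


def load_pipeline(lang: str, size: str = "sm", merge_entities: bool = True,
                  download: bool = False) -> Language:
    """Hypothetical helper: load a trained pipeline, optionally downloading it.

    Falls back to a blank (tokenizer-only) pipeline when the package is
    missing and download is False.
    """
    name = model_name(lang, size)
    try:
        nlp = spacy.load(name)
    except OSError:  # spacy.load raises OSError when the package is missing
        if download:
            from spacy.cli import download as spacy_download
            spacy_download(name)  # needs network access and pip
            importlib.invalidate_caches()  # some environments still need a restart
            nlp = spacy.load(name)
        else:
            nlp = spacy.blank(lang)
    # Built-in component that retokenizes each entity span into a single token.
    if merge_entities and "merge_entities" not in nlp.pipe_names:
        nlp.add_pipe("merge_entities")
    return nlp


nlp = load_pipeline("en", "sm", download=False)
print(model_name("de", "md"), nlp.pipe_names)
```

Falling back to `spacy.blank` keeps a flow runnable offline, at the cost of silently losing the trained components; a real component might prefer to surface an error instead.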
Entity Processing
EntityRecognizer
Named Entity Recognition (NER)
Built-in entity types (PERSON, ORG, DATE, etc.)
Entity context extraction
Sentence-level entity tracking
Confidence scoring
Detailed entity metadata
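The entity context, sentence tracking and metadata features can be sketched with plain spaCy `Doc`/`Span` attributes. The function `extract_entities`, its `window` parameter and the output keys are hypothetical; the `Doc` is annotated by hand (entities and sentence starts) so the example runs without a trained `ner` component. Confidence scores are left out because spaCy's default NER does not expose per-entity scores.

```python
import spacy
from spacy.tokens import Doc


def extract_entities(doc: Doc, window: int = 3) -> list[dict]:
    """Collect entity metadata, the sentence index, and a token-window context."""
    # Map each sentence's first token index to the sentence's position.
    sent_index = {sent.start: i for i, sent in enumerate(doc.sents)}
    results = []
    for ent in doc.ents:
        context = doc[max(0, ent.start - window): min(len(doc), ent.end + window)]
        results.append({
            "text": ent.text,
            "label": ent.label_,
            "start_char": ent.start_char,
            "end_char": ent.end_char,
            "sentence": sent_index[ent.sent.start],
            "context": context.text,
        })
    return results


# Hand-annotated Doc: IOB entity tags plus explicit sentence starts.
nlp = spacy.blank("en")
doc = Doc(
    nlp.vocab,
    words=["Ada", "Lovelace", "worked", "in", "London", ".",
           "She", "visited", "Paris", "."],
    spaces=[True, True, True, True, False, True, True, True, False, False],
    ents=["B-PERSON", "I-PERSON", "O", "O", "B-GPE", "O",
          "O", "O", "B-GPE", "O"],
    sent_starts=[True, False, False, False, False, False,
                 True, False, False, False],
)

for row in extract_entities(doc):
    print(row["label"], row["text"], "sentence", row["sentence"])
```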
EntityRuler
Pattern-based entity recognition
Custom rule definition
Regex pattern support
Phrase pattern matching
Entity pattern priorities
Rule-based entity labeling
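A small sketch of phrase and regex patterns using spaCy's built-in `entity_ruler` on a blank English pipeline. The labels and patterns are illustrative only. spaCy has no numeric pattern priority; the closest built-in setting is `overwrite_ents`, which decides whether ruler matches replace entities set by earlier components, so treat that mapping to "entity pattern priorities" as an assumption.

```python
import spacy

nlp = spacy.blank("en")
# overwrite_ents=True: ruler matches replace entities from earlier components
# (e.g. a statistical "ner"); with False, the existing entities are kept.
ruler = nlp.add_pipe("entity_ruler", config={"overwrite_ents": True})
ruler.add_patterns([
    # Phrase patterns: exact token sequences, matched on ORTH by default.
    {"label": "ORG", "pattern": "Langflow"},
    {"label": "PRODUCT", "pattern": "spaCy"},
    # Token pattern with a regular expression on the token text (a year).
    {"label": "DATE", "pattern": [{"TEXT": {"REGEX": r"^(19|20)\d{2}$"}}]},
])

doc = nlp("Langflow added spaCy components in 2024.")
print([(ent.text, ent.label_) for ent in doc.ents])
```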
Text Analysis
DependencyMatcher
Syntactic pattern matching
Relationship extraction
Subject-Verb-Object detection
Custom dependency rules
Active/Passive voice identification
Complex pattern definitions
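Subject-Verb-Object extraction and voice detection can be sketched with spaCy's `DependencyMatcher`. The parses below are written by hand (absolute head indices, ClearNLP-style English labels) so no trained parser is needed; `extract_svo`, `voice` and `make_doc` are hypothetical helpers, and anchoring the verb on `DEP == "ROOT"` is a simplification that misses verbs in subordinate clauses.

```python
import spacy
from spacy.matcher import DependencyMatcher
from spacy.tokens import Doc

nlp = spacy.blank("en")


def make_doc(words, heads, deps):
    # Hand-built parse; normally a pipeline's "parser" supplies heads and deps.
    return Doc(nlp.vocab, words=words, heads=heads, deps=deps)


# Verb (the root) with a direct nsubj child and a direct dobj child.
svo_pattern = [
    {"RIGHT_ID": "verb", "RIGHT_ATTRS": {"DEP": "ROOT"}},
    {"LEFT_ID": "verb", "REL_OP": ">", "RIGHT_ID": "subject",
     "RIGHT_ATTRS": {"DEP": "nsubj"}},
    {"LEFT_ID": "verb", "REL_OP": ">", "RIGHT_ID": "object",
     "RIGHT_ATTRS": {"DEP": "dobj"}},
]
matcher = DependencyMatcher(nlp.vocab)
matcher.add("SVO", [svo_pattern])


def extract_svo(doc):
    triples = []
    for _match_id, token_ids in matcher(doc):
        # token_ids follow the pattern order: verb, subject, object.
        verb, subj, obj = (doc[i] for i in token_ids)
        triples.append((subj.text, verb.text, obj.text))
    return triples


def voice(doc):
    # Passive clauses carry nsubjpass/auxpass labels in this label scheme.
    passive = any(t.dep_ in {"nsubjpass", "auxpass"} for t in doc)
    return "passive" if passive else "active"


active = make_doc(["Robots", "gathered", "toys", "."],
                  [1, 1, 1, 1], ["nsubj", "ROOT", "dobj", "punct"])
passive = make_doc(["Toys", "were", "gathered", "by", "robots", "."],
                   [2, 2, 2, 2, 3, 2],
                   ["nsubjpass", "auxpass", "ROOT", "agent", "pobj", "punct"])
print(extract_svo(active), voice(active))
print(extract_svo(passive), voice(passive))
```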
TextCategorizer
Single-label classification (textcat)
Multi-label classification (textcat_multilabel)
Configurable threshold settings
Confidence scoring
Custom category management
Binary and multi-class support
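The difference between the two categorizers, and where a threshold applies, can be sketched as post-processing over `doc.cats`. The function `select_labels` and the 0.5 default threshold are assumptions, and the scores are set by hand in place of a trained `textcat` or `textcat_multilabel` component.

```python
import spacy


def select_labels(cats: dict[str, float], multilabel: bool,
                  threshold: float = 0.5) -> list[str]:
    """Hypothetical helper: turn doc.cats scores into predicted labels.

    textcat: classes are mutually exclusive, so keep only the top label,
    and only if it clears the threshold.
    textcat_multilabel: classes are independent, so keep every label whose
    score is at or above the threshold.
    """
    if not cats:
        return []
    if multilabel:
        return sorted(label for label, score in cats.items() if score >= threshold)
    best = max(cats, key=cats.get)
    return [best] if cats[best] >= threshold else []


nlp = spacy.blank("en")

# Single-label scores from exclusive classes sum to 1.
single = {"POSITIVE": 0.7, "NEGATIVE": 0.3}

# Multi-label scores are independent of each other (set by hand here).
doc = nlp.make_doc("Great product, but shipping was slow.")
doc.cats = {"POSITIVE": 0.72, "NEGATIVE": 0.41, "SHIPPING": 0.66}

print(select_labels(single, multilabel=False))
print(select_labels(doc.cats, multilabel=True))
```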
Text Processing
Lemmatizer
Rule-based and lookup lemmatization
Custom abbreviation handling
Multiple lemmatization modes
Whitespace preservation
Part-of-speech aware lemmatization
Custom dictionary support
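Custom dictionaries and abbreviations can be sketched without the lookup tables that spaCy's rule and lookup lemmatizer modes need (those come from the separate `spacy-lookups-data` package). This swaps in a tokenizer special case plus the built-in `attribute_ruler`, which writes `LEMMA` directly; `CUSTOM_LEMMAS` is a hypothetical dictionary, not part of the PR.

```python
import spacy
from spacy.symbols import NORM, ORTH

# Hypothetical custom dictionary: surface form -> lemma.
CUSTOM_LEMMAS = {"approx.": "approximately", "ran": "run", "mice": "mouse"}

nlp = spacy.blank("en")
# Keep the abbreviation as one token and give it a normalised form.
nlp.tokenizer.add_special_case("approx.", [{ORTH: "approx.", NORM: "approximately"}])

# attribute_ruler assigns token attributes from Matcher patterns;
# here it sets LEMMA for every dictionary entry (matched case-insensitively).
ruler = nlp.add_pipe("attribute_ruler")
for surface, lemma in CUSTOM_LEMMAS.items():
    ruler.add(patterns=[[{"LOWER": surface}]], attrs={"LEMMA": lemma})

doc = nlp("The mice ran approx. 3 metres.")
print([(t.text, t.lemma_) for t in doc])
```

If a `lemmatizer` component is added after the ruler, its `overwrite` setting (False by default) should leave these dictionary lemmas in place.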
Sentencizer
Advanced sentence segmentation
RAG-optimized chunking
Automatic abbreviation detection
Custom punctuation rules
Quote-aware segmentation
Multi-language support
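One way RAG-oriented chunking could sit on top of the rule-based `sentencizer` is to pack whole sentences greedily into size-limited chunks, so no chunk cuts a sentence in half. The `punct_chars` list, the `chunk_sentences` helper and its `max_chars` limit are assumptions for illustration; the PR's actual chunking strategy may differ (for example, by adding overlap between chunks).

```python
import spacy

nlp = spacy.blank("en")
# Rule-based sentencizer; punct_chars replaces the default sentence-final
# characters (this list is an illustrative choice).
nlp.add_pipe("sentencizer", config={"punct_chars": [".", "!", "?"]})


def chunk_sentences(text: str, max_chars: int = 200) -> list[str]:
    """Greedily pack whole sentences into chunks of at most max_chars.

    A single sentence longer than max_chars becomes its own chunk.
    """
    chunks: list[str] = []
    current = ""
    for sent in nlp(text).sents:
        sentence = sent.text
        candidate = f"{current} {sentence}" if current else sentence
        if current and len(candidate) > max_chars:
            chunks.append(current)  # close the chunk before it overflows
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


print(chunk_sentences("First sentence here. Second one! Third? Done.", max_chars=40))
```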
Tagger
Part-of-speech tagging (POS)
Fine-grained tags (TAG)
Dependency parsing (DEP)
Morphological analysis
Custom tag sets
Detailed token attributes
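The token attributes listed above map onto standard spaCy `Token` properties. The sketch flattens them into dictionaries; `token_attributes` and its keys are hypothetical, and the `Doc` is annotated by hand (POS, tags, parse, morphology) so the example runs without a trained tagger or parser.

```python
import spacy
from spacy.tokens import Doc


def token_attributes(doc: Doc) -> list[dict[str, str]]:
    """Flatten per-token annotations added by a tagger, parser or morphologizer."""
    return [
        {
            "text": t.text,
            "pos": t.pos_,          # coarse Universal POS tag
            "tag": t.tag_,          # fine-grained, treebank-specific tag
            "dep": t.dep_,          # dependency label
            "head": t.head.text,    # syntactic head token
            "morph": str(t.morph),  # morphological features, e.g. "Number=Plur"
        }
        for t in doc
    ]


nlp = spacy.blank("en")
doc = Doc(
    nlp.vocab,
    words=["Children", "gathered", "toys", "."],
    pos=["NOUN", "VERB", "NOUN", "PUNCT"],
    tags=["NNS", "VBD", "NNS", "."],
    heads=[1, 1, 1, 1],
    deps=["nsubj", "ROOT", "dobj", "punct"],
)
doc[0].set_morph("Number=Plur")
doc[1].set_morph("Tense=Past|VerbForm=Fin")
doc[2].set_morph("Number=Plur")

for row in token_attributes(doc):
    print(row)
```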
🔍 Example Flows
Lemmatizer Flow
Test text:
The researchers were running multiple groundbreaking studies while the automated
systems continuously processed the incoming data. Children's toys scattered
across the floor were quickly gathered by the cleaning robots, which had been
programmed to recognize various objects.
Download Lemmatizer Flow JSON
Dependency Matcher Flow
Pattern Example:
Download Dependency Matcher Flow JSON
Sentencizer Flow
Features:
Text Categorizer Flow
Supports:
Tagger Flow
Tag types:
Entity Ruler Flow
Pattern types:
Entity Recognizer Flow
Entity types:
🛠️ Technical Details
Implementation Features
📊 Sample Data
🔗 Related Resources
👥 Contributors
📃 License