tspannhw / nifi-extracttext-processor

Apache NiFi Custom Processor Extracting Text From Files with Apache Tika
Apache License 2.0

Added significant unit testing, fleshed out max output property, and … #2

Closed (wdemis closed 6 years ago)

wdemis commented 6 years ago

…updated to tika 1.7

wdemis commented 6 years ago

This version uses Tika 1.7, PDFBox 2.0.8, and NiFi 1.5.0, and has decent unit testing. I did notice that your example flow uses the NLP stuff you were working on, but THIS branch doesn't include the NLP stuff. I therefore tested the processor with my own NiFi flow, but you might want to update the template with an ExtractTextProcessor-only version of the flow.

tspannhw commented 6 years ago

Cool, thanks, will do. I am looking at Tika for some of the cool vision stuff.

tspannhw commented 6 years ago

Looking at integrating some interesting things here: https://wiki.apache.org/tika/