wdemis closed 6 years ago
This version uses Tika 1.7, PdfBox 2.0.8, and Nifi 1.5.0, and has decent unit testing. I did notice that your example flow uses the NLP stuff you were working on, but this branch doesn't include it, so I tested the processor with my own Nifi flow. You might want to update the template with an ExtractTextProcessor-only version of the flow.
Cool, thanks, will do. I am looking at Tika for some of the cool stuff with vision.
Looking at integrating some interesting things here: https://wiki.apache.org/tika/
…updated to tika 1.7