openpreserve / nanite

Nanite - a friendly swarm of format-identifying robots.
openplanets.github.io/nanite/

Changes to allow Droid/Tika&Nanite to all test the same file #4

willp-bl closed 11 years ago

willp-bl commented 11 years ago

Modify FormatProfilerMapper so that Tika, DroidDetector and Nanite can all run in the same map on the same InputStream (for InputStreams smaller than 2GB). Previously the first detector consumed the InputStream, leaving nothing for the other detectors to read.
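For readers unfamiliar with the problem, here is a minimal sketch of one common way to let several detectors read the same input: buffer the stream once into a byte array and hand each detector its own fresh view of it. This is not the code from this pull request. The `Detector` interface, the `counting` toy detector and the class name are hypothetical stand-ins, and tying the 2GB limit to Java's array size cap is an assumption about why that limit exists.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class SharedStreamSketch {

    /** Hypothetical stand-in for a format detector such as Tika, DROID or Nanite. */
    interface Detector {
        String identify(InputStream in) throws IOException;
    }

    /**
     * Drain the stream into memory once. A Java byte array cannot exceed
     * Integer.MAX_VALUE elements, so this only works for inputs under ~2GB.
     */
    static byte[] buffer(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        while ((n = in.read(chunk)) != -1) {
            out.write(chunk, 0, n);
        }
        return out.toByteArray();
    }

    /** Run every detector against its own fresh stream over the same bytes. */
    static List<String> identifyAll(InputStream in, List<Detector> detectors)
            throws IOException {
        byte[] payload = buffer(in);
        List<String> results = new ArrayList<>();
        for (Detector d : detectors) {
            results.add(d.identify(new ByteArrayInputStream(payload)));
        }
        return results;
    }

    /** Toy detector: consumes the whole stream and reports how many bytes it saw. */
    static Detector counting(String name) {
        return in -> {
            int count = 0;
            while (in.read() != -1) {
                count++;
            }
            return name + ":" + count;
        };
    }

    public static void main(String[] args) throws IOException {
        InputStream in = new ByteArrayInputStream(
                "%PDF-1.4".getBytes(StandardCharsets.US_ASCII));
        List<Detector> detectors = new ArrayList<>();
        detectors.add(counting("first"));
        detectors.add(counting("second"));
        detectors.add(counting("third"));
        // Each detector sees the full input even though each one reads to the end.
        System.out.println(identifyAll(in, detectors));
    }
}
```

If the detectors were instead handed the original stream one after another, only the first would see any data. An alternative for streams that support it is `mark`/`reset`, which avoids holding a second copy but still needs a read limit large enough to cover the whole input.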

Various build fixes

Add a method to TikaDeepIdentifier that takes an InputStream and Metadata instead of a byte array

Additionally output the file extension in the map; this can be turned off easily

[Travis] also compile/test nanite-hadoop

anjackson commented 11 years ago

Grand, thanks for all your work. Glad to see the Hadoop version going again.