Open tokee opened 3 years ago
I think this likely relates to this issue: https://github.com/openpreserve/nanite/pull/36
Unfortunately, the pull was full of whitespace changes and I couldn't work out what was happening. I'll have to try and fix it up.
Hm, see also https://github.com/openpreserve/nanite/pull/40. This part of the code seems to be a bit of a mess, as those two pulls were a bit out of sync, so I'll try to tidy up.
Well, that was messy, but I think the Nanite code is better now. Just released 1.4.1-97 and will update this project when it becomes available.
Actually, let's leave this open until we've proved the Nanite update resolved the issue.
Note that Tika < 1.25 has also been reported as generating a lot of tmp files (https://issues.apache.org/jira/browse/TIKA-3203), so that might also be the cause. I've updated to 1.28.5 and I'm looking at getting to Tika 2.7.
It seems that calling warc-indexer with thousands of WARC files causes the tmp folder to fill up (maybe due to DROID temporary files). It should be possible to clean up along the way.
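For illustration only, here is a rough Java sketch of what cleaning up along the way could look like: each WARC gets its own scratch directory, which is deleted as soon as that file is done, so leftovers cannot pile up across thousands of inputs. This is not the warc-indexer or Nanite code. `ScratchCleanup`, `WarcTask` and `processAll` are hypothetical names, and it assumes the identification step can be pointed at a directory of our choosing, which I have not verified for DROID.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class ScratchCleanup {

    /** Hypothetical stand-in for whatever indexes a single WARC file. */
    interface WarcTask {
        void run(Path warc, Path scratchDir) throws IOException;
    }

    /** Runs the task once per WARC, giving each run a scratch dir that is removed afterwards. */
    static void processAll(List<Path> warcs, WarcTask task) throws IOException {
        for (Path warc : warcs) {
            Path scratch = Files.createTempDirectory("warc-scratch-");
            try {
                task.run(warc, scratch);
            } finally {
                // Clean up even if the task threw, so a failed WARC leaves nothing behind.
                deleteRecursively(scratch);
            }
        }
    }

    /** Deletes a directory tree, children before parents. */
    static void deleteRecursively(Path root) throws IOException {
        if (!Files.exists(root)) {
            return;
        }
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(root)) {
            // Reverse order puts deeper paths first, so directories are empty when deleted.
            paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : paths) {
            Files.deleteIfExists(p);
        }
    }

    public static void main(String[] args) throws IOException {
        List<Path> warcs = new ArrayList<>();
        for (String arg : args) {
            warcs.add(Paths.get(arg));
        }
        // Placeholder task: a real one would parse the WARC and write temp files into scratchDir.
        processAll(warcs, (warc, scratchDir) ->
                Files.createFile(scratchDir.resolve("placeholder.tmp")));
    }
}
```

The `finally` block is the important part: cleanup happens per file rather than at JVM exit, which is what matters for a long run. If the temporary files come from a library that only honours `java.io.tmpdir`, this approach would not apply as-is.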