Which SIST2 component is your Feature Request related to?
Scan
Is your feature request related to a problem? Please describe.
The set of indexed file formats is limited, and I presume that adding support for more formats and maintaining them is a tedious programming task.
Currently I'm using SIST2 mainly to index PDFs. It's really great and I would like to see other file formats like mbox, too.
What would you like to see happen?
Please support the use of an Apache Tika Server (http://tika.apache.org). The server could easily be added to the docker-compose file (see Docker Hub). Its REST API can extract metadata from over a thousand different file types, with ongoing maintenance by Apache, OCR integration ...
Unfortunately, I found no C library for the API, but there is, for example, Tika Python, which could perhaps be used as a template.
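To make the request more concrete, here is a rough sketch of what a metadata lookup against a Tika Server could look like, using only the Python standard library. It assumes a server reachable at `http://localhost:9998` (Tika Server's default port) and uses its `PUT /meta` endpoint; the function names `build_meta_request` and `extract_metadata` are made up for illustration and are not part of SIST2 or Tika Python.

```python
import json
import urllib.request

# Assumption: a Tika Server is running locally on its default port.
TIKA_URL = "http://localhost:9998"


def build_meta_request(data: bytes, base_url: str = TIKA_URL) -> urllib.request.Request:
    """Build a PUT request that sends raw file bytes to the /meta endpoint."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/meta",
        data=data,
        method="PUT",
        headers={
            # Ask for the metadata as JSON.
            "Accept": "application/json",
            # Set a generic type explicitly; otherwise urllib would label
            # the body as form-encoded data.
            "Content-Type": "application/octet-stream",
        },
    )


def extract_metadata(path: str, base_url: str = TIKA_URL) -> dict:
    """Send one file to the server and return its metadata as a dict."""
    with open(path, "rb") as f:
        request = build_meta_request(f.read(), base_url)
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `extract_metadata("some-file.pdf")` would then return whatever metadata keys the server reports for that file. Note that in this sketch every file costs one HTTP round trip to the server.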
Hi, for performance reasons it's not feasible to use Tika. Other than mbox, what other formats are missing? Feel free to open an issue for each missing format.
Thanks in advance for considering!