Closed mindreframer closed 1 year ago
In the next version (when I or someone else finally manages to make it work), the preprocessors will be configurable per file type.
@phiresky OMG, that would be awesome! Any idea what the configuration would look like? E.g., when I'm overriding the Docx preprocessor, how would I specify it?
Such a feature would completely eradicate the embarrassing freezing issue when searching through EPUB folders.
Any idea of the delivery time for the next release?
Thanks for the great tool, btw!
Starting with 1.0.0, it's possible to add custom adapters via the config file. If someone has a good suggestion for a file type, please post it in show-your-adapter.
First, what an awesome project! It really makes searching huge document libraries possible.
Currently I have a lot of issues with EPUB parsing: pandoc hangs forever at 100% CPU on some EPUB files, mostly bigger ones but sometimes smaller ones too. I don't have a good workaround for this yet.
I tried parsing the problematic files with https://github.com/kevinboone/epub2txt2 and it returns the content instantly. Also, judging by the number of issues here about EPUB parsing, this could be a good solution for many of them.
Please consider allowing epub2txt2 to be used as a backend for EPUB extraction.
Thanks!
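Until something like this is built in, the hang can be worked around from outside the tool. Here is a rough sketch, not anything the project ships: it runs extractor commands in order with a time limit and falls back to the next one when a command hangs, is missing, or fails. The function name `extract_text`, the 10 second default, and the exact command lines in the usage part are my assumptions (in particular, that `epub2txt` prints the text to stdout when given a file path), so adjust them to your setup.

```python
import subprocess
import sys


def extract_text(commands, timeout_s=10.0):
    """Run each command in order and return the stdout of the first one
    that exits with status 0 within timeout_s seconds, or None if all fail.

    `commands` is a list of argv lists. The name and defaults here are
    illustrative and not part of any existing tool.
    """
    for cmd in commands:
        try:
            result = subprocess.run(
                cmd, capture_output=True, text=True, timeout=timeout_s
            )
        except (subprocess.TimeoutExpired, FileNotFoundError):
            # Extractor hung past the limit (it is killed) or is not installed.
            continue
        if result.returncode == 0:
            return result.stdout
    return None


if __name__ == "__main__" and len(sys.argv) > 1:
    path = sys.argv[1]
    # Assumed command lines: pandoc first, epub2txt as the fallback.
    text = extract_text([
        ["pandoc", "-t", "plain", path],
        ["epub2txt", path],
    ])
    if text is None:
        sys.exit("no extractor succeeded for " + path)
    print(text)
```

Putting the faster extractor first in the list would avoid waiting for the timeout on every problematic file.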