Open jkoopmann opened 6 years ago
I will add a limit on file size and also a way to ignore files by extension.
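For illustration only, a size-and-extension rule of this kind could be sketched as follows. This is not the app's code: the function name `should_index`, the 100 MiB limit, and the extension list are all assumptions made up for the example.

```python
from pathlib import PurePosixPath

# Hypothetical defaults, chosen only for illustration.
MAX_SIZE_BYTES = 100 * 1024 * 1024  # 100 MiB
IGNORED_EXTENSIONS = {".m2ts", ".mp4", ".mkv"}


def should_index(path: str, size_bytes: int) -> bool:
    """Return False for files that are too large or have an ignored extension."""
    if size_bytes > MAX_SIZE_BYTES:
        return False
    # Compare the suffix case-insensitively so "CLIP.M2TS" is skipped too.
    return PurePosixPath(path).suffix.lower() not in IGNORED_EXTENSIONS
```

Checking the size first means a multi-gigabyte file is rejected before anything would be handed to the search backend.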
I would suggest ignoring all hidden files by default. There are also Synology folders like @eaDir that shouldn't be indexed and can't be ignored based on file extension.
Adding a .noindex file to a folder makes the indexer skip that folder's files and subfolders.
True. However, typical users will most likely not pay attention to this and create ".noindex" files, will they? :-)
This is, however, the simplest way to tell the app not to index a full directory.
The .noindex files work for one-off, static folders, but a lot of hidden folders are generated on the fly as files and folders are created and removed.
In what situation would you want to index hidden (dot-)files?
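A default rule along these lines could be sketched as below. It is only an illustration under stated assumptions, not the app's implementation: `is_hidden_or_generated` and `IGNORED_DIR_NAMES` are hypothetical names, and "@eaDir" is simply the Synology example from this thread.

```python
from pathlib import PurePosixPath

# Hypothetical list of generated folder names to skip; "@eaDir" is the
# Synology example mentioned in this thread.
IGNORED_DIR_NAMES = {"@eaDir"}


def is_hidden_or_generated(path: str) -> bool:
    """True if any component of the path is a dot-name or a listed folder."""
    for part in PurePosixPath(path).parts:
        if part in IGNORED_DIR_NAMES:
            return True
        # Treat ".git", ".cache", ".hidden.txt" and the like as hidden,
        # but not the relative markers "." and "..".
        if part.startswith(".") and part not in (".", ".."):
            return True
    return False
```

Because the check looks at every path component, it also covers folders that are generated on the fly, which a static .noindex file would miss.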
Yes, we totally agree on that point, I was not clear enough.
I will add an option to enable indexing/searching within hidden files
Thanks @daita
The .noindex file excludes the entire folder and the files it contains. But how can I exclude files by pattern? I want to exclude temporary Word files like "~*.docx" from indexing because these files always throw 'java.lang.IllegalArgumentException: java.lang.IllegalArgumentException: field [content] not present as part of path [attachment.content]'
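To make the request concrete, pattern-based exclusion could look roughly like this sketch, which uses Python's standard `fnmatch` module for glob matching. It is not an existing option of the app; `is_excluded` and `EXCLUDE_PATTERNS` are hypothetical names, and the `*.tmp` entry is only an extra example.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Hypothetical exclusion patterns; "~*.docx" is the one asked about above.
EXCLUDE_PATTERNS = ["~*.docx", "*.tmp"]


def is_excluded(path: str) -> bool:
    """True if the file name (not the full path) matches any glob pattern."""
    name = PurePosixPath(path).name
    return any(fnmatchcase(name, pattern) for pattern in EXCLUDE_PATTERNS)
```

Matching only the file name keeps the patterns short; a temporary file such as "~$report.docx" is excluded wherever it sits in the tree, while "report.docx" is still indexed.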
Do my users have to create the .noindex file in the Nextcloud WebGUI, or can I create it on the filesystem as server admin, without it being visible to the users?
Hi,
My initial index keeps running into memory errors. I just noticed that it always happens on a .m2ts file. The file is several GB in size, and I suspect that things go crazy when fulltextsearch passes it to Elasticsearch. Moreover, indexing video files in general might not be a good idea.
How can I tell the plugin or Elasticsearch to ignore certain extensions, file sizes, or paths?
On another note: do I need to do anything special to have PDFs, TIFFs, etc. OCRed besides having tesseract installed?
Regards, JP