nextcloud / fulltextsearch

🔍 Core of the full-text search framework for Nextcloud
https://apps.nextcloud.com/apps/fulltextsearch
GNU Affero General Public License v3.0

Image files and external files are indexed although both are set not to be indexed. #872

Open ferdiga opened 2 months ago

ferdiga commented 2 months ago

Indexing of images in particular is extremely time-consuming, so images must be excluded.

BTW, there is currently no option to configure these and other settings:

"files_image": "0",
"files_audio": "0",
- Content Providers:
Deck 1.13.1
[]
Files 29.0.1
{
    "files_local": "1",
    "files_external": "2",
    "files_group_folders": "1",
    "files_encrypted": "0",
    "files_federated": "0",
    "files_size": "1",
    "files_pdf": "1",
    "files_office": "1",
    "files_image": "0",
    "files_audio": "0",
    "files_chunk_size": "2",
    "files_fulltextsearch_tesseract": {
        "version": "27.0.0",
        "enabled": "1",
        "psm": "4",
        "lang": "eng,deu,fra",
        "pdf": "1",
        "pdf_limit": "0"
    }
}
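
For reference, the behaviour expected from the "files_image" and "files_audio" flags above can be sketched as follows. This only illustrates the expectation and is not the app's actual code: the function name `should_index`, the MIME-prefix mapping, and the assumption that a value of "0" means "skip this category" are all hypothetical.

```python
import mimetypes

# Hypothetical mapping from MIME type prefixes to the config keys shown in the dump above.
FLAG_BY_MIME_PREFIX = {
    "image/": "files_image",
    "audio/": "files_audio",
}


def should_index(filename: str, config: dict) -> bool:
    """Return False when the file belongs to a category whose flag is "0" (assumed to mean disabled)."""
    mime, _ = mimetypes.guess_type(filename)
    if mime is None:
        # Unknown type: this sketch does not decide, so it lets the file through.
        return True
    for prefix, flag in FLAG_BY_MIME_PREFIX.items():
        # A missing flag is treated as enabled ("1") in this sketch.
        if mime.startswith(prefix) and config.get(flag, "1") == "0":
            return False
    return True


# Subset of the reported configuration.
config = {"files_image": "0", "files_audio": "0", "files_pdf": "1"}

for name in ("photo.jpg", "song.mp3", "report.pdf"):
    print(name, should_index(name, config))
```

With the configuration from this report, a filter like this would skip JPG and MP3 files and let PDFs through, which is the behaviour the settings suggest but which is not what is observed.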
ferdiga commented 2 weeks ago

Just to illustrate: the indexing finished after 2 1/2 months with an error.

Image

Piefje01 commented 2 weeks ago

Do you scan images?

ferdiga commented 2 weeks ago

Apparently yes, although the settings should exclude images, as shown above: "files_image": "0", "files_audio": "0".

IMO, EXIF data could be scanned at low cost.

Piefje01 commented 2 weeks ago

Somehow it did not work: OCR within a JPG, for example. I have a lot of photos with sample numbers, but they will not be scanned.