szczyglis-dev / py-gpt

Desktop AI Assistant powered by o1, GPT-4, GPT-4 Vision, Gemini, Claude, Llama 3, Bielik, DALL-E, Langchain, Llama-index, chat, vision, voice control, image generation and analysis, agents, command execution, file upload/download, speech synthesis and recognition, access to Web, memory, presets, assistants, plugins, and more. Linux, Windows, Mac.
https://pygpt.net

File extensions that are on the exclude list do not appear to be excluded from indexing. #31

Closed: oleksii-honchar closed this issue 8 months ago

oleksii-honchar commented 8 months ago

py-gpt version: 2.1.18

Context (configured exclude list)

3g2,3gp,7z,a,aac,aiff,alac,apk,apk,apng,app,ar,avif,bin,bz2,cab,class,deb,deb,dll,dmg,dmg,drv,dsd,dylib,dylib,ear,egg,elf,esd,exe,flac,flv,gz,heic,heif,ico,img,iso,jar,ko,lib,lz,lz4,m2v,mpc,msi,nrg,o,ogg,ogv,pcm,pkg,pkg,psd,pyc,rar,rpm,rpm,so,so,svg,swm,sys,tar,vdi,vhd,vhdx,vmdk,vob,war,whl,wim,wma,wmv,xz,zip,zst,
png,jpg, jpeg

Actual behavior

When indexing a folder with ChromaDB, the logs show that PNG files are being processed.

Expected behavior

PNG files are not indexed.

szczyglis-dev commented 8 months ago

This is expected behavior: the exclude list only applies to extensions that have no registered data loader. PNG has a built-in data loader, so PNG files are still indexed even though png is on the list.

Starting with version 2.1.19, you can force the exclusion even when a data loader exists for an extension, using a new option: Settings -> Llama-index -> Force exclude.
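To make the two behaviors concrete, here is a minimal sketch of the decision rule as described in this thread. It is not py-gpt's actual code: the names parse_exclude_list, should_skip, and LOADER_EXTENSIONS are hypothetical, and the loader set is an assumed example (only png is confirmed in this thread to have a built-in loader). It also shows one way to normalize a comma-separated setting like the one above, which contains duplicates, a stray space, and a trailing comma.

```python
import os

# Hypothetical sketch: these names are illustrative, NOT py-gpt's real API.

# Assumed example set of extensions with a registered data loader.
# Only "png" is confirmed by this thread; the others are placeholders.
LOADER_EXTENSIONS = {"png", "pdf", "txt"}


def parse_exclude_list(raw: str) -> set[str]:
    """Turn a comma-separated setting such as 'zip,so,so,png, jpeg,' into a set."""
    # Strip whitespace and leading dots, lowercase, and drop empty entries
    # (e.g. from a trailing comma); using a set also removes duplicates.
    return {p.strip().lstrip(".").lower() for p in raw.split(",") if p.strip()}


def should_skip(filename: str, exclude: set[str], force_exclude: bool = False) -> bool:
    """Return True if the file should be left out of the index."""
    ext = os.path.splitext(filename)[1].lstrip(".").lower()
    if ext not in exclude:
        return False
    # Default behavior described in the thread: the exclude list is ignored
    # for extensions that have a registered data loader...
    if ext in LOADER_EXTENSIONS and not force_exclude:
        return False
    # ...unless "Force exclude" is enabled (2.1.19+), in which case it applies.
    return True


exclude = parse_exclude_list("zip,so,so,png,jpg, jpeg,")
print(should_skip("photo.png", exclude))                      # False: png has a loader
print(should_skip("photo.png", exclude, force_exclude=True))  # True
print(should_skip("archive.zip", exclude))                    # True: no loader for zip
```

With Force exclude off, only extensions without a loader (zip here) are skipped; turning it on makes the list apply to every extension on it, which matches the behavior the reporter expected.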