szczyglis-dev / py-gpt

Desktop AI Assistant powered by GPT-4, GPT-4 Vision, GPT-3.5, DALL-E 3, Langchain, Llama-index, chat, vision, voice control, image generation and analysis, autonomous agents, code and command execution, file upload and download, speech synthesis and recognition, access to Web, memory, prompt presets, plugins, assistants & more. Linux, Windows, Mac.
https://pygpt.net
MIT License
449 stars, 92 forks

question: how to fix indexed file types? #32

Closed: oleksii-honchar closed this issue 3 months ago

oleksii-honchar commented 3 months ago

pygpt version: 2.1.18

I noticed that in the "FILES" tab and chat, when the LlamaIndex source is referenced, it provides the wrong file type, for example, "file_type: video/mp2t."

From what I've read online about RAG, wrong metadata on a document can affect Q&A quality. Is it possible to fix the file types for source code files?

szczyglis-dev commented 3 months ago

From version 2.1.19, there is a new option: Settings -> Llama-index -> Custom metadata.

With this option you can define custom fields in the document metadata. They are included during file indexing and overwrite the values generated by the Llama-index data loaders. Just create a new entry for the extension ts with the key file_type and the value your_custom_file_type_here, then re-index the files.
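
To illustrate the idea, here is a minimal Python sketch of how a per-extension override could be merged over loader-generated metadata. It is not PyGPT's actual implementation: the `CUSTOM_METADATA` table, the `apply_custom_metadata` helper and the value `text/typescript` are hypothetical names chosen for the example. The wrong type in the report probably arises because `.ts` is also the extension for MPEG-2 transport stream files, whose MIME type is `video/mp2t`.

```python
from pathlib import Path

# Hypothetical override table: extension -> {metadata key: value}.
# Illustrative only; not PyGPT's internal settings format.
CUSTOM_METADATA = {
    "ts": {"file_type": "text/typescript"},  # placeholder value, pick your own
}


def apply_custom_metadata(path, loader_metadata, overrides=CUSTOM_METADATA):
    """Return a copy of loader_metadata with per-extension overrides applied."""
    # Normalise the extension: "src/App.TS" -> "ts"
    ext = Path(path).suffix.lstrip(".").lower()
    merged = dict(loader_metadata)         # leave the loader's dict untouched
    merged.update(overrides.get(ext, {}))  # custom fields win over loader values
    return merged


# Metadata as a loader might report it for a TypeScript source file.
meta = apply_custom_metadata(
    "src/app.ts",
    {"file_name": "app.ts", "file_type": "video/mp2t"},
)
```

Only the keys listed for the extension are replaced, so other fields such as `file_name` keep the values produced by the loader, and files with extensions that have no entry are left unchanged.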