langgenius / dify

Dify is an open-source LLM app development platform. Dify's intuitive interface combines AI workflow, RAG pipeline, agent capabilities, model management, observability features and more, letting you quickly go from prototype to production.
https://dify.ai

Dify returning information from document that is currently not in the knowledge base but was present earlier #3022

Status: Closed (ayushiKanjiya closed this issue 4 months ago)

ayushiKanjiya commented 5 months ago

Dify version

0.5.10

Cloud or Self Hosted

Cloud

Steps to reproduce

I created a knowledge base and attached it to a text-generation app. The application prompt is given below.

I uploaded Document1 to the knowledge base and ran some queries to get information from Document1. After getting the information, I deleted Document1 from the knowledge base.

I then uploaded Document2 to the knowledge base and ran some queries to get information from Document2. After getting the information, I deleted Document2 from the knowledge base.

Document 1: Ayushi@CV.docx.pdf

Document 2: JT_Resume.pdf

===================================================================================

Application Prompt:

If you can't find the answer, then please return null. Please give answer in array. Like: {"personal_info": {"name" : "", "location" : "", "contact" : "", "email" : "", "total_experience_year" : ""}, "technical_skill": ["",""], "soft_skill": ["",""], "social_link": ["",""], "education": [{"degree" : "", "university": "", "CGPA": "", "year" : ""}], "experience": [{"company": "", "location": "", "from": "", "to": "", "role": "", "project": [{"project_name" : "", "project_url" : "", "project_description" : ""}]]} If I have multiple education or experience, then please return into array. If you don't find any value then please return null. In below Experience, there should be mention current company and past company name and location also. please generate whole answer in above format only object.
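For anyone reproducing this, a response can be checked against the format requested in the prompt with a few lines of Python. This is only a minimal sketch: the key names come from the prompt above, while `EXPECTED_KEYS` and `check_output` are hypothetical helper names and not part of Dify.

```python
import json

# Top-level keys requested by the application prompt above.
EXPECTED_KEYS = {
    "personal_info", "technical_skill", "soft_skill",
    "social_link", "education", "experience",
}


def check_output(raw: str) -> list[str]:
    """Return a list of problems found in a model response (empty if none)."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top-level value is not an object"]

    problems = []
    missing = EXPECTED_KEYS - set(data)
    extra = set(data) - EXPECTED_KEYS
    if missing:
        problems.append(f"missing keys: {sorted(missing)}")
    if extra:
        problems.append(f"unexpected keys: {sorted(extra)}")

    # The prompt asks for arrays here, or null when nothing is found.
    for key in ("education", "experience"):
        value = data.get(key)
        if value is not None and not isinstance(value, list):
            problems.append(f"{key} should be an array or null")
    return problems
```

An empty list means the response has the requested top-level shape. Note that the template in the prompt is itself not strictly valid JSON (the `experience` object is missing its closing brace before the final `]`), so it would fail this check if pasted in as-is.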

✔️ Expected Behavior

The app should return the data that is actually in Document2, the document named in the query.

Document 2: JT_Resume.pdf

Query: Give an answer in words, not sentences. Please give all the information mentioned in Prefix Prompt (personal_info, technical_skill, soft_skill, social_link, education, experience) from JT_Resume.pdf.

Output: {"personal_info": {"name" : "Jinal Thakkar", "location" : null, "contact" : "9930932770", "email" : "jkap@techknowledge.io", "total_experience_year" : null}, "technical_skill": ["Python", "Django", "React", "MySQL", "Firebase", "MongoDB", "AWS", "Linux", "Golang", "Flask", "PyQt5", "Windows Server 2000", "RESTful APIs", "Multithreading", "Multiprocessing", "DSA", "OOPS"], "soft_skill": null, "social_link": ["rahul-singh98", "rahulsingh", "Leetcode"], "education": [{"degree" : "Masters of Computer Applications", "university": "Indira Gandhi National Open University", "CGPA": null, "year" : "Sep 2024"}, {"degree" : "Bachelor of Computer Applications", "university": "Indira Gandhi National Open University", "CGPA": null, "year" : "Sep 2022"}], "experience": [{"company": "Vide Alpha Tech", "location": null, "from": "Mar 2023", "to": "Present", "role": "Senior Python Developer", "project": [{"project_name" : "Real-Time Trading Application for Forex Market", "project_url" : null, "project_description" : "Leading the design and development using Golang and React. Implementing software applications and building RESTful APIs for seamless integration of different software systems."}]}, {"company": "Codesure Software Solutions", "location": null, "from": "Mar 2021", "to": "Feb 2023", "role": "Python Developer", "project": [{"project_name" : null, "project_url" : null, "project_description" : "Developed and implemented software applications using Golang. Proficient in writing automated unit tests and building RESTful APIs for seamless software integration."}]}]}

❌ Actual Behavior

It returns the data from Document1 instead of Document2.

Document 2: JT_Resume.pdf

Query: Give an answer in words, not sentences. Please give all the information mentioned in the Prefix Prompt (personal_info, technical_skill, soft_skill, social_link, education, experience) from JT_Resume.pdf.

Output: {"personal_info": {"name" : "AYUSHI KANJIYA", "location" : "Jamnagar, India, 361140", "contact" : "-", "email" : "ayushidetroja28@gmail.com", "total_experience_year" : "3 years"}, "technical_skill": ["C", "C++", "Java", "JavaScript", "TypeScript", "jQuery", "BootStrap", "NoSQL", "SQLite", "HTML", "CSS", "Tailwind", "SASS", "Firebase", "Android", "React.js", "Next.js", "Vue.js", "Debugging", "Git"], "soft_skill": ["Communication skill"], "social_link": null, "education": [{"degree" : "B.Tech (Information Technology)", "university": "Uka Tarsadiya University", "CGPA": "7.19", "year" : "06/2019"}, {"degree" : "Higher Secondary", "university": "I. P. Savani", "GPA": "76", "year" : "05/2014"}], "experience": [{"company": "Copieux Techknowledge Analytics (WFH)", "location": "Mumbai, India", "from": "06/2023", "to": "Current", "role": "Software Engineer II", "project": null}, {"company": "Micrasol LLP", "location": "Surat, India", "from": "09/2020", "to": "05/2023", "role": "Senior Web Developer", "project": null}, {"company": "Sparkle Infotech", "location": "Surat, India", "from": "07/2019", "to": "08/2020", "role": "Android Developer", "project": null}]}

dosubot[bot] commented 5 months ago

Hey @ayushiKanjiya, great to see you back here! 🚀

Thank you for providing detailed information about the issue you're experiencing. Based on the context provided, it seems like Dify is not correctly handling the deletion of documents from the knowledge base and is returning data from a previously deleted document.

From the information available in the repository, Dify uses two modes for data retrieval: N-to-1 Retrieval and Multi-path Retrieval. In both modes, the system selects the most relevant knowledge base or bases for querying based on user intent and knowledge description. However, the specific order or priority in which documents are accessed depends on the retrieval mode selected. You can find more details about this in the retrieval.md document.

Unfortunately, there's no information available in the repository about how Dify handles the deletion of documents from the knowledge base or whether there's a caching mechanism that could potentially retain data from deleted documents. This could be a potential reason why you're seeing data from a previously deleted document.
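To illustrate the kind of failure being speculated about here, the following is a toy model only. It is not Dify's implementation, and every name in it (`ToyKnowledgeBase`, `purge_index`, and so on) is hypothetical. It assumes a knowledge base that keeps a document list and a separate chunk index, and shows how a deletion that skips the index would leave stale chunks retrievable.

```python
from dataclasses import dataclass, field


@dataclass
class ToyKnowledgeBase:
    """Toy model: a visible document list plus a separate chunk index."""
    documents: set[str] = field(default_factory=set)
    index: dict[str, list[str]] = field(default_factory=dict)  # doc name -> chunks

    def upload(self, name: str, chunks: list[str]) -> None:
        self.documents.add(name)
        self.index[name] = list(chunks)

    def delete(self, name: str, purge_index: bool = True) -> None:
        # purge_index=False simulates a deletion that forgets the index.
        self.documents.discard(name)
        if purge_index:
            self.index.pop(name, None)

    def retrieve(self, keyword: str) -> list[tuple[str, str]]:
        """Return (document, chunk) pairs whose chunk contains the keyword."""
        return [
            (doc, chunk)
            for doc, chunks in self.index.items()
            for chunk in chunks
            if keyword.lower() in chunk.lower()
        ]

    def stale_documents(self) -> set[str]:
        """Documents that are still indexed but no longer listed."""
        return set(self.index) - self.documents


kb = ToyKnowledgeBase()
kb.upload("Ayushi@CV.docx.pdf", ["placeholder resume text for document 1"])
kb.delete("Ayushi@CV.docx.pdf", purge_index=False)  # simulated faulty deletion
kb.upload("JT_Resume.pdf", ["placeholder resume text for document 2"])

hits = kb.retrieve("resume")
stale = kb.stale_documents()
```

In this toy setup `hits` still contains a chunk from the deleted document and `stale` names it. Whether anything comparable happens inside Dify is not established anywhere in this thread.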

As a maintainer, I don't have the ability to create issues or pull requests. However, I would recommend keeping track of this issue and checking for updates in the repository. It's also possible that other users have experienced similar issues, so you might find additional information in the issues section of the repository.

If you have any additional information or context that could help in understanding this issue better, please feel free to share.

crazywoola commented 5 months ago

If you made changes to the datasets, you need to restart the conversation and try again. The new configuration only applies to new conversations. The legacy knowledge may still exist in the memory of the LLM.

ayushiKanjiya commented 5 months ago

We are using a text-generation app, not an assistant app. In a text-generation app, there is no option to restart the conversation.

dosubot[bot] commented 4 months ago

Hi, @ayushiKanjiya

I'm helping the Dify team manage their backlog and am marking this issue as stale. From what I understand, you reported an issue with Dify where it returns data from a previously deleted document instead of the current one. There were discussions about potential causes such as how Dify handles document deletion and the possibility of a caching mechanism retaining data. However, it seems that the issue remains unresolved.

Could you please confirm if this issue is still relevant to the latest version of the Dify repository? If it is, please let the Dify team know by commenting on the issue. Otherwise, feel free to close the issue yourself, or it will be automatically closed in 7 days. Thank you!