langgenius / dify

Dify is an open-source LLM app development platform. Dify's intuitive interface combines AI workflow, RAG pipeline, agent capabilities, model management, observability features and more, letting you quickly go from prototype to production.

https://dify.ai

Citations #7970

Open Sf-teacherspace opened 1 week ago

Sf-teacherspace commented 1 week ago

Self Checks

[X] This is only for bug report, if you would like to ask a question, please head to Discussions.
[X] I have searched for existing issues, including closed ones.
[X] I confirm that I am using English to submit this report (I have read and agree to the Language Policy).
[X] [FOR CHINESE USERS] Please be sure to submit issues in English, otherwise they will be closed. Thank you! :)
[X] Please do not modify this template :) and fill in all the required fields.

Dify version

Version 0.7.3

Cloud or Self Hosted

Self Hosted (Docker)

Steps to reproduce

[Screenshot: 2024-09-04 at 09:15:16]

✔️ Expected Behavior

I’m working on creating a chatbot in your program. I’ve built a knowledge base that the chatbot draws information from. Previously, it cited the sources it took from the knowledge base, but it no longer does that. Is this something I’ve done? I’ve been experimenting a bit with the prompt, but could that be where the issue lies? I would like the chatbot to indicate where it gets its information from. It should primarily take it from the knowledge base. What has gone wrong?

❌ Actual Behavior

It has stopped making citations from the knowledge base

dosubot[bot] commented 1 week ago

Hello, @Sf-teacherspace! I'm here to help you with bugs, questions, and becoming a contributor. Let's squash those bugs together! 🤖

I found a similar discussion that might be relevant to your issue: How to use Citation and Attribution in chatflow？ ^[1]. This discussion is still open and might provide additional insights.

Regarding your issue, the chatbot has likely stopped making citations from the knowledge base because the citation attribution feature might not be enabled. To resolve this, you need to enable the citation attribution feature by going to Workspace -- Add Function -- Citation and Attribution.

Here are the steps to enable this feature:

Navigate to Workspace.
Select Add Function.
Enable Citation and Attribution.

For detailed instructions, you can refer to the Citation and Attribution guide ^[2].

After enabling this feature, the large language model will be able to cite content from the knowledge base, and you can view specific citation paragraph information below the response content.

To continue talking to Dosu, mention @dosu.

Sf-teacherspace commented 1 week ago

@dosu Well, I have enabled the function, but it still doesn't work. Is it something to do with my prompt? It seems like it takes citations from random sources and not my knowledge base. Maybe it's my settings? [Screenshot: 2024-09-04 at 10:39:03]

dosubot[bot] commented 1 week ago

The issue with the chatbot not citing sources from your knowledge base, even after enabling the citation attribution feature, could indeed be related to your prompt configuration or retrieval settings.

Steps to Resolve the Issue:

Verify Retrieval Settings:
- Ensure that the retrieval method and parameters (TopK value and recall threshold) are appropriately configured. These settings significantly impact the recall of relevant segments from the knowledge base.
- Adjusting these parameters can improve the quality and relevance of the retrieved segments.
Test Retrieval:
- Use the retrieval testing feature to debug and confirm that the knowledge base is correctly recalling relevant segments for common user questions. This can help identify if the retrieval method or parameters need further adjustment.
Check Prompt Configuration:
- Ensure that your prompt is designed to encourage the chatbot to cite sources. The prompt should explicitly instruct the chatbot to include citations from the knowledge base in its responses.
- Make sure that the context component is not missing, as this can affect the effectiveness of the prompt.

Example Prompt Structure:

SYSTEM

Use the following context as your learned knowledge, inside <context></context> XML tags.

<context>
{{#context#}}
</context>

When answering the user:
- If you don't know, just say that you don't know.
- If you are not sure, ask for clarification.
Avoid mentioning that you obtained the information from the context.
And answer according to the language of the user's question.
{{pre_prompt}}

USER

{{Query}} // Input query variable here

ASSISTANT

""
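
How the `{{#context#}}` placeholder gets filled can be sketched in Python. This is only an illustration, not Dify's actual implementation: `render_system_prompt` and the sample segments are hypothetical names made up for this sketch. It shows why an empty retrieval result leaves the model with nothing to cite.

```python
# Shortened copy of the system prompt above; only the placeholder matters here.
SYSTEM_TEMPLATE = (
    "Use the following context as your learned knowledge, "
    "inside <context></context> XML tags.\n\n"
    "<context>\n{{#context#}}\n</context>\n\n"
    "When answering the user:\n"
    "- If you don't know, just say that you don't know.\n"
)


def render_system_prompt(template: str, segments: list[str]) -> str:
    """Substitute retrieved segments for the {{#context#}} placeholder."""
    if "{{#context#}}" not in template:
        # Without the placeholder, retrieved text never reaches the model.
        raise ValueError("template has no {{#context#}} placeholder")
    return template.replace("{{#context#}}", "\n\n".join(segments))


if __name__ == "__main__":
    print(render_system_prompt(SYSTEM_TEMPLATE, ["Segment one.", "Segment two."]))
```

If retrieval returns no segments, the `<context>` block is rendered empty, and the model can only answer from its own general knowledge.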

Additional Debugging Tools:

Logs Feature: Review interaction logs to identify where the attribution may have gone wrong.
Citation Attribution Debugging: Enable this feature to view specific citation details below the response, including the original text segment, segment number, and match score. This can help you debug and edit the corresponding segment in the knowledge base.
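
The citation details mentioned above (original text, segment number, match score) can be pictured as one small record per recalled segment. The sketch below is hypothetical: `RetrievedSegment` and `format_citations` are illustrative names, not part of Dify, and the field layout is an assumption.

```python
from dataclasses import dataclass


@dataclass
class RetrievedSegment:
    """Hypothetical record for one recalled segment."""
    document: str   # source document name
    position: int   # segment number within the document
    score: float    # match score reported by retrieval
    text: str       # original text of the segment


def format_citations(segments: list[RetrievedSegment]) -> str:
    """Render one citation line per segment, best match first."""
    ordered = sorted(segments, key=lambda s: s.score, reverse=True)
    lines = [
        f"[{i}] {s.document} (segment {s.position}, score {s.score:.2f})"
        for i, s in enumerate(ordered, start=1)
    ]
    return "\n".join(lines) if lines else "No citations: nothing was retrieved."
```

The empty case is the one that matters for this issue: when no segment is recalled, there is nothing to list under the response.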

By ensuring that both the retrieval settings and prompt configuration are correctly set up, you can improve the likelihood of the chatbot citing sources from your knowledge base ^[1]^[2]^[3]^[4].

Sf-teacherspace commented 1 week ago

@dosu What should the settings look like for this step? Verify Retrieval Settings:

Ensure that the retrieval method and parameters (TopK value and recall threshold) are appropriately configured. These settings significantly impact the recall of relevant segments from the knowledge base. Adjusting these parameters can improve the quality and relevance of the retrieved segments.

dosubot[bot] commented 1 week ago

To verify and adjust the retrieval settings, you can follow these steps:

Retrieval Settings Configuration

Retrieval Method:
- Hybrid Search: Combines full-text search and vector search, re-ranking the results to select the best match for the user's query. This method leverages the strengths of both approaches.
TopK Value:
- Default Value: 3
- Adjustment: Increase or decrease based on the context window size of the selected model. A higher TopK value retrieves more segments, improving recall but potentially introducing noise. A lower TopK value retrieves fewer segments, which may result in incomplete recall.
Score Threshold:
- Default Setting: Off
- Default Value When Enabled: 0.5
- Adjustment: Increase the threshold to filter out less relevant segments, improving the quality of the retrieved results. A higher threshold yields fewer but more relevant segments, while a lower threshold recalls more segments, potentially including less relevant ones.
Rerank Model:
- Enable Rerank Model: After configuring the API key for the Rerank model on the "Model Provider" page, enable the rerank model to perform semantic reordering of the retrieved document results. This further refines the quality and relevance of the retrieved segments.
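
The interaction between TopK and the score threshold can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not Dify's code: `select_segments` is a hypothetical helper, and treating the threshold as inclusive (`>=`) is an assumption.

```python
from typing import Optional


def select_segments(
    scored: list[tuple[str, float]],
    top_k: int = 3,
    score_threshold: Optional[float] = None,
) -> list[tuple[str, float]]:
    """Keep the top_k highest-scoring segments at or above the threshold."""
    if score_threshold is not None:
        # Threshold off (None) keeps every candidate.
        scored = [item for item in scored if item[1] >= score_threshold]
    ranked = sorted(scored, key=lambda item: item[1], reverse=True)
    return ranked[:top_k]


if __name__ == "__main__":
    candidates = [("a", 0.9), ("b", 0.55), ("c", 0.7), ("d", 0.3)]
    print(select_segments(candidates))                                # defaults
    print(select_segments(candidates, top_k=5, score_threshold=0.6))  # stricter
```

Note that a threshold set above the best score returns an empty list, no matter how large TopK is.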

Example Configuration

Vector Search Settings

TopK: 5
Score Threshold: 0.6
Rerank Model: Enabled

Full-Text Search Settings

TopK: 5
Score Threshold: 0.6
Rerank Model: Enabled

Hybrid Search Settings

Weight Settings: Custom (e.g., Semantic: 0.7, Keyword: 0.3)
TopK: 5
Score Threshold: 0.6
Rerank Model: Enabled
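
As a rough illustration of what the weight settings mean, the sketch below combines a semantic score and a keyword score with the 0.7 / 0.3 weights from the example. It assumes a plain weighted sum over scores already normalized to the range 0 to 1; the real scoring may differ, and `hybrid_score` and `rank_hybrid` are hypothetical names.

```python
def hybrid_score(
    semantic: float,
    keyword: float,
    semantic_weight: float = 0.7,
    keyword_weight: float = 0.3,
) -> float:
    """Weighted sum of a semantic and a keyword score (assumed in [0, 1])."""
    if abs(semantic_weight + keyword_weight - 1.0) > 1e-9:
        raise ValueError("weights are expected to sum to 1")
    return semantic_weight * semantic + keyword_weight * keyword


def rank_hybrid(candidates: dict[str, tuple[float, float]]) -> list[str]:
    """Order candidate names by combined score, best first."""
    return sorted(
        candidates,
        key=lambda name: hybrid_score(*candidates[name]),
        reverse=True,
    )
```

With these weights, a segment that matches keywords strongly can still outrank one with a higher semantic score, which is worth keeping in mind when the combined score is then compared against a threshold.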

Steps to Adjust Settings

Navigate to Retrieval Settings:
- Go to Knowledge Base Settings > Retrieval Settings.
Select Retrieval Method:
- Choose between Vector Search, Full-Text Search, or Hybrid Search.
Configure Parameters:
- Set the TopK value and Score Threshold according to your needs.
- Enable and configure the Rerank Model if necessary.
Test Retrieval:
- Use the retrieval testing feature to debug and confirm that the knowledge base is correctly recalling relevant segments for common user questions.

By configuring these settings, you can enhance the retrieval process to ensure that the most relevant and high-quality segments are retrieved from the knowledge base ^[1]^[2].

Sf-teacherspace commented 1 week ago

[Screenshot: 2024-09-04 at 11:20:55] It still doesn't apply citations :/

Sf-teacherspace commented 1 week ago

[Screenshot: 2024-09-04 at 11:22:50] Nothing happens in retrieval testing.

Sf-teacherspace commented 1 week ago

@dosu

dosubot[bot] commented 1 week ago

To resolve the issue with citation attribution and retrieval testing, follow these steps:

Verify Retrieval Settings

Hybrid Search Configuration:
- Weight Settings: Adjust the weights for semantic and keyword components to balance the retrieval process.
  - Example: Semantic: 0.7, Keyword: 0.3
- TopK Value: Set to 5 to retrieve more segments.
- Score Threshold: Enable and set to 0.6 to filter out less relevant segments.
- Rerank Model: Ensure it is enabled to refine the quality of retrieved segments.

Steps to Adjust Settings

Navigate to Retrieval Settings:
- Go to Knowledge Base Settings > Retrieval Settings.
Select Hybrid Search:
- Choose Hybrid Search and adjust the weight settings, TopK value, and score threshold.
Enable Rerank Model:
- Ensure the rerank model is enabled and configured correctly.

Test Retrieval

Enter Common User Questions:
- Input common user questions in the Source Text input box.
Test Retrieval:
- Click Test to view the recall results in the Recalled Paragraph section.
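
The same check can be thought of as a small loop: run each common question through retrieval and note which ones recall nothing. The sketch below uses a toy word-overlap retriever as a stand-in; `toy_retrieve`, `find_unanswered`, and the sample corpus are all hypothetical and only illustrate the idea.

```python
from typing import Callable

# Hypothetical two-segment knowledge base for the demonstration.
CORPUS = [
    "Refunds are processed within 14 days.",
    "Support is open on weekdays.",
]


def _words(text: str) -> set[str]:
    """Lowercased words with simple punctuation stripped."""
    return {w.strip("?.,").lower() for w in text.split()}


def toy_retrieve(question: str) -> list[str]:
    """Stand-in retriever: segments sharing at least one word with the question."""
    return [seg for seg in CORPUS if _words(question) & _words(seg)]


def find_unanswered(
    questions: list[str],
    retrieve: Callable[[str], list[str]],
) -> list[str]:
    """Return the questions for which retrieval recalled no segments."""
    return [q for q in questions if not retrieve(q)]
```

If every test question ends up in the unanswered list, the problem lies in retrieval itself (indexing, settings, or attachment), not in the prompt.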

Enable Citation Attribution

Go to Workspace:
- Navigate to Workspace -- Add Function -- Citation and Attribution to enable the citation attribution feature.
View Citation Information:
- After enabling, view specific citation paragraph information below the response content, including original text, segment number, and match score.

Debugging Steps

Check Logs:
- Review interaction logs to identify where the attribution may have gone wrong.
Adjust Prompt Configuration:
- Ensure the prompt instructs the chatbot to include citations from the knowledge base.

By following these steps, you should be able to resolve the issues with citation attribution and retrieval testing ^[1]^[2].

This is a tough nut to crack, could really use your help on this, @takatost.

Sf-teacherspace commented 1 week ago

@takatost please view the history and help me solve the issue

Sf-teacherspace commented 1 week ago

@dosu I need more help! The testing doesn't work.

dosubot[bot] commented 1 week ago

To resolve the issues with retrieval testing and citation attribution, let's go through a detailed checklist to ensure everything is configured correctly.

Verify Retrieval Settings

Hybrid Search Configuration:
- Weight Settings: Adjust the weights for semantic and keyword components.
  - Example: Semantic: 0.7, Keyword: 0.3
- TopK Value: Set to 5 to retrieve more segments.
- Score Threshold: Enable and set to 0.6 to filter out less relevant segments.
- Rerank Model: Ensure it is enabled to refine the quality of retrieved segments.

Steps to Adjust Settings

Navigate to Retrieval Settings:
- Go to Knowledge Base Settings > Retrieval Settings.
Select Hybrid Search:
- Choose Hybrid Search and adjust the weight settings, TopK value, and score threshold.
Enable Rerank Model:
- Ensure the rerank model is enabled and configured correctly.

Test Retrieval

Enter Common User Questions:
- Input common user questions in the Source Text input box.
Test Retrieval:
- Click Test to view the recall results in the Recalled Paragraph section.

Enable Citation Attribution

Go to Workspace:
- Navigate to Workspace -- Add Function -- Citation and Attribution to enable the citation attribution feature.
View Citation Information:
- After enabling, view specific citation paragraph information below the response content, including original text, segment number, and match score.

Debugging Steps

Check Logs:
- Review interaction logs to identify where the attribution may have gone wrong.
Adjust Prompt Configuration:
- Ensure the prompt instructs the chatbot to include citations from the knowledge base.

Common Issues and Troubleshooting

Incomplete Recall of Relevant Texts:
- Solution: Increase the TopK value to recall more segments.
Low Quality of Recalled Segments:
- Solution: Adjust the TopK value and the recall score threshold. Increase the threshold to improve the quality of recalled segments.
Missing Relevant Segments:
- Solution: Decrease the recall score threshold to include more segments.
Parameter Configuration Changes Not Taking Effect:
- Solution: Ensure changes are made in the retrieval settings during the recall testing process. Confirm changes in Knowledge Base Settings > Retrieval Settings.
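
The trade-off behind these suggestions can be made concrete by counting how many segments survive at different thresholds. This is a minimal sketch with made-up scores; `sweep_thresholds` is a hypothetical helper, and the inclusive comparison is an assumption.

```python
def sweep_thresholds(scores: list[float], thresholds: list[float]) -> dict[float, int]:
    """Count how many match scores are at or above each threshold."""
    return {t: sum(1 for s in scores if s >= t) for t in thresholds}


if __name__ == "__main__":
    match_scores = [0.82, 0.64, 0.58, 0.41]  # illustrative values only
    for threshold, kept in sweep_thresholds(match_scores, [0.4, 0.6, 0.8]).items():
        print(f"threshold {threshold}: {kept} segment(s) kept")
```

Raising the threshold keeps fewer, stronger matches; lowering it keeps more, including weaker ones.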

By following these steps and ensuring the settings are correctly configured, you should be able to resolve the issues with retrieval testing and citation attribution ^[1]^[2].

Additionally, a maintainer has already been looped in to assist further.

Sf-teacherspace commented 1 week ago

It seems like it doesn't take knowledge from the base, but from outside the base. What can I do to ensure it takes from the base?

Sf-teacherspace commented 1 week ago

@dosu

dosubot[bot] commented 1 week ago

To ensure that the chatbot retrieves information exclusively from the knowledge base and not from outside sources, follow these steps:

Steps to Ensure Retrieval from Knowledge Base

Create and Upload Knowledge Base:
- Go to Knowledge -- Create Knowledge -- Upload file to add your knowledge base.
Create Chatbot Application:
- Go to Studio -- Create Application -- Select Chatbot.
Associate Knowledge Base with Chatbot:
- Enter Context, click Add, and select the knowledge base you created.
Configure Retrieval Settings:
- In Context Settings -- Retrieval Setting, configure the Retrieval Setting to ensure the chatbot retrieves information only from the specified knowledge base.
- Choose Hybrid Search for a balanced approach, combining full-text and vector search.
- Set TopK Value to 5 to retrieve more segments.
- Enable Score Threshold and set it to 0.6 to filter out less relevant segments.
- Enable the Rerank Model to refine the quality of retrieved segments.
Enable Citation and Attribution:
- Enable Citation and Attribution in Workspace -- Add Function -- Citation and Attribution to ensure the chatbot references the knowledge base.
Debug and Publish:
- In Debug and Preview, input user questions related to the knowledge base for debugging.
- After debugging, click the Publish button to make the AI application based on your knowledge base.
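
One way to picture "answer only from the knowledge base" is a guard that refuses to call the model when nothing was retrieved. The sketch below is an assumption-laden illustration, not a Dify feature: `answer_or_fallback`, `FALLBACK`, and the `generate` callable are hypothetical, and the prompt wording is made up.

```python
from typing import Callable

FALLBACK = "I could not find this in the knowledge base."


def answer_or_fallback(
    segments: list[str],
    question: str,
    generate: Callable[[str], str],
) -> str:
    """Call the model only when retrieval returned at least one segment."""
    if not segments:
        # Nothing retrieved: do not let the model answer from general knowledge.
        return FALLBACK
    context = "\n\n".join(segments)
    prompt = (
        "Answer only from the context below.\n"
        f"<context>\n{context}\n</context>\n"
        f"Question: {question}"
    )
    return generate(prompt)
```

In a prompt-only setup the same effect depends on the model following the instruction, so an empty retrieval step remains the first thing to rule out.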

Example Configuration

Hybrid Search Settings

Weight Settings: Semantic: 0.7, Keyword: 0.3
TopK: 5
Score Threshold: 0.6
Rerank Model: Enabled

Additional Debugging Steps

Check Logs:
- Review interaction logs to identify where the attribution may have gone wrong.
Adjust Prompt Configuration:
- Ensure the prompt instructs the chatbot to include citations from the knowledge base.

By following these steps and ensuring the settings are correctly configured, you can ensure that the chatbot retrieves information exclusively from the knowledge base ^[1]^[2]^[3].

Sf-teacherspace commented 1 week ago

[Screenshot: 2024-09-04 at 14:08:40] It says 0 tokens at knowledge. Is that the problem?

dosubot[bot] commented 1 week ago

The issue with the "Indhent viden" (Retrieve Knowledge) step showing 0 tokens indicates that the retrieval process is not successfully fetching any relevant segments from the knowledge base. Here are some steps to troubleshoot and resolve this issue:

Steps to Resolve the 0 Tokens Issue

Verify Knowledge Base Association:
- Ensure that the knowledge base is correctly associated with the chatbot application.
- Go to Studio -- Create Application -- Select Chatbot and ensure the knowledge base is added in the Context section.
Check Retrieval Settings:
- Navigate to Knowledge Base Settings -- Retrieval Settings.
- Ensure that the retrieval method is set to Hybrid Search for a balanced approach.
- Adjust the TopK Value to 5 to retrieve more segments.
- Enable the Score Threshold and set it to 0.6 to filter out less relevant segments.
- Ensure the Rerank Model is enabled to refine the quality of retrieved segments.
Test Retrieval:
- Use the Retrieval Testing feature to input common user questions and verify that relevant segments are being recalled.
- If no segments are recalled, adjust the TopK and Score Threshold settings.
Enable Citation and Attribution:
- Go to Workspace -- Add Function -- Citation and Attribution to enable the citation attribution feature.
- Ensure that the chatbot is configured to include citations from the knowledge base in its responses.
Check Logs and Debug:
- Review the interaction logs to identify where the retrieval process may be failing.
- Ensure that the prompt instructs the chatbot to include citations from the knowledge base.
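
The checks above can be ordered into a simple decision sequence. The function below is a hypothetical checklist written for this thread, not a Dify diagnostic; the parameter names and messages are assumptions, and the values would have to be read off the app and knowledge base settings by hand.

```python
from typing import Optional


def diagnose_empty_retrieval(
    kb_attached: bool,
    indexed_segments: int,
    best_score: Optional[float],
    score_threshold: Optional[float],
) -> str:
    """Walk through likely causes of an empty retrieval step, most basic first."""
    if not kb_attached:
        return "knowledge base is not attached to the app"
    if indexed_segments == 0:
        return "knowledge base has no indexed segments"
    if best_score is None:
        return "retrieval returned no candidates; check the index or embedding setup"
    if score_threshold is not None and best_score < score_threshold:
        return "all candidates fall below the score threshold"
    return "retrieval looks healthy; inspect the prompt instead"
```

For example, a best match score of 0.45 with the threshold enabled at 0.6 would explain an empty retrieval step even though the knowledge base is attached and indexed.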

Example Configuration

Hybrid Search Settings

Weight Settings: Semantic: 0.7, Keyword: 0.3
TopK: 5
Score Threshold: 0.6
Rerank Model: Enabled

Additional Debugging Steps

Check Knowledge Base Content:
- Ensure that the knowledge base contains relevant and well-structured content that can be retrieved.
Adjust Prompt Configuration:
- Ensure the prompt is designed to encourage the chatbot to cite sources from the knowledge base.

By following these steps and ensuring the settings are correctly configured, you should be able to resolve the issue with the retrieval process showing 0 tokens and ensure that the chatbot retrieves information exclusively from the knowledge base ^[1]^[2]^[3].

Sf-teacherspace commented 1 week ago

[Screenshot: 2024-09-04 at 14:21:47] I did, but it still says the same.

crazywoola commented 1 week ago

Hello, can you make a minimal reproducible DSL so we can take a look?

ArnaudGardille commented 1 week ago

Hi! Same issue as original poster with basic chatbot and agent. Still works well with Chatflow.