Closed dc-vin closed 2 weeks ago
I found a similar closed issue regarding the Q&A segmentation mode, where segmentation using the option "simplified Chinese" generated a large number of English segments. The suggested solution was to emphasize the use of language above the prompt and to check out the latest code [1].
Additionally, there is an existing API endpoint for setting the QA mode when creating a document. You can set the process_rule parameter in the request body to include a mode field that can be set to automatic or custom. Here are examples for creating a document from text and from a file:
For text:
curl --location --request POST '${props.apiBaseUrl}/datasets/{dataset_id}/document/create_by_text' \
--header 'Authorization: Bearer {api_key}' \
--header 'Content-Type: application/json' \
--data-raw '{
"name": "text",
"text": "text",
"indexing_technique": "high_quality",
"process_rule": {
"mode": "automatic"
}
}'
For a file:
curl --location --request POST '${props.apiBaseUrl}/datasets/{dataset_id}/document/create_by_file' \
--header 'Authorization: Bearer {api_key}' \
--form 'data="{\"name\":\"Dify\",\"indexing_technique\":\"high_quality\",\"process_rule\":{\"rules\":{\"pre_processing_rules\":[{\"id\":\"remove_extra_spaces\",\"enabled\":true},{\"id\":\"remove_urls_emails\",\"enabled\":true}],\"segmentation\":{\"separator\":\"###\",\"max_tokens\":500}},\"mode\":\"custom\"}}";type=text/plain' \
--form 'file=@"/path/to/file"'
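The escaped JSON in the data form field above is hard to read and easy to break when edited by hand. As a minimal sketch, assuming a POSIX shell, the same payload could be written unescaped in a heredoc. The helper name build_data_payload and its three arguments are hypothetical and not part of Dify; the field names and values are copied from the file example above.

```shell
#!/bin/sh
# Hypothetical helper (not part of Dify): prints the JSON used in the
# `data` form field of the create_by_file example, in readable form.
#   $1 = document name
#   $2 = segment separator
#   $3 = max tokens per segment
# Arguments are inserted as-is, so they must not contain double quotes
# or backslashes.
build_data_payload() {
  cat <<EOF
{
  "name": "$1",
  "indexing_technique": "high_quality",
  "process_rule": {
    "mode": "custom",
    "rules": {
      "pre_processing_rules": [
        {"id": "remove_extra_spaces", "enabled": true},
        {"id": "remove_urls_emails", "enabled": true}
      ],
      "segmentation": {"separator": "$2", "max_tokens": $3}
    }
  }
}
EOF
}

# Usage: print the payload with the values from the example above.
build_data_payload "Dify" "###" 500
```

The printed JSON can be checked with any JSON validator before it is escaped into the data field of the request.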
By the way, have you considered supporting the setting of QA mode when creating a document through an API?
I think this is a good idea. Maybe you can open a PR to implement this feature, because we are a little busy these days.
This 'erased' behavior is expected, because we let the LLM split the original text into Q&A pairs.
A colleague changed the QA mode a couple of days ago, which deleted data we had maintained for a long time, and we had no backup. This led to criticism from our supervisor. I suggest warning users that their data will be erased when they change the mode.
If I have time this month, I will try to open a PR.
I discovered a new bug. When I set the QA mode and change the language, it automatically reverts to English after saving.
Self Checks
Dify version
0.7.3
Cloud or Self Hosted
Self Hosted (Docker)
Steps to reproduce
Changing a document's segmentation to Question & Answer format leads to all data being erased. By the way, have you considered supporting the setting of QA mode when creating a document through an API?
✔️ Expected Behavior
No response
❌ Actual Behavior
No response