microsoft / sample-app-aoai-chatGPT

Sample code for a simple web chat experience through Azure OpenAI, including Azure OpenAI On Your Data.
MIT License
1.44k stars 2.15k forks

Azure AI search tool message citations chunk_id always 0 #706

Open wyttime04 opened 3 months ago

wyttime04 commented 3 months ago

Describe the bug I'm using data_preparation.py to upload my data to an AI Search index. Chatting with the AI works fine, but the citations' chunk_id is always "0". How can I get the correct chunk_id into the citations?

To Reproduce Steps to reproduce the behavior:

  1. Prepare data with scripts/readme.md

  2. Add a chunk_id field in data_preparation.py when creating the search index

  3. Add a chunk_id field when chunking each file

  4. Run data_preparation.py with config.json

  5. Confirm the chunk_id field appears in the AI Search results (screenshot attached)

  6. Start App with this index

  7. Observe that the AI response message citations' chunk_id is always "0" (screenshot attached)
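For reference, steps 2 and 3 can be sketched as follows. The field names other than chunk_id are illustrative; data_preparation.py builds its index schema as a JSON payload for the Azure AI Search REST API, so adapt the shape to the script's actual payload:

```python
# Sketch of step 2: add a chunk_id field to the index definition.
# Making it filterable and retrievable lets it be queried later and
# returned alongside search results (field set is an assumption).
index_fields = [
    {"name": "id", "type": "Edm.String", "key": True},
    {"name": "content", "type": "Edm.String", "searchable": True},
    {"name": "filepath", "type": "Edm.String", "filterable": True},
    {"name": "chunk_id", "type": "Edm.String",
     "filterable": True, "retrievable": True},
]

def stamp_chunks(chunks):
    """Step 3 (sketch): stamp each chunk with its ordinal before upload."""
    return [{"content": text, "chunk_id": str(i)}
            for i, text in enumerate(chunks)]
```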

Expected behavior The citations' chunk_id should return the chunk_id field value from the AI Search index.


taigrr commented 2 months ago

Hey, I ran into this using the golang SDK. I'm sure you've moved on by now, but the solution for me was to use the FieldMappings option to map the chunk_id to the Filename property. Then I can get the filename property back (which is actually the chunk_id) and perform another query against Cognitive Search using a filter. It's definitely less than ideal, but it shouldn't break even after this is fixed.
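In Python, the mapping described above would amount to something like the following sketch of the "On Your Data" data source payload. The key casing follows the 2023-era chat extensions API (AzureCognitiveSearch / fieldsMapping / filepathField); check the exact shape for your api-version before relying on it:

```python
# Sketch of the workaround: map chunk_id into the filepath field via
# fieldsMapping, so the citation's filepath carries the chunk id.
def build_data_source(endpoint, index_name, key):
    return {
        "type": "AzureCognitiveSearch",
        "parameters": {
            "endpoint": endpoint,
            "key": key,
            "indexName": index_name,
            "fieldsMapping": {
                "contentFields": ["content"],
                # Repurpose the filepath mapping so the citation's
                # filepath value is actually the chunk_id.
                "filepathField": "chunk_id",
            },
        },
    }
```

The citation then comes back with chunk_id in its filepath slot, and a second, filtered query against the index can recover the real document.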

wyttime04 commented 2 months ago

Hi @taigrr, the method I use is the same as yours. I mark the chunk ID in the filepath, like path/to/file - Part 1, and store the total chunk count in a metadata field, which is useful when I want to delete all chunks of one file. I think this approach is acceptable for now. By the way, I found something strange: when I set the chunk size larger, the chunk_id in the API response can be greater than 0. Does this mean the chat completion API might be chunking the content again automatically? 🤔
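The filepath convention above can be sketched as two small helpers: one to split "path/to/file - Part N" back into its base path and chunk number, and one to enumerate every chunk's filepath for deletion given the total count stored in the metadata field (helper names and the Part-starts-at-1 convention are assumptions based on the example in this thread):

```python
import re

# Matches the "base - Part N" convention used for chunk filepaths.
PART_RE = re.compile(r"^(?P<base>.*) - Part (?P<part>\d+)$")

def split_filepath(filepath):
    """Return (base_path, chunk_number); chunk 0 if no Part suffix."""
    m = PART_RE.match(filepath)
    if not m:
        return filepath, 0
    return m.group("base"), int(m.group("part"))

def all_chunk_paths(base_path, total_chunks):
    """Enumerate every chunk's filepath (for deleting a whole file),
    given the total chunk count kept in the metadata field."""
    return [f"{base_path} - Part {i}" for i in range(1, total_chunks + 1)]
```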