microsoft / chat-copilot

MIT License

Unable to retrieve DocumentMemory when using AzureCognitiveSearch #407

Closed milind045 closed 11 months ago

milind045 commented 1 year ago

Describe the bug: The ISemanticMemoryClientExtensions.SearchMemoryAsync call does not return any records when filtered by chatid. The issue is either with Semantic-Mem

To Reproduce (steps to reproduce the behavior):

  1. In a chat session, upload a supported document (PDF).
  2. Ask questions based on the uploaded document.

Expected behavior: Results based on the uploaded document are returned.

Actual behavior: No search results are returned. The issue seems to be the filter parameter.

Screenshots: Cognitive Search index tags:

[screenshot]

Filter parameter values:

[screenshot]


crickman commented 1 year ago

@milind045 The "MinRelevance" setting can affect your results. It is controlled through PromptsOptions in appsettings.json:

[screenshot]

[screenshot]

I'd be curious to know whether you see a response with a lower threshold, even zero.
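To illustrate why a correct chatid filter can still look broken, here is a minimal sketch, assuming the search applies the tag filter and the relevance threshold together. The record type, tag values, and relevance scores are made up for illustration; this is not the chat-copilot or Semantic Memory API. Both "chat-1" chunks match the filter, yet a threshold of 0.8 still returns nothing.

```python
from dataclasses import dataclass, field


@dataclass
class MemoryRecord:
    """Hypothetical stand-in for a stored document chunk."""
    text: str
    relevance: float  # similarity to the query, assumed precomputed (0..1)
    tags: dict = field(default_factory=dict)


def search(records, chat_id, min_relevance):
    """Keep records tagged with chat_id whose relevance meets the threshold."""
    hits = [
        r for r in records
        if r.tags.get("chatid") == chat_id and r.relevance >= min_relevance
    ]
    # Return the most relevant records first.
    return sorted(hits, key=lambda r: r.relevance, reverse=True)


records = [
    MemoryRecord("Chunk A", 0.74, {"chatid": "chat-1"}),
    MemoryRecord("Chunk B", 0.69, {"chatid": "chat-1"}),
    MemoryRecord("Chunk C", 0.91, {"chatid": "chat-2"}),
]

print([r.text for r in search(records, "chat-1", 0.8)])  # []
print([r.text for r in search(records, "chat-1", 0.0)])  # ['Chunk A', 'Chunk B']
```

An empty result is therefore not proof that the filter is wrong; lowering the threshold separates "no matching tag" from "matching tag, but scored below the cutoff".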

milind045 commented 1 year ago

@crickman Setting DocumentMemoryMinRelevance to zero does seem to fetch documents from the Cognitive Search index. It seems that passing zero means the minDistance calculated here will be 0.5f.
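The 0.5f figure is consistent with the scoring formula Azure documents for cosine-metric vector fields: score = 1 / (1 + cosine distance), where cosine distance = 1 - cosine similarity. A minimal sketch of that mapping follows; the function name is hypothetical, and whether semantic-memory uses exactly this conversion is an assumption.

```python
def cosine_to_acs_score(cosine_similarity: float) -> float:
    """Map a cosine similarity to an Azure Cognitive Search vector score.

    Assumes the cosine-metric formula score = 1 / (1 + distance),
    with distance = 1 - cosine_similarity.
    """
    distance = 1.0 - cosine_similarity
    return 1.0 / (1.0 + distance)


print(cosine_to_acs_score(0.0))            # 0.5
print(cosine_to_acs_score(1.0))            # 1.0
print(round(cosine_to_acs_score(0.8), 4))  # 0.8333
```

Under this assumption, a minimum relevance of zero becomes a score cutoff of 0.5, and the default 0.8 becomes roughly 0.833.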

If my findings are correct, I wonder why ACS scores the document so low even when I copy a line from the document as the query. Is there a direct way to query ACS semantically, to check whether the score would be the same if we did not go through the semantic-memory package?

crickman commented 1 year ago

Thank you for following up with the additional data.

The relevance threshold can certainly be context-specific: in some cases the "best result" scores in the high 0.80s or 0.90s, in others around 0.68. I personally find 0.80 a bit conservative, depending on the context; one could certainly envision a version of chat-copilot where this setting is a slider.

Anyway, yes, you can query the ACS index directly in the Azure portal. Note that the score you see in the portal isn't the raw cosine similarity (...), but it generally tracks it.

[screenshot]
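Assuming the portal shows the cosine-metric score 1 / (2 - cosine similarity), a portal score can be converted back to a cosine similarity with 2 - 1 / score. This is a sketch under that assumption; the function name and the sample scores are hypothetical.

```python
def acs_score_to_cosine(score: float) -> float:
    """Invert score = 1 / (2 - cos) to recover the cosine similarity (assumed formula)."""
    return 2.0 - 1.0 / score


# Hypothetical scores read off the portal's search results, highest first.
portal_scores = [0.91, 0.83, 0.67]
for s in portal_scores:
    print(s, round(acs_score_to_cosine(s), 3))
```

Because the mapping is monotonic, ranking by portal score gives the same order as ranking by cosine similarity, which is why the two track each other even though the absolute values differ.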

One of the reasons you don't see a hit with the exact text is that chat-copilot uses the "summary" (not the raw input) as the memory query. Being a bit more verbose can sometimes raise the relevance score, even though the input is still summarized.
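A toy illustration of why querying with a summary lowers the score: the sketch below uses word-count vectors and cosine similarity as a stand-in for real embeddings, and all strings (including the "summary") are made up. A line pasted verbatim matches its chunk perfectly, while a rephrased, summarized query shares fewer terms and scores lower. Real embeddings capture paraphrase far better than word overlap, so treat this only as an intuition for the direction of the effect.

```python
import math
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts, standing in for a real embedding model."""
    return Counter(text.lower().split())


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


chunk = "the warranty covers parts and labor for two years"
raw_question = "the warranty covers parts and labor for two years"  # pasted verbatim
summary = "user asks about warranty duration"  # hypothetical summarized intent

print(cosine_similarity(bag_of_words(chunk), bag_of_words(raw_question)))
print(cosine_similarity(bag_of_words(chunk), bag_of_words(summary)))
```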

Note: I believe the prompt details show the summarized intent, if you are curious:

[screenshot]

TaoChenOSU commented 11 months ago

Closing. Feel free to reopen if you have further questions.