i-am-bee / bee-agent-framework

The framework for building scalable agentic applications.
https://i-am-bee.github.io/bee-agent-framework/
Apache License 2.0

tools(similarity): filter out low scoring chunks #144

Open matoushavlena opened 1 week ago

matoushavlena commented 1 week ago

The Wikipedia tool might be returning chunks that have low similarity scores and are therefore not relevant or useful.

Similarly to minPageNameSimilarity, we would like to introduce a threshold that would filter out chunks/documents from the similarity tool. The initial value could be 0.25, but some exploration might be needed to decide on the right threshold.
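A minimal sketch of what such a threshold could look like. The `ScoredChunk` type, the `filterByMinScore` function, and the `DEFAULT_MIN_SCORE` constant are hypothetical names used for illustration only, not existing framework APIs, and the sketch assumes that a higher score means a more similar chunk.

```typescript
// Hypothetical shape for illustration; not the framework's actual type.
interface ScoredChunk {
  text: string;
  score: number; // assumed: higher means more similar
}

// Initial value proposed in this issue; the right threshold still needs exploration.
const DEFAULT_MIN_SCORE = 0.25;

// Keep only chunks whose score is at or above the threshold.
function filterByMinScore(
  chunks: ScoredChunk[],
  minScore: number = DEFAULT_MIN_SCORE,
): ScoredChunk[] {
  return chunks.filter((chunk) => chunk.score >= minScore);
}

const exampleChunks: ScoredChunk[] = [
  { text: "highly relevant passage", score: 0.8 },
  { text: "borderline passage", score: 0.25 },
  { text: "unrelated passage", score: 0.1 },
];

const keptChunks = filterByMinScore(exampleChunks);
console.log(keptChunks.map((chunk) => chunk.text));
```

With the inclusive comparison used here, a chunk that scores exactly at the threshold is kept; whether the boundary should be inclusive is one of the details to settle.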

When no documents are returned, the underlying tool (Wikipedia in this case) should return an LLM-friendly message, such as "No results were found. Try to reformat your query.". This message already exists in the Wikipedia tool for the case when no relevant pages are returned. We need to keep it DRY.
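One way to keep the message DRY could be a single shared constant that every empty-result path goes through. This is only a sketch: `NO_RESULTS_MESSAGE` and `formatToolOutput` are hypothetical names, not part of the framework, and only the message text is taken from this issue.

```typescript
// Hypothetical shared constant; the text is the message quoted in this issue.
const NO_RESULTS_MESSAGE = "No results were found. Try to reformat your query.";

// Hypothetical helper: returns the shared message when nothing survived filtering,
// otherwise joins the remaining documents into one output string.
function formatToolOutput(documents: string[]): string {
  if (documents.length === 0) {
    return NO_RESULTS_MESSAGE;
  }
  return documents.join("\n\n");
}

console.log(formatToolOutput([]));
console.log(formatToolOutput(["First passage.", "Second passage."]));
```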

pilartomas commented 1 week ago

The scoring is provider-specific, so the filter needs to reflect that by accepting a predicate; a single numeric value won't be sufficient.
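A rough sketch of the predicate idea, to show why one number is not enough. All names here (`ScoredDocument`, `ScorePredicate`, `filterDocuments`, `atLeast`, `atMost`) are hypothetical and do not reflect the actual tool options; the two example providers are assumptions chosen to show opposite score directions.

```typescript
// Hypothetical shapes for illustration; not the framework's actual types.
interface ScoredDocument<T> {
  document: T;
  score: number;
}

// The caller decides what counts as a good score for its provider.
type ScorePredicate = (score: number) => boolean;

function filterDocuments<T>(
  results: ScoredDocument<T>[],
  keep: ScorePredicate,
): ScoredDocument<T>[] {
  return results.filter((result) => keep(result.score));
}

// Assumed provider A: similarity scores, higher is better.
const atLeast = (min: number): ScorePredicate => (score) => score >= min;

// Assumed provider B: distance scores, lower is better.
const atMost = (max: number): ScorePredicate => (score) => score <= max;

const similarityResults: ScoredDocument<string>[] = [
  { document: "close match", score: 0.9 },
  { document: "weak match", score: 0.1 },
];

const distanceResults: ScoredDocument<string>[] = [
  { document: "close match", score: 0.2 },
  { document: "weak match", score: 1.7 },
];

const keptBySimilarity = filterDocuments(similarityResults, atLeast(0.25));
const keptByDistance = filterDocuments(distanceResults, atMost(0.5));
console.log(keptBySimilarity, keptByDistance);
```

The same numeric cutoff would keep the wrong half of the results for one of the two providers, which is why the comparison itself has to be supplied by the caller.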

The "No results were found." message is indeed used by the Wikipedia tool output, but the runner checks the output for emptiness and uses BeeToolNoResultsPrompt instead, so I wonder whether this isn't solved already.

@Tomas2D thoughts?