Description
Several open-source frameworks (e.g. Phoenix, ToolBench, Langfuse) let you evaluate your Agent's performance on a set of tasks. Users may want to connect to one of these to evaluate their Agent more thoroughly than sandbox testing allows before deploying it.