webis-de / set-encoder


Tool learning for LLM #4

Open QuangTQV opened 4 months ago

QuangTQV commented 4 months ago

I am currently working on a problem of reranking tools (retrieving the appropriate tool for an LLM), but the cross-encoder models are not converging. Here is an example:

query: give me btc price
tool: get token price

Is your model feasible for this task?

fschlatt commented 4 months ago

Could you provide a few extra details?

  1. What exactly is a tool?
  2. When you say a cross-encoder is not converging, do you mean you are fine-tuning a cross-encoder on your dataset and it's not learning correctly?
QuangTQV commented 4 months ago

> Could you provide a few extra details?
>
>   1. What exactly is a tool?
>   2. When you say a cross-encoder is not converging, do you mean you are fine-tuning a cross-encoder on your dataset and it's not learning correctly?

Current LLMs are increasingly used to build agent systems. An agent is a system built on an LLM by providing tool descriptions in the prompt. For example:

prompt = """You are my assistant. You are allowed to use the following tools to complete tasks:
Tool 1: name: Play Music, description: Used to play a song based on its name.
Tool 2: name: Summarize News, description: Summarizes today's news.
...
Tool n"""

When the number of tools n grows large, including every tool description in the prompt leads to context explosion, reduces accuracy in tool invocation, and incurs substantial costs. Therefore, it is necessary to filter out redundant tools before passing the prompt to the LLM.
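The context-explosion problem above can be sketched with a toy script (tool names and descriptions are invented for illustration): the prompt length grows linearly with the number of tools, so with thousands of tools the system prompt alone dominates the context window.

```python
# Minimal sketch of context explosion: the prompt grows linearly with the
# number of tools. All tool names/descriptions here are made up.

def build_prompt(tools):
    """Concatenate every tool description into one system prompt."""
    lines = ["You are my assistant. You are allowed to use the following tools to complete tasks:"]
    for i, (name, desc) in enumerate(tools, start=1):
        lines.append(f"Tool {i}: name: {name}, description: {desc}")
    return "\n".join(lines)

# 1,000 hypothetical tools with short descriptions.
tools = [(f"tool_{i}", "does one specific thing") for i in range(1000)]

print(len(build_prompt(tools[:10])))  # prompt with only 10 tools
print(len(build_prompt(tools)))       # far larger prompt with all 1,000 tools
```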

My approach filters tools in two steps:

  1. Use a bi-encoder (quite effective).
  2. Use a cross-encoder to re-rank (very poor performance, even after fine-tuning).

The data used for fine-tuning is structured as follows:

  - "query": the query text.
  - "pos": a list of useful tool descriptions that can help solve the query.
  - "neg": a list of tool descriptions that are not needed.
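A concrete instance of one such fine-tuning record might look as follows (the field names "query"/"pos"/"neg" are from the description above; the tool descriptions themselves are invented for illustration):

```python
# Hypothetical fine-tuning record in the "query"/"pos"/"neg" format described
# above. The tool descriptions are made up for illustration.
import json

example = {
    "query": "give me btc price",
    "pos": ["get token price: returns the current price of a crypto token"],
    "neg": [
        "play music: plays a song based on its name",
        "summarize news: summarizes today's news",
    ],
}
print(json.dumps(example, indent=2))
```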

fschlatt commented 3 months ago

Thanks for the additional context. If I understand correctly, you want to rank tool descriptions based on a query specifying the need for a tool.

I'm surprised that a bi-encoder is more effective than a cross-encoder on this task and would assume that given enough high-quality training data, a cross-encoder will be substantially more effective.

That being said, the Set-Encoder most likely will not give you a substantial boost over a standard cross-encoder's effectiveness. The Set-Encoder excels when interactions between the items to be ranked are necessary. In this case, the tools can most likely be ranked independently from one another.
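The pointwise-vs-listwise distinction can be sketched as follows. The input templates below are illustrative stand-ins, not the actual tokenization used by the Set-Encoder or any particular cross-encoder; the point is only that a standard cross-encoder sees one (query, tool) pair at a time, while a listwise model encodes the query together with all candidates so they can interact.

```python
# Illustrative contrast between pointwise and listwise ranking inputs.
# These templates are made up; they do not reflect any real model's format.

query = "give me btc price"
tools = ["get token price", "play music", "summarize news"]

# Pointwise cross-encoder: one independent input per candidate,
# so no candidate can influence another candidate's score.
pointwise_inputs = [f"[CLS] {query} [SEP] {tool} [SEP]" for tool in tools]

# Listwise (Set-Encoder-style) input: the query and all candidates are
# encoded in one forward pass, enabling cross-candidate interactions.
listwise_input = f"[CLS] {query} " + " ".join(f"[SEP] {tool}" for tool in tools)

print(len(pointwise_inputs))  # one input per tool
print(listwise_input)
```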

QuangTQV commented 3 months ago

> Thanks for the additional context. If I understand correctly, you want to rank tool descriptions based on a query specifying the need for a tool.
>
> I'm surprised that a bi-encoder is more effective than a cross-encoder on this task and would assume that given enough high-quality training data, a cross-encoder will be substantially more effective.
>
> That being said, the Set-Encoder most likely will not give you a substantial boost over a standard cross-encoder's effectiveness. The Set-Encoder excels when interactions between the items to be ranked are necessary. In this case, the tools can most likely be ranked independently from one another.

I think the Set-Encoder can still be useful when a query requires the cooperation of multiple tools to complete.

fschlatt commented 3 months ago

Yes, that is a good point. In those cases, the Set-Encoder is likely to be more effective than a standard cross-encoder.