tryolabs / restricttotopic

Validator for GuardrailsHub to check if a text is related with a topic.
Apache License 2.0

Enabling faster latency for validator execution #8

Closed ShreyaR closed 5 months ago

wylansford commented 5 months ago

This update significantly reduces validator latency through several changes.

  1. Improves overall throughput by running the model in batch mode across topics. Instead of making separate calls for each topic, all categories are checked in one pass.

  2. The LLM call now defaults to gpt-4o.

  3. The LLM call uses a shorter prompt that returns only a JSON list, which improves latency significantly. Note: we tried function calling to improve the reliability of the LLM output, but it is much slower. Using JSON mode and instructing the model to respond in JSON is faster than relying on function calling.

  4. Overall code cleanup for readability and maintainability.
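As a rough illustration of items 1 and 3, the sketch below scores all topics in a single call and parses an LLM reply that is expected to be a JSON list. It is not the validator's actual code: `topics_above_threshold`, `parse_topic_list`, the `Scorer` signature, and the 0.5 threshold are hypothetical names and values chosen for the example, and the scorer is injected so that any batch-capable classifier could be plugged in.

```python
import json
from typing import Callable, Dict, List, Sequence

# Hypothetical scorer signature: receives the text and ALL candidate topics,
# and returns one score per topic from a single model pass.
Scorer = Callable[[str, Sequence[str]], Dict[str, float]]


def topics_above_threshold(
    text: str,
    topics: Sequence[str],
    scorer: Scorer,
    threshold: float = 0.5,  # assumed cutoff, not taken from the PR
) -> List[str]:
    """Score every topic with one scorer call instead of one call per topic."""
    scores = scorer(text, topics)  # single batched call
    return [t for t in topics if scores.get(t, 0.0) >= threshold]


def parse_topic_list(raw: str, allowed: Sequence[str]) -> List[str]:
    """Parse an LLM reply expected to be a JSON list of topic names.

    Assumption: the reply may also wrap the list in a JSON object, in which
    case the first list-valued field is used. Unknown topics are dropped.
    """
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return []  # treat malformed output as "no topics found"
    if isinstance(data, dict):
        lists = [v for v in data.values() if isinstance(v, list)]
        data = lists[0] if lists else []
    if not isinstance(data, list):
        return []
    allowed_set = set(allowed)
    return [t for t in data if isinstance(t, str) and t in allowed_set]
```

Filtering the parsed list against the allowed topics keeps a misbehaving reply from introducing topic names the caller never asked about, which is one inexpensive way to recover some of the reliability that function calling would otherwise provide.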

The most recent tests show ~700ms for the LLM call and around 1.5s total for inference on my M2-based Mac, running on CPU. With a GPU it should be faster.

With the LLM disabled and a GPU available, latency can be as low as 300ms for a single validation call.
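For anyone who wants to reproduce numbers like these on their own hardware, a minimal timing helper could look like the following. This is only a sketch: `measure_latency_ms` is a hypothetical helper, not part of the validator, and the callable passed in stands for whatever single validation call is being measured.

```python
import time
from statistics import median
from typing import Callable


def measure_latency_ms(fn: Callable[[], object], runs: int = 5, warmup: int = 1) -> float:
    """Return the median wall-clock latency of fn() in milliseconds."""
    # Warm-up calls absorb one-time costs such as model loading.
    for _ in range(warmup):
        fn()
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return median(samples)
```

The median is used rather than the mean so that a single slow run, for example a network hiccup during the LLM call, does not dominate the reported figure.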