Hi – I’m excited to build upon this package. I created this issue to state my intentions to develop helmer and gather community feedback. I can move this to the Posit forum if that makes more sense.
helmer: Build LLM Chat Pipelines from a Data Frame or Tibble
The goal of helmer is to extend the elmer package to build LLM batch chat pipelines, with support for asynchronous batching where the API allows it. Major design goals include parallel processing for longer responses and efficient handling of multi-model pipelines during batch processing. It will support multiple input formats, including vectors, data frames or tibbles, and lists. It will also track token consumption, add delays for rate limiting, store metadata, and include similarity scoring, allowing users to systematically generate, compare, and analyze chat responses across models.
Development goals:
Create a new package with a fresh codebase (and deprecate batchLLM, my hacky first attempt before discovering elmer and tidyllm)
Maintain consistency with the tidyverse style of programming
Leverage key features of elmer (e.g., structured data and R function tooling)
Learn from the asynchronous batching functions used in tidyllm
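To make the proposal concrete, here's a hypothetical sketch of the interface I have in mind. None of these helmer functions exist yet; the names (`batch_chat()`, `score_similarity()`) and arguments are illustrative only, and the elmer call is just an example provider:

```r
# Hypothetical helmer interface sketch -- nothing here is implemented yet
library(helmer)

prompts <- data.frame(
  id = 1:3,
  prompt = c("Summarize A", "Summarize B", "Summarize C")
)

results <- prompts |>
  batch_chat(
    chat = elmer::chat_openai(),  # any elmer chat object
    input = prompt,               # column holding the prompts
    parallel = TRUE,              # async batching where the API supports it
    rate_delay = 0.5              # seconds between requests for rate limiting
  ) |>
  score_similarity(method = "cosine")  # compare responses across models
```

The idea is that the tibble in/tibble out design, with metadata and token counts attached as columns, keeps the package consistent with tidyverse conventions.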
Here's a simple Shiny app for testing sequential vs. parallel processing and similarity scoring: https://gist.github.com/dylanpieper/a02cc4b009baa47fc0e0d7350197114f