langchain-ai / langsmith-sdk

LangSmith Client SDK Implementations
https://smith.langchain.com/
MIT License

Issue: TypeScript evaluate takes much longer than runOnDataset when running evaluations #790

Closed: andrewmatchday closed this issue 2 weeks ago

andrewmatchday commented 2 weeks ago

I have a dataset with 71 examples on which I ran the same evaluations using both evaluate and runOnDataset. The time each took was:

- runOnDataset: 18 seconds
- evaluate: 3 minutes 40 seconds

From what I could tell from the documentation, runOnDataset is the older API and we should be using evaluate, since it supports an experiment prefix and shows the model, provider, revision ID, etc. Is that why awaiting the response from evaluate takes more than 10x longer? Is there an option I'm missing to run more evaluations concurrently?
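A quick check of the reported timings. The variable names are illustrative, and the per-example figure assumes that evaluate handled the examples one at a time:

```typescript
// Timings reported in this issue for the same 71-example dataset.
const numExamples = 71;
const runOnDatasetSeconds = 18;
const evaluateSeconds = 3 * 60 + 40; // 3 min 40 s = 220 s

// End-to-end slowdown of evaluate relative to runOnDataset.
const slowdown = evaluateSeconds / runOnDatasetSeconds;

// Average wall-clock seconds per example, ASSUMING evaluate ran them sequentially.
const secondsPerExample = evaluateSeconds / numExamples;

console.log(`slowdown ≈ ${slowdown.toFixed(1)}x`); // slowdown ≈ 12.2x
console.log(`≈ ${secondsPerExample.toFixed(1)} s per example`); // ≈ 3.1 s per example
```

So the gap is closer to 12x than 10x. If each example really takes about 3 s on its own, finishing all 71 in 18 s would mean roughly a dozen examples in flight at once under runOnDataset, which would point at concurrency rather than the extra experiment metadata.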


hinthornw commented 2 weeks ago

Here are the options you can configure for TS evaluation:

export interface EvaluateOptions {
  /**
   * The dataset to evaluate on. Can be a dataset name, a list of
   * examples, or a generator of examples.
   */
  data: DataT;
  /**
   * A list of evaluators to run on each example.
   * @default undefined
   */
  evaluators?: Array<EvaluatorT>;
  /**
   * A list of summary evaluators to run on the entire dataset.
   * @default undefined
   */
  summaryEvaluators?: Array<SummaryEvaluatorT>;
  /**
   * Metadata to attach to the experiment.
   * @default undefined
   */
  metadata?: KVMap;
  /**
   * A prefix to provide for your experiment name.
   * @default undefined
   */
  experimentPrefix?: string;
  /**
   * A free-form description of the experiment.
   */
  description?: string;
  /**
   * The maximum number of concurrent evaluations to run.
   * @default undefined
   */
  maxConcurrency?: number;
  /**
   * The LangSmith client to use.
   * @default undefined
   */
  client?: Client;
  /**
   * The number of repetitions to perform. Each example
   * will be run this many times.
   * @default 1
   */
  numRepetitions?: number;
}

Looks like the default in TS is to not run them concurrently, since maxConcurrency defaults to undefined.
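As a minimal sketch of why this default matters (not the SDK's actual implementation), the TypeScript below runs a list of async tasks under a configurable limit. runWithConcurrency, makeTasks, sleep and the 50 ms task duration are hypothetical; each task stands in for running the target and evaluators on one example. With a limit of 1 the examples run back to back, so total time grows with the dataset size; with a higher limit several are in flight at once. In the SDK itself the corresponding knob is the maxConcurrency field of EvaluateOptions, assumed here to be passed as evaluate(target, { data, evaluators, maxConcurrency: 10 }).

```typescript
// Hypothetical illustration only; this is not the langsmith SDK's internals.
// A task stands in for "run the target and evaluators on one example".
type Task<T> = () => Promise<T>;

// Run tasks with at most `limit` in flight at once; results keep input order.
async function runWithConcurrency<T>(tasks: Task<T>[], limit: number): Promise<T[]> {
  const results: T[] = new Array<T>(tasks.length);
  let next = 0;

  // Each worker claims the next unclaimed index. JS runs this synchronously
  // between awaits, so two workers never claim the same task.
  async function worker(): Promise<void> {
    while (next < tasks.length) {
      const i = next++;
      results[i] = await tasks[i]();
    }
  }

  const workerCount = Math.max(1, Math.min(limit, tasks.length));
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

interface Stats { inFlight: number; peak: number }

// n fake examples, each taking `ms` milliseconds, that record peak concurrency.
function makeTasks(n: number, ms: number, stats: Stats): Task<number>[] {
  return Array.from({ length: n }, (_, i) => async () => {
    stats.inFlight++;
    stats.peak = Math.max(stats.peak, stats.inFlight);
    await sleep(ms);
    stats.inFlight--;
    return i;
  });
}

async function demo(): Promise<void> {
  for (const limit of [1, 4]) {
    const stats: Stats = { inFlight: 0, peak: 0 };
    const start = Date.now();
    await runWithConcurrency(makeTasks(12, 50, stats), limit);
    // Roughly 12 x 50 ms with limit 1, roughly 3 x 50 ms with limit 4.
    console.log(`limit=${limit} peak=${stats.peak} elapsed=${Date.now() - start}ms`);
  }
}

void demo();
```

The sketch keeps a fixed pool of workers pulling from a shared index instead of starting every promise at once, which is one common way to cap in-flight requests; whether the SDK does the same internally is not stated in this thread.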

andrewmatchday commented 2 weeks ago

Thanks. I had assumed it behaved the same as runOnDataset, since maxConcurrency defaults to undefined there as well.

hinthornw commented 2 weeks ago

Thanks for raising this, though. I'm not sure why the default behavior was chosen to be different...