Open chandlj opened 3 weeks ago
Hey @chandlj, we're working on "adding calls to a dataset", which I think is what you're asking for.
Basically:
# 1. Create Call objects containing the inputs, outputs, etc.
calls = []
for x in range(3):
    res, call = await model.predict.call(...)
    calls.append(call)
# 2. Generate a dataset from those calls (your pre-computed model outputs)
dataset = Dataset.from_calls(calls)
# 3. Pass to Evaluation as you would normally.
evaluation = Evaluation(dataset=dataset, ...)
Then you can reuse the dataset later from the "Use" tab in the UI.
Hey @andrewtruong, thanks for the swift reply! When can we expect this feature to be completed?
No firm timeline atm, but my current guess would be in the next few weeks!
Would you want to primarily add calls via the API (like above), or via the UI?
We would probably like to do it via the API. I envision starting from a dataset of inputs, something like:
dataset = Dataset(name="papers", rows=[{"id": 0, "context": ...}, ...])
calls = []
for entry in dataset:
    res, call = await model.predict(entry)
    calls.append(call)
dataset_with_responses = Dataset.from_calls(calls)
dataset_with_responses.publish(name="papers_with_calls")
...
# Later, using the dataset
dataset = weave.ref("papers_with_calls")
evaluation = Evaluation(dataset=dataset, scorers=[... dynamically changing list of scorers ...])
evaluation.evaluate() # In theory, you would not need to pass the model here because we have already computed outputs
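To make the "evaluate without a model" idea concrete, here is a minimal plain-Python sketch that does not use Weave at all. It assumes each row already carries a stored `output` field, and the names `evaluate_cached`, `exact_match`, and `output_length` are hypothetical, chosen only for illustration. It shows the scorer list changing between runs while the outputs stay fixed.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]
Scorer = Callable[[Row], float]


def exact_match(row: Row) -> float:
    """Hypothetical scorer: 1.0 when the stored output equals the expected answer."""
    return 1.0 if row["output"] == row["expected"] else 0.0


def output_length(row: Row) -> float:
    """Hypothetical scorer: length of the stored output in characters."""
    return float(len(row["output"]))


def evaluate_cached(rows: List[Row], scorers: List[Scorer]) -> Dict[str, float]:
    """Average each scorer over rows that already contain a model output.

    No model is called here; only the stored "output" field is read.
    """
    if not rows:
        return {scorer.__name__: 0.0 for scorer in scorers}
    return {
        scorer.__name__: sum(scorer(row) for row in rows) / len(rows)
        for scorer in scorers
    }


# Rows with pre-computed outputs (illustrative data, not from a real run).
rows = [
    {"id": 0, "context": "2 + 2", "expected": "4", "output": "4"},
    {"id": 1, "context": "capital of France", "expected": "Paris", "output": "Lyon"},
]

# The scorer list can change between runs without touching the model.
first_run = evaluate_cached(rows, [exact_match])
second_run = evaluate_cached(rows, [exact_match, output_length])
```

The second run adds a metric and reuses the same stored outputs, which is the behavior the proposed `Dataset.from_calls` flow is meant to provide inside Weave.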
It would be nice if we could pre-compute a model's output on a particular dataset and essentially "cache" it for use in an evaluation. For example, we have a large dataset of long-context documents, and running our model on it is particularly expensive. If we change our evaluation pipeline at any point (adding, removing, or modifying a computed metric or score), it seems to me that we would have to re-run our model on the entire dataset to get a new evaluation run.
It does seem like you would be able to do this:
However, this is obviously not ideal and would probably be confusing in the UI.
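For illustration only, the pre-compute-and-cache request above can be approximated outside of Weave with a small on-disk cache keyed by row `id`. This is a sketch under stated assumptions, not Weave's API: `predict_with_cache` and `expensive_model` are hypothetical names, and the JSONL cache format is an arbitrary choice.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


def predict_with_cache(
    rows: List[Row], predict: Callable[[Row], Any], cache_path: Path
) -> List[Row]:
    """Return rows with an "output" field, calling `predict` only on cache misses."""
    cache: Dict[str, Any] = {}
    if cache_path.exists():
        with cache_path.open() as f:
            for line in f:
                record = json.loads(line)
                cache[str(record["id"])] = record["output"]

    results = []
    with cache_path.open("a") as f:
        for row in rows:
            key = str(row["id"])
            if key not in cache:
                # Cache miss: run the expensive model once and persist the result.
                cache[key] = predict(row)
                f.write(json.dumps({"id": row["id"], "output": cache[key]}) + "\n")
            results.append({**row, "output": cache[key]})
    return results


# Stand-in for an expensive model; counts how often it is actually invoked.
calls = {"count": 0}


def expensive_model(row: Row) -> str:
    calls["count"] += 1
    return row["context"].upper()


rows = [{"id": 0, "context": "first paper"}, {"id": 1, "context": "second paper"}]
cache_file = Path(tempfile.mkdtemp()) / "papers_outputs.jsonl"

first = predict_with_cache(rows, expensive_model, cache_file)   # runs the model
second = predict_with_cache(rows, expensive_model, cache_file)  # served from cache
```

The second pass never invokes the model, so any number of scoring passes can follow at no extra inference cost. The trade-off is that the outputs live outside Weave and would not show up as calls in the UI.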