Closed: kazuyaseki closed this issue 1 week ago
Hi @kazuyaseki - thanks for reporting this issue. Could you share some code to help me reproduce?
I am able to pull 300 examples here no problem:
from langsmith import Client
c = Client()
name = "abcd test"
d = c.create_dataset(dataset_name=name)
c.create_examples(inputs=[{"in": i} for i in range(300)], dataset_id=d.id)
print(f"Created dataset {d.id}")
all_examples = list(c.list_examples(dataset_name=name))
print(f"Loaded {len(all_examples)} examples")
# Loaded 300 examples
@hinthornw Actually, that was my misunderstanding; the actual problem is in delete_example. The code below does not delete all examples.
name = "abcd test"
d = client.create_dataset(dataset_name=name)
client.create_examples(inputs=[{"in": i} for i in range(300)], dataset_id=d.id)
print(f"Created dataset {d.id}")
all_examples = client.list_examples(dataset_name=name)
for example in all_examples:
    client.delete_example(example_id=example.id)
After running this, 100 examples remain.
@kazuyaseki I believe this is happening because you are altering the resource while you are iterating over it.
The list endpoint auto-paginates for you, fetching 100 examples at a time, and re-fetches the next page after each one is consumed.
So the provided code does roughly these actions:

1. Fetch examples at offset 0 (the original examples 0-99) and delete them.
2. Fetch examples at offset 100. Because the first 100 were just deleted, that offset now points at the original examples 200-299, which are then deleted.
3. Fetch examples at offset 200: nothing is left there, so iteration ends.

Since you altered the underlying container while iterating over it, examples 100-199 were never returned by the iterator and therefore were not deleted.
It's a little like doing
data = list(range(300))
batch_size = 100
for i in range(0, len(data), batch_size):
    vals = data[i:i + batch_size]
    for val in vals:
        data.remove(val)
print("data", data)
# prints data [100, 101, ..., 199] -- the middle batch is skipped
It would be better to first fetch all the example IDs and then delete them:
# fetch all the examples up front so we're not modifying the container while iterating
all_examples = list(client.list_examples(dataset_name=name))
for example in all_examples:
    client.delete_example(example_id=example.id)
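Another pattern that sidesteps the problem entirely is to drain the dataset from the front: repeatedly fetch just the first page and delete everything in it until nothing remains. Since each fetch starts from offset 0, deletions can never shift unseen items past the cursor. A minimal sketch with a stand-in paginated client (`FakeClient` and `list_page` are illustrative names, not part of the langsmith API):

```python
class FakeClient:
    """Stand-in for an offset-paginated API, for illustration only."""

    def __init__(self, n):
        self.examples = list(range(n))

    def list_page(self, offset=0, limit=100):
        # Mimics an endpoint that returns at most `limit` items per call.
        return self.examples[offset:offset + limit]

    def delete_example(self, example_id):
        self.examples.remove(example_id)


client = FakeClient(300)
# Drain from the front: always re-fetch page 0, so deletions cannot
# skip over items the way an advancing offset does.
while True:
    page = client.list_page(offset=0, limit=100)
    if not page:
        break
    for example_id in page:
        client.delete_example(example_id)

print(len(client.examples))  # 0 -- everything deleted
```

This trades a few extra list calls for correctness, and it works even if other writers are adding examples concurrently.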
@hinthornw I apologize for the late reply and for taking up your time. Thanks for the clear explanation! That was my mistake, so I will close the issue.
Feature request
One small suggestion: the client.list_examples method has a default limit of 100, which is not explicit. I usually expect an API to return everything when I do not specify a limit, so how about either removing the default limit or showing a warning when the dataset contains more than 100 examples?
Motivation
When I fetched examples from a dataset that has more than 100 examples, I could not figure out why some examples were missing.
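The warning variant of this suggestion could look something like the following hypothetical wrapper (`list_with_truncation_warning` and `fetch_page` are made-up names, not langsmith API): warn whenever a call returns exactly its limit, since that usually means the results were truncated.

```python
import warnings


def list_with_truncation_warning(fetch_page, limit=100):
    """Fetch one page via fetch_page(limit) and warn if it looks truncated.

    fetch_page is any callable taking a limit and returning a list.
    This is a hypothetical helper, not part of the langsmith client.
    """
    results = fetch_page(limit)
    if len(results) == limit:
        warnings.warn(
            f"Got exactly {limit} results; the dataset may contain more. "
            "Pass a higher limit or paginate to fetch everything."
        )
    return results


# Stand-in for an API call that silently caps results at `limit`:
data = list(range(300))
page = list_with_truncation_warning(lambda limit: data[:limit], limit=100)
print(len(page))  # 100, and a warning is emitted
```

Since exactly-at-the-limit results can occasionally be a false positive (the dataset really does have 100 examples), the warning should point the user at explicit pagination rather than claim data loss outright.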