langchain-ai / langsmith-sdk

LangSmith Client SDK Implementations
https://smith.langchain.com/
MIT License
change the behavior of limit of client.list_example method #760

Closed kazuyaseki closed 1 week ago

kazuyaseki commented 1 month ago

Feature request

One small suggestion: the client.list_examples method has a limit of 100 that is not explicit. I usually expect an API to return everything when no limit is specified, so how about either removing the default limit or showing a warning when the dataset contains more than 100 examples?

"limit": min(limit, 100) if limit is not None else 100,

Motivation

When I fetched examples from a dataset that has more than 100 examples, I could not figure out why some examples were missing.

hinthornw commented 4 weeks ago

Hi @kazuyaseki - thanks for reporting this issue. Could you share some code to help me reproduce?

I am able to pull 300 examples here no problem:

from langsmith import Client

c = Client()
name = "abcd test"
d = c.create_dataset(dataset_name=name)
c.create_examples(inputs=[{"in": i} for i in range(300)], dataset_id=d.id)
print(f"Created dataset {d.id}")
all_examples = list(c.list_examples(dataset_name=name))
print(f"Loaded {len(all_examples)} examples")
# Loaded 300 examples

kazuyaseki commented 3 weeks ago

@hinthornw Actually, that was my misunderstanding; the actual problem is with delete_example. The code below does not delete all examples.

    name = "abcd test"
    d = client.create_dataset(dataset_name=name)
    client.create_examples(inputs=[{"in": i} for i in range(300)], dataset_id=d.id)
    print(f"Created dataset {d.id}")
    all_examples = client.list_examples(dataset_name=name)

    for example in all_examples:
        client.delete_example(example_id=example.id)

100 examples remaining.

Screenshot 2024-06-08 23 25 19

hinthornw commented 3 weeks ago

@kazuyaseki I believe this is happening because you are altering the resource while you are iterating over it.

The list endpoint auto-paginates for you and re-fetches after each page of examples.

So the provided code does these actions:

  1. Fetch 100 examples, maintaining the cursor in memory
  2. Delete 100 examples.
  3. Try to fetch examples 100-200. Problem is the cursor now points to what was originally example #200 since you altered the underlying container while iterating over it.
  4. Delete 100 examples
  5. Try to fetch examples 200-300. Nothing to return.

This means examples 100-200 were not deleted.

It's a little like doing

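data = list(range(300))
batch_size = 100
# The offsets are fixed up front, but each removal shifts the remaining items left.
for i in range(0, len(data), batch_size):
    vals = data[i:i + batch_size]
    for val in vals:
        data.remove(val)
print("data", data)  # values 100-199 are left over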

It would be better to first fetch all the example IDs and then delete them:

# fetch all the indices to delete so we're not modifying the container while iterating
all_examples = list(client.list_examples(dataset_name=name))

for example in all_examples:
    client.delete_example(example_id=example.id)
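
To compare the two patterns without touching a real dataset, here is a minimal sketch that uses a toy in-memory store. It assumes offset-style paging with a page size of 100, as in the steps above. `ToyStore`, `list_items`, and `delete` are hypothetical names invented for this illustration; they are not part of the LangSmith SDK and do not reflect how the client is implemented.

```python
from typing import Iterator, List


class ToyStore:
    """Hypothetical in-memory stand-in for a dataset served in offset-based pages."""

    def __init__(self, n: int, page_size: int = 100) -> None:
        self.items: List[int] = list(range(n))
        self.page_size = page_size

    def list_items(self) -> Iterator[int]:
        # Lazily yield items one page at a time, advancing an offset cursor.
        offset = 0
        while True:
            page = self.items[offset:offset + self.page_size]
            if not page:
                return
            yield from page
            offset += self.page_size

    def delete(self, item: int) -> None:
        self.items.remove(item)


# Deleting while the generator is still paginating skips a page.
lazy = ToyStore(300)
for item in lazy.list_items():
    lazy.delete(item)
remaining_lazy = len(lazy.items)

# Materializing the items first finishes the pagination before any delete runs.
eager = ToyStore(300)
for item in list(eager.list_items()):
    eager.delete(item)
remaining_eager = len(eager.items)

print(remaining_lazy, remaining_eager)  # 100 0
```

In this toy, wrapping the iterator in `list(...)` is the only difference between the two loops: it forces every page to be fetched before the first deletion shifts the offsets.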

kazuyaseki commented 1 week ago

@hinthornw I apologize for the late reply and for taking up your time. Thanks for the clear explanation! That was my mistake, so I will close the issue.