langchain-ai / langsmith-sdk

LangSmith Client SDK Implementations
https://docs.smith.langchain.com/
MIT License

rfc: @ls.pytest.mark.parametrize interface #1199

Open baskaryan opened 1 week ago

baskaryan commented 1 week ago

almost certainly not handling lazy eval correctly, but what do we think of the interface?

@ls.pytest.mark.parametrize("Sample Dataset 3", (lambda x: x))
def test_parametrize(inputs, outputs, reference_outputs) -> list:
    assert inputs == outputs
    return [{"key": "foo", "value": "bar"}]

some example experiments here https://dev.smith.langchain.com/public/e7782ea0-3de5-4352-8cd4-7b2cdbb03e4c/d
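
For readers trying to picture the mechanics, here is a rough sketch of what a decorator with this shape might do. It is not the implementation in this PR: `EXAMPLES`, `build_cases`, and this `parametrize` are hypothetical stand-ins, the dataset is an in-memory list rather than a LangSmith dataset, and everything is evaluated eagerly (so it sidesteps the lazy eval question entirely).

```python
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical stand-in for a named dataset: each example carries inputs
# and reference outputs. A real integration would fetch these by name.
EXAMPLES: List[Dict[str, Any]] = [
    {"inputs": {"x": 1}, "reference_outputs": {"x": 1}},
    {"inputs": {"x": 2}, "reference_outputs": {"x": 2}},
]

Case = Tuple[Any, Any, Any]


def build_cases(examples: List[Dict[str, Any]], target: Callable[[Any], Any]) -> List[Case]:
    """Run `target` on every example's inputs, outside the test body."""
    return [
        (ex["inputs"], target(ex["inputs"]), ex["reference_outputs"])
        for ex in examples
    ]


def parametrize(examples: List[Dict[str, Any]], target: Callable[[Any], Any]):
    """Hypothetical decorator: call the check once per example."""

    def decorator(check_fn: Callable[..., Any]) -> Callable[[], List[Any]]:
        def run_all() -> List[Any]:
            # Each call receives (inputs, outputs, reference_outputs).
            return [check_fn(*case) for case in build_cases(examples, target)]

        return run_all

    return decorator


@parametrize(EXAMPLES, lambda x: x)
def check_identity(inputs, outputs, reference_outputs) -> list:
    assert inputs == outputs
    return [{"key": "foo", "value": "bar"}]


results = check_identity()  # one feedback list per example
```

The point of the sketch is only to show where the target (`lambda x: x`) runs: its outputs are computed before the test body is entered, and the body just asserts on them and returns feedback.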

hinthornw commented 1 week ago

Things I like about this:

  1. Can connect to dataset
  2. outputs are fairly localized/transparent
  3. Trace seems sensical (has outputs by default)
  4. Parallelized!
  5. Think you can re-use the score helping function if you wanted

Things I don't looove about this relative to @unit

  1. Seems harder to check multi-step things
  2. The actual system is run "outside" the test function
  3. Currently seems to be 1 experiment per unit test? Maybe that is the right equivalence, though; not sure
  4. Pytest doesn't like it if you return stuff from the test function
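
On the last point: recent pytest versions emit a warning when a test function returns a non-None value. One possible way around that, sketched below, is a wrapper that captures the returned feedback and hands `None` back to pytest. `capture_feedback` and `FEEDBACK` are hypothetical names for illustration only, not part of the SDK or of this PR.

```python
import functools
from typing import Any, Callable, Dict, List

# Hypothetical in-memory sink; a real integration would presumably log
# the feedback to the experiment instead.
FEEDBACK: Dict[str, List[dict]] = {}


def capture_feedback(check_fn: Callable[..., Any]) -> Callable[..., None]:
    """Store whatever the wrapped function returns, then return None."""

    @functools.wraps(check_fn)
    def wrapper(*args: Any, **kwargs: Any) -> None:
        result = check_fn(*args, **kwargs)
        if result is not None:
            FEEDBACK.setdefault(check_fn.__name__, []).extend(result)
        return None  # nothing is returned to the test runner

    return wrapper


@capture_feedback
def check_identity(inputs, outputs, reference_outputs):
    assert inputs == outputs
    return [{"key": "foo", "value": "bar"}]


returned = check_identity({"x": 1}, {"x": 1}, {"x": 1})
```

With something like this, the user-facing function can still return feedback dicts while the function the runner actually calls returns nothing.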