outlines-dev / outlines

Structured Text Generation
https://outlines-dev.github.io/outlines/
Apache License 2.0
6.94k stars 357 forks source link

Fix samplers documentation #980

Closed jrinder42 closed 1 week ago

jrinder42 commented 2 weeks ago

This fixes issue #978. Specifically for the classification example in the outlines cookbook section here, it fixes the piece of code in outlines.generate.api.SequenceGeneratorAdapter that keeps returning the wrong result (i.e. returning False even if a specific sampler is passed (greedy, multinomial, or beam search).

lapp0 commented 2 weeks ago

Thanks for identifying and working towards fixing this issue!

Could we fix the docs instead of changing the code though? Some lines in the documentation and tests rely on samplers being callable, e.g.

outlines/docs/reference/samplers.md:sampler = samplers.beam_search(beams=5)
tests/generate/test_integration_llamacpp.py:    sampler = samplers.beam_search(1)
tests/generate/test_integration_transformers.py:    sampler = beam_search(5)
tests/test_samplers.py:    beam_search,
tests/test_samplers.py:    assert beam_search == BeamSearchSampler
jrinder42 commented 2 weeks ago

I understand that there are tests written regarding this, but something doesn't seem to be working correctly. The first line of the classification docs is as follows:

from outlines.samplers import greedy

This is where the issue begins. A potential solution would be to do the following in the docs:

from outlines.samplers import BeamSearchSampler, GreedySampler, MultinomialSampler

generator = outlines.generate.choices(model, ["URGENT", "STANDARD"], sampler=GreedySampler())

However, then that begs the question of whether greedy, multinomial, and beam_search from outlines.samplers would be useless (I honestly have not looked to see if these are used in the rest of the codebase). I still think that some code should be changed, I suppose the question is what code / behavior should be changed...

jrinder42 commented 2 weeks ago

And one thing about the about a test like:

tests/test_samplers.py:    assert beam_search == BeamSearchSampler

In this context, == and isinstance behave differently. There is an example demonstrating this in the description for issue #978. So the test above is technically not checking for the correct thing (unless I am misunderstanding something).

lapp0 commented 1 week ago

I understand that there are tests written regarding this, but something doesn't seem to be working correctly. The first line of the classification docs is as follows:

from outlines.samplers import greedy

This is where the issue begins. A potential solution would be to do the following in the docs:

from outlines.samplers import BeamSearchSampler, GreedySampler, MultinomialSampler

generator = outlines.generate.choices(model, ["URGENT", "STANDARD"], sampler=GreedySampler())

However, then that begs the question of whether greedy, multinomial, and beam_search from outlines.samplers would be useless (I honestly have not looked to see if these are used in the rest of the codebase). I still think that some code should be changed, I suppose the question is what code / behavior should be changed...

samplers.greedy is a helper. It's equivalent to samplers.GreedySampler, but less verbose. I think we only need a docs change here.

tests/test_samplers.py: assert beam_search == BeamSearchSampler

In this context, == and isinstance behave differently. There is an example demonstrating this in the description for issue https://github.com/outlines-dev/outlines/issues/978. So the test above is technically not checking for the correct thing (unless I am misunderstanding something).

The test is correct, it's asserting that beam_search is a convenience variable equivalent to the class itself.

jrinder42 commented 1 week ago

Ok, sounds good. I will make the necessary changes.

jrinder42 commented 1 week ago

It should be good to go

rlouf commented 1 week ago

Thank you!