pytorch / text

Models, data loaders and abstractions for language processing, powered by PyTorch
https://pytorch.org/text
BSD 3-Clause "New" or "Revised" License

Add top-p and top-k sampling to GenerationUtils #2137

Status: Closed (yohann-benchetrit closed this 1 year ago)

yohann-benchetrit commented 1 year ago
  1. Add the following token-decoding helpers to GenerationUtils.generate:

    • top-p
    • top-k
    • remove_invalid_values
    • temperature (see Issue 2138)
  2. Add corresponding unit tests and docstrings.
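For context on what these helpers do: top-k keeps only the k highest-probability tokens, while top-p (nucleus) filtering keeps the smallest set of tokens whose cumulative probability mass exceeds p. A minimal sketch of such logit filtering follows; the function name and signature are illustrative only, not torchtext's actual API.

```python
# Illustrative sketch of top-k / top-p logit filtering (hypothetical
# `filter_logits` helper; not the API added by this PR).
import torch

def filter_logits(logits, top_k=0, top_p=1.0, filter_value=float("-inf")):
    """Mask logits outside the top-k / top-p set before sampling."""
    if top_k > 0:
        # Keep only the k highest logits; mask everything below the k-th value.
        kth_value = torch.topk(logits, top_k).values[..., -1, None]
        logits = logits.masked_fill(logits < kth_value, filter_value)
    if top_p < 1.0:
        # Sort logits descending and compute cumulative probability mass.
        sorted_logits, sorted_idx = torch.sort(logits, descending=True)
        cum_probs = torch.softmax(sorted_logits, dim=-1).cumsum(dim=-1)
        # Mask tokens once the cumulative mass exceeds top_p,
        # always keeping at least the single most probable token.
        remove = cum_probs > top_p
        remove[..., 1:] = remove[..., :-1].clone()
        remove[..., 0] = False
        # Scatter the mask back from sorted order to vocabulary order.
        logits = logits.masked_fill(
            remove.scatter(-1, sorted_idx, remove), filter_value
        )
    return logits
```

After filtering, sampling proceeds as usual with `torch.multinomial` over the softmax of the surviving logits.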

github-advanced-security[bot] commented 1 year ago

You have successfully added a new CodeQL configuration .github/workflows/codeql.yml:build. As part of the setup process, we have scanned this repository and found 3 existing alerts. Please check the repository Security tab to see all alerts.

joecummings commented 1 year ago

At a high level, I notice you implemented the sampling as part of the greedy_search function. Technically, with sampling, this name is incorrect. I see a couple of ways of going about this.

  1. Rename the function to a more generic sample method and remove the do_sample parameter to avoid confusion. Then, if top_k or top_p were specified, true sampling would be performed; otherwise, it would essentially be a greedy search. (In this case, top_k should probably default to 1 instead of 0.) This would be the best way to reuse code, IMO.

  2. Follow HF and have separate methods for sampling and greedy. From an adoption perspective, this might be beneficial as users of torchtext tend to also have experience with HF; however, there would certainly be some duplicated code.

Thoughts? @yohann-benchetrit
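The observation in option 1 that top_k defaulting to 1 would make sample behave like greedy search can be illustrated with a small sketch (the function name and signature here are hypothetical, not a proposed API):

```python
# Hypothetical sketch: with top_k=1, only the argmax token survives the
# filter, so the multinomial draw is deterministic and sampling
# degenerates to greedy search.
import torch

def sample_next_token(logits, top_k=1):
    if top_k > 0:
        # Mask everything below the k-th highest logit.
        kth = torch.topk(logits, top_k).values[..., -1, None]
        logits = logits.masked_fill(logits < kth, float("-inf"))
    probs = torch.softmax(logits, dim=-1)
    # With top_k=1, probs is one-hot, so this always returns the argmax.
    return torch.multinomial(probs, num_samples=1)
```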

joecummings commented 1 year ago

Will want to add temperature, but can do that in a follow-up PR. Tracking issue: #2138

Nayef211 commented 1 year ago

> At a high level, I notice you implemented the sampling as part of the greedy_search function. Technically, with sampling, this name is incorrect. I see a couple of ways of going about this.
>
> 1. Rename the function to a more generic sample method and remove the do_sample parameter to avoid confusion. Then, if top_k or top_p were specified, true sampling would be performed; otherwise, it would essentially be a greedy search. (In this case, top_k should probably default to 1 instead of 0.) This would be the best way to reuse code, IMO.
> 2. Follow HF and have separate methods for sampling and greedy. From an adoption perspective, this might be beneficial as users of torchtext tend to also have experience with HF; however, there would certainly be some duplicated code.
>
> Thoughts? @yohann-benchetrit

I agree with @joecummings here in that by allowing users to provide top_k or top_p, this method is no longer implementing a simple greedy search. In terms of which option to go with, I think option 1 makes sense for reusability. However, I also think it's more intuitive for users to be able to call a greedy_search method without having to worry about modifying the top_k or top_p params or spending time understanding what sample means. Most NLP users will have heard of beam_search or greedy_search, but sample might not be as intuitive.

An alternative proposal here could be to keep the greedy_search method as is and then add the top_k and top_p params only to the beam_search method. This way if users want to do greedy search with those 2 params set, they can just set the beam_size to 1. For this approach, the greedy_search method will be kept simple and intuitive and all the configurability would come from the beam_search method.

joecummings commented 1 year ago

> I agree with @joecummings here in that by allowing users to provide top_k or top_p, this method is no longer implementing a simple greedy search. In terms of which option to go with, I think option 1 makes sense for reusability. However, I also think it's more intuitive for users to be able to call a greedy_search method without having to worry about modifying the top_k or top_p params or spending time understanding what sample means. Most NLP users will have heard of beam_search or greedy_search, but sample might not be as intuitive.
>
> An alternative proposal here could be to keep the greedy_search method as is and then add the top_k and top_p params only to the beam_search method. This way if users want to do greedy search with those 2 params set, they can just set the beam_size to 1. For this approach, the greedy_search method will be kept simple and intuitive and all the configurability would come from the beam_search method.

@Nayef211 We abstract away the internal methods, so high-level users should never have to call greedy_search or beam_search directly. I see your point though; however, I don't think we should leave complexity to one function or the other. If we really want this to be completely composable and apply to all cases of sampling (which can happen in beam search, too), we may want to combine these methods into one. This seems beyond the scope of this PR though as beam_search is not implemented on main right now.

yohann-benchetrit commented 1 year ago

> Thoughts? @yohann-benchetrit

Agreed, thanks @joecummings and @Nayef211 for your comments!

So with this additional information, here are my thoughts:

  1. As @Nayef211 suggested, and closer to your proposition 2 (HF), I would tend to leave greedy_search sampling-free and delegate 'sampling with a single beam' (i.e. what I implemented here) to a beam_sample method that builds on beam_search once it lands in main. However, since beam_search is not currently in main, does this mean that this PR should be restricted solely to adding the _get_top_{k,p}_restriction methods (with corresponding tests)?

  2. As a side note, although HF also recommends using generate exclusively, I like the idea of keeping fundamentals such as greedy_search and beam_search at the top of the API, for "no-extra-thought" usage.

Nayef211 commented 1 year ago
> 1. As @Nayef211 suggested, and closer to your proposition 2 (HF), I would tend to leave greedy_search sampling-free and delegate 'sampling with a single beam' (i.e. what I implemented here) to a beam_sample method that builds on beam_search once it lands in main. However, since beam_search is not currently in main, does this mean that this PR should be restricted solely to adding the _get_top_{k,p}_restriction methods (with corresponding tests)?

I like this idea more, and unless there are customers asking for sampling to be implemented in greedy_search, I would hold off on adding these params to the method and instead just add the helper methods like you mentioned. However, if customers are asking for this, I don't see any harm in adding the functionality to greedy_search and removing it later down the line. Ultimately, I'll leave the decision to @joecummings.

yohann-benchetrit commented 1 year ago

@joecummings @Nayef211

Thanks again for your comments. I addressed them as follows: