Closed cdoern closed 2 weeks ago
The issue here was the following:
_gen_kwargs
sets a default of 4096 for max_tokens
unless one is explicitly set in the pipeline yaml for the block.
This means I had to set the top level max_num_tokens
to 4096 rather than None
. We then use that 4096 value if no other was specified in the block and also use it if the user didn't pass an override.
@cdoern I may not quite be following the reasoning behind this change. Today, we set a max number of tokens for each step in the pipeline yamls. This allows the user to override what we specified in the yaml? Or this allows them to specify a value that gets used for any pipeline steps that don't have a max tokens specified?
@bbrowning more so This allows the user to override what we specified in the yaml
.
@aakankshaduggal indicated that this is a good path forward for allowing CLI users to experiment with different wait times/types of results from SDG and that she has used this toggle locally to achieve different types of results.
max_tokens
is already applied to each LLMBlock via gen_kwargs
and it set to 4096, this change maintains that unless max_num_tokens
is passed into generate_data
So, my only concern here is that we have downstream pipelines that apply a different max tokens for every step in the pipeline. And the general API surface area of the pipeline config is each step can specify it's own max tokens. This gives a single max tokens for the entire pipeline if passed in, right? As opposed to letting me just tweak the step(s) in the pipeline where I actually want to change the token count?
This pull request has merge conflicts that must be resolved before it can be merged. @cdoern please rebase it. https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork
max-num-tokens is a nice way to run a shorter or longer SDG run. locally I have been modifiyng the pipeline yaml from 2048 to 512 which ends up just generating less data exposing this to the CLI could allow power users to run different types of SDG runs!