Open Zapotecatl opened 1 year ago
hi @Zapotecatl, the top_k is currently not implemented in ORT. I can add it to our backlog.
Thanks!
I'm not a fan of top_k personally. It seems a little artificial. But to each their own. I prefer just using temperature.
Describe the feature request
I have exported GPT NEO with the optimizer tool (it was necessary to slightly modify parts of the code, since the tool is not designed for GPT NEO).
python convert_generation.py -m EleutherAI/gpt-neo-1.3B --decoder_onnx D:\Gpt\GPT_NEO_ONNXRUNTIME\EleutherAI\gpt-neo-1.3B_GPTNeoForCausalLM_past_fp32.onnx --output D:\Gpt\GPT_NEO\NEO_SAMPLING\gpt_neo_beam_search.onnx --cache_dir D:\Gpt\GPT_NEO_ONNXRUNTIME\cache --use_external_data_format --num_beams 1 --top_p 1.0 --temperature=0.9
I am exploring the Sampling option in C++; however, the responses of GPT NEO do not vary (see the explanation in the scenario description).
My understanding is that this variation is controlled by the top_k parameter, but that parameter is not exposed in the optimization tool.
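For reference, top_k sampling keeps only the k highest-scoring logits and renormalizes before drawing a token. A minimal pure-Python sketch of the idea (function and variable names are my own, not from ORT or transformers):

```python
import math
import random

def top_k_sample(logits, k, rng=random):
    """Sample a token index after restricting to the top-k logits."""
    # Keep only the k highest-scoring candidates.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Softmax over the surviving logits (max-subtracted for stability).
    m = max(logits[i] for i in top)
    weights = [math.exp(logits[i] - m) for i in top]
    return rng.choices(top, weights=weights, k=1)[0]

logits = [2.0, 1.5, 0.1, -3.0]
print(top_k_sample(logits, k=1))  # always the argmax -> 0
```

With k=1 this degenerates to greedy decoding; with larger k the draw can land on any of the k candidates, which is what produces varied answers.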
Describe scenario use case
GPT models are stateless. I'm trying to provide a context for the character (Anna) to have a memory about her personality. So, I have this context file: initial_context.txt
This is the Python program using GPT NEO, applying sampling with top_k=50.
The answers are generally acceptable and mostly vary for the same input, which is a very desirable feature. For example:
If I change the value to top_k = 1, the answer is always the same; it stops varying.
With the optimized GPT NEO in C++ I always get the same response, i.e., it behaves as if top_k = 1.
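The behavior described above can be reproduced with a toy sketch (names and the per-step logits are invented for illustration): with top_k = 1 every run produces the identical token sequence, while a larger k lets runs diverge.

```python
import math
import random

def sample_sequence(step_logits, k, rng):
    """Draw one token per step, each from the top-k of that step's logits."""
    out = []
    for logits in step_logits:
        top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
        m = max(logits[i] for i in top)
        weights = [math.exp(logits[i] - m) for i in top]
        out.append(rng.choices(top, weights=weights, k=1)[0])
    return out

# Toy per-step logits standing in for a model's output distributions.
steps = [[2.0, 1.8, 0.5], [0.2, 1.1, 1.0], [3.0, 0.1, 0.0]]
greedy_runs = {tuple(sample_sequence(steps, 1, random.Random())) for _ in range(5)}
print(len(greedy_runs))  # 1 -- top_k=1 always yields the same sequence
```

If the exported C++ pipeline effectively runs with top_k = 1 (or pure greedy decoding), it would match this deterministic behavior exactly.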