Hwo to get log probablity of input tokens?

openai / openai-python

The official Python library for the OpenAI API

https://pypi.org/project/openai/

Apache License 2.0

22.05k stars 3.04k forks source link

Hwo to get log probablity of input tokens? #1463

Closed BUAADreamer closed 3 months ago

BUAADreamer commented 3 months ago

Confirm this is a feature request for the Python library and not the underlying OpenAI API.

[X] This is a feature request for the Python library

Describe the feature or improvement you're requesting

For a complete context: Q:Where is Beijing? A:china I want to get the log_probs of china, how can I do this?

Additional context

No response

meorphis commented 3 months ago

Hey @BUAADreamer - you can follow the guide here on how to use logprobs: https://cookbook.openai.com/examples/using_logprobs#1-using-logprobs-to-assess-confidence-for-classification-tasks

The example presented in this article is quite similar to your use case.

BUAADreamer commented 3 months ago

I see, but what I want is the prob of tokens appeared in this history but not new generated tokens

meorphis commented 3 months ago

Hmm are you saying that you want probabilities for tokens that you passed to the model as input? That is not supported - the model can only give you probabilities for its own predictions.

BUAADreamer commented 3 months ago

Thanks! However, I found REPLUG, a great RAG work, use similar methods to obtain token probs, you can find here: https://github.com/swj0419/REPLUG/blob/6cbe7971a9b761c63aa65f7fe70db3ee82158612/LSR_finetune/replug_lsr.py#L185 But this is used for get GPT-3 or CodeX output, so this way is now invalid for latest GPT3.5/GPT4? If yes, will you reopen this usage in the future?

meorphis commented 3 months ago

Ah interesting. Yes you would not be able to plug some of the newer models (4o, 4-turbo, 3.5-turbo) into this library because the legacy completions API used here is not compatible with those models. We do not plan to add support for these models to the legacy completions API.

I can see how the feature implemented in this library could be useful, but it's a bit out of our wheelhouse of the maintainers of this GitHub repo, which is just a Python wrapper around the OpenAI API. We can answer questions about and resolve problems with the Python SDK, but I'm afraid I can't comment particularly intelligently on functionality that was implemented as part of a third party library.

BUAADreamer commented 3 months ago

Thanks for such a complete and quick answer!