irudnyts / openai

An R package wrapper around the OpenAI API
https://irudnyts.github.io/openai/

create_completion returns a corrupted list #40

Closed: tts closed this issue 11 months ago

tts commented 1 year ago

First of all, thanks for your effort in building this library! Great work.

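I'm trying to reproduce the keyword-extraction example. As in the sandbox, it should return a list of keywords extracted from the prompt. The prompt is Finnish text (roughly: "When these climate plans are being pushed onto municipalities, and when all municipalities, well, not quite all, but most of them, have completely run out of money"); note that I've heavily truncated it here to save space.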

kw_res <- create_completion(
  model = "text-davinci-003",
  max_tokens = 60,
  temperature = 0.5,
  top_p = 1,
  frequency_penalty = 0.8,
  presence_penalty = 0,
  prompt = "Extract keywords from this text:\n\nKun näitä ilmastosuunnitelmia työnnetään kunnille ja kun kaikkien kuntien — ei nyt ihan kaikkien, mutta suurimman osan kunnista — rahat ovat aivan loppu"
)

The function runs without error, but the result is corrupted: the keyword list is only a stub, and it is preceded by text that does not seem to belong there.

> kw_res$choices$text
[1] "ksen taholta tehdään jotain, niin se ei ole pelkkä näennäistoimintaa vaan se on sellaista, mikä todella vaikuttaa.\n\nKeywords: ilmast"
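
(Roughly: "...when something is done on the part of ..., it is not mere token activity but something that really makes a difference. Keywords: ilmast", where "ilmast" is the truncated start of a word such as "ilmasto", climate.)
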
In the sandbox, the same argument values return a dozen or so keywords.

I wonder whether the issue is caused by the OpenAI API itself?

irudnyts commented 1 year ago

Hi @tts, thanks for the message. I'll look into this over the weekend 🙂 That said, the issue seems to be on the API side, since the package is just a wrapper around the API endpoints.

tts commented 1 year ago

Thanks @irudnyts! It just occurred to me to test whether character encoding might have something to do with this issue (always a good candidate!). Until now, I had only tried the function with Finnish text. And yes, that seems to be the cause! At least my very first test, with an English news text, returned keywords without any oddities:

kw_res <- create_completion(
  model = "text-davinci-003",
  max_tokens = 60,
  temperature = 0.5,
  top_p = 1,
  frequency_penalty = 0.8,
  presence_penalty = 0,
  prompt = "Extract keywords from this text:\n\nScientists at the University of British Columbia announced on Wednesday they had developed a new silica-based material with ability to absorb a wider range of the harmful chemicals, and new tools to break them apart them. This is very exciting because we can target these difficult-to-break chemical bonds – and break them for good,said researcher Madjid Mohseni, who focuses on water quality and water treatment.The chemicals, also known as PFAS (per-and polyfluoroalkyl substances) are used for non-stick or stain-resistant surfaces, including clothing, cookware, stain repellents and firefighting foam. But they are also notoriously difficult to break down naturally, giving them the name forever chemicals.")
> kw_res$choices$text
[1] "\n\nKeywords: Scientists, University of British Columbia, Silica-based material, Absorb, Harmful Chemicals, New Tools, Break Apart, Difficult-to-Break Chemical Bonds, PFAS (Per-and Polyfluoroalkyl Substances), Non-Stick Sur"

That leaves me with the question of what to do with Finnish text. But that's probably my own headache. Enjoy your weekend!

EDIT: My system's default encoding is ISO-8859-1, so converting the text with iconv(x, from = "ISO-8859-1", to = "UTF-8") before passing it as the prompt is giving promising results.
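A minimal sketch of the iconv() workaround from the EDIT above, assuming (as on the system in this thread) that the input text really is ISO-8859-1 (latin1). The short Finnish word built from raw bytes is a hypothetical stand-in for text read on a latin1 system. Whether the openai package itself re-encodes prompts is not established in this thread, so the conversion is done by hand before the call.

```r
# Sketch: re-encode latin1 (ISO-8859-1) text to UTF-8 before using it as a prompt.
# Assumption: the input really is latin1, as on the system described in this thread.

# Simulate text read on a latin1 system: "näitä" as ISO-8859-1 bytes
# (in latin1, "ä" is the single byte 0xE4)
latin1_bytes <- as.raw(c(0x6e, 0xe4, 0x69, 0x74, 0xe4))
x <- rawToChar(latin1_bytes)
Encoding(x) <- "latin1"

# Convert to UTF-8, as in the EDIT above
x_utf8 <- iconv(x, from = "ISO-8859-1", to = "UTF-8")

# In UTF-8, each "ä" becomes the two bytes 0xC3 0xA4
print(charToRaw(x_utf8))
print(validUTF8(x_utf8))

# The converted text can then be used to build the prompt, e.g.
# (requires the openai package and an API key, so it is not run here):
# kw_res <- create_completion(
#   model = "text-davinci-003",
#   prompt = paste0("Extract keywords from this text:\n\n", x_utf8),
#   max_tokens = 60
# )
```

If the strings already carry a declared encoding, base R's enc2utf8() performs the same conversion without hard-coding the source encoding. Converting text that is already UTF-8 "from" ISO-8859-1 would garble it (double encoding), so checking Encoding() or validUTF8() first is a sensible precaution.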

irudnyts commented 1 year ago

@tts Sorry for not checking in earlier. I assumed the issue had been solved; is that right?