microsoft / SynapseML

Simple and Distributed Machine Learning
http://aka.ms/spark
MIT License

[BUG] Synapse GPT-4, OpenAIChatCompletion, API documentation: mandatory "name" field not mentioned in documentation for "messages" #2115

Open EPMSFT opened 10 months ago

EPMSFT commented 10 months ago

SynapseML version

0.11.2-spark3.3

System information

Describe the problem

Referring to: https://github.com/microsoft/SynapseML/blob/0836e40efd9c48424e91aa10c8aa3fbf0de39f31/cognitive/src/main/scala/com/microsoft/azure/synapse/ml/cognitive/openai/OpenAIChatCompletion.scala#L31

Documentation says messages is of type: Array(Struct(role: String, content: String))

However, if I don't add a "name" field and populate it with the deployment name, completion.transform(df) crashes because it can't find the "name" field.

Code to reproduce issue

    import synapse.ml.cognitive.openai.OpenAIChatCompletion as OpenAIChatCompletion
    from synapse.ml.core.platform import find_secret
    import pyspark.sql.types as T

    key = find_secret(...)
    deployment_name = "openai-xxx-gpt-4-xxx"
    service_name = "openai-xxx"
    user = "xxx"

    # Documentation says: Array(Struct(role: String, content: String)),
    # but if I do that it fails with cryptic error; "name" (deployment) is needed
    msg_works = [
        {"role": "system", "content": "You can only answer with 'yes' or 'no'.", "name": deployment_name},
        {"role": "assistant", "content": "Can you read this?", "name": deployment_name}
    ]
    msg_fails = [
        {"role": "system", "content": "You can only answer with 'yes' or 'no'."},
        {"role": "assistant", "content": "Can you read this?"}
    ]

    df = sc.parallelize([[msg_fails]], 1).toDF(
        T.StructType([
            T.StructField("messages", T.ArrayType(
                T.StructType([
                    T.StructField("role", T.StringType(), False),
                    T.StructField("content", T.StringType(), False),
                    T.StructField("name", T.StringType(), False)  # add to make it work
                ]), False
            ), False)
        ])
    )

    completion = (
        OpenAIChatCompletion()
        .setSubscriptionKey(key)
        .setDeploymentName(deployment_name)
        .setUrl(f"https://{service_name}.openai.azure.com/")
        .setMessagesCol("messages")
        .setMaxTokens(100)
        .setTemperature(0.25)
        .setFrequencyPenalty(0.0)
        .setPresencePenalty(0.0)
        .setUser(user)
        .setErrorCol("error")
        .setOutputCol("response")
    )

    response = completion.transform(df)
    display(response)
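For anyone hitting the same error, the workaround described above can be sketched in plain Python. This only illustrates what was observed in this report (each message needs a "name" key, here filled with the deployment name); it is not a statement of how the library is meant to be used. The helper name `with_name_field` is hypothetical and not part of SynapseML.

```python
from typing import Dict, List

Message = Dict[str, str]


def with_name_field(messages: List[Message], name: str) -> List[Message]:
    """Return copies of the messages, each guaranteed to carry a "name" key.

    Hypothetical helper: messages that already have a "name" keep it,
    all others get the supplied value. The input list is not modified.
    """
    return [{**m, "name": m.get("name", name)} for m in messages]


# Same shape as msg_fails in the reproduction above.
msg_fails = [
    {"role": "system", "content": "You can only answer with 'yes' or 'no'."},
    {"role": "assistant", "content": "Can you read this?"},
]

# Placeholder deployment name, as in the report.
msg_patched = with_name_field(msg_fails, "openai-xxx-gpt-4-xxx")
```

The patched list can then be passed to `toDF` with the three-field schema (role, content, name) in place of `msg_fails`.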

Other info / logs

No response

What component(s) does this bug affect?

What language(s) does this bug affect?

What integration(s) does this bug affect?

github-actions[bot] commented 10 months ago

Hey @EPMSFT :wave:! Thank you so much for reporting the issue/feature request :rotating_light:. Someone from SynapseML Team will be looking to triage this issue soon. We appreciate your patience.