SynapseML version

0.11.2-spark3.3

System information

Language version: python 3.10
Spark Version: 3.3
Spark Platform: Synapse

Describe the problem

Referring to: https://github.com/microsoft/SynapseML/blob/0836e40efd9c48424e91aa10c8aa3fbf0de39f31/cognitive/src/main/scala/com/microsoft/azure/synapse/ml/cognitive/openai/OpenAIChatCompletion.scala#L31

The documentation says the messages column is of type Array(Struct(role: String, content: String)).

However, if I don't add a "name" field and populate it with the deployment name, completion.transform(df) crashes because it can't find the "name" field.

Code to reproduce issue

import synapse.ml.cognitive.openai.OpenAIChatCompletion as OpenAIChatCompletion
from synapse.ml.core.platform import find_secret
import pyspark.sql.types as T

key = find_secret(...)
deployment_name = "openai-xxx-gpt-4-xxx"
service_name = "openai-xxx"
user = "xxx"

# Documentation says: Array(Struct(role: String, content: String)),
# but if I do that it fails with cryptic error; "name" (deployment) is needed
msg_works = [
    {"role": "system", "content": "You can only answer with 'yes' or 'no'.", "name": deployment_name},
    {"role": "assistant", "content": "Can you read this?", "name": deployment_name}
]
msg_fails = [
    {"role": "system", "content": "You can only answer with 'yes' or 'no'."},
    {"role": "assistant", "content": "Can you read this?"}
]

df = sc.parallelize([[msg_fails]], 1).toDF(
    T.StructType([
        T.StructField("messages", T.ArrayType(
            T.StructType([
                T.StructField("role", T.StringType(), False),
                T.StructField("content", T.StringType(), False),
                T.StructField("name", T.StringType(), False)  # add to make it work
            ]), False
        ), False)
    ])
)

completion = (
    OpenAIChatCompletion()
    .setSubscriptionKey(key)
    .setDeploymentName(deployment_name)
    .setUrl(f"https://{service_name}.openai.azure.com/")
    .setMessagesCol("messages")
    .setMaxTokens(100)
    .setTemperature(0.25)
    .setFrequencyPenalty(0.0)
    .setPresencePenalty(0.0)
    .setUser(user)
    .setErrorCol("error")
    .setOutputCol("response")
)

response = completion.transform(df)
display(response)
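
For anyone hitting the same error, here is a minimal plain-Python sketch of the workaround described above: fill in a "name" key on every message before building the DataFrame. `with_name` and `missing_fields` are hypothetical helpers written for this report, not part of SynapseML. The sketch assumes, based only on the behavior observed here, that the transformer just needs the field to be present. The schema still needs the extra "name" StructField shown in the repro.

```python
from typing import Dict, List

# Field order used by the schema that works in the repro above.
# "name" is the extra field the documentation does not mention.
MESSAGE_FIELDS = ("role", "content", "name")


def with_name(messages: List[Dict[str, str]], name: str) -> List[Dict[str, str]]:
    """Return copies of the messages with a "name" key added where it is missing."""
    return [{**m, "name": m.get("name", name)} for m in messages]


def missing_fields(messages: List[Dict[str, str]]) -> List[str]:
    """List the expected fields that at least one message lacks."""
    return [f for f in MESSAGE_FIELDS if any(f not in m for m in messages)]


# Hypothetical deployment name, standing in for the real one.
deployment_name = "openai-xxx-gpt-4-xxx"

msg_fails = [
    {"role": "system", "content": "You can only answer with 'yes' or 'no'."},
    {"role": "assistant", "content": "Can you read this?"},
]

# Patch the failing messages so they match the working shape.
msg_patched = with_name(msg_fails, deployment_name)
```

`missing_fields(msg_fails)` can be used as a quick check before calling `completion.transform(df)`; an empty list means every message carries all three fields.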

Other info / logs

No response

What component(s) does this bug affect?

[X] area/cognitive: Cognitive project
[ ] area/core: Core project
[ ] area/deep-learning: DeepLearning project
[ ] area/lightgbm: Lightgbm project
[ ] area/opencv: Opencv project
[ ] area/vw: VW project
[ ] area/website: Website
[ ] area/build: Project build system
[ ] area/notebooks: Samples under notebooks folder
[ ] area/docker: Docker usage
[ ] area/models: models related issue

What language(s) does this bug affect?

[ ] language/scala: Scala source code
[X] language/python: Pyspark APIs
[ ] language/r: R APIs
[ ] language/csharp: .NET APIs
[ ] language/new: Proposals for new client languages

What integration(s) does this bug affect?

[X] integrations/synapse: Azure Synapse integrations
[ ] integrations/azureml: Azure ML integrations
[ ] integrations/databricks: Databricks integrations

[BUG] Synapse GPT-4, OpenAIChatCompletion, API documentation: mandatory "name" field not mentioned in documentation for "messages" #2115