SynapseML version

0.11.2

Describe the problem

We are using the `AnalyzeHealthText` Cognitive Services transformer, and until now have been passing a single subscription key via the `subscriptionKey` constructor argument with no problems.

After some throttling issues we've needed to fan out across multiple accounts, so we've implemented the following code (shortened for brevity):
```python
COGNITIVE_KEY_COLUMN = "cognitive_key"


def extract_features(df: DataFrame, table: TableConfig, cognitive_location: str) -> DataFrame:
    enriched_df = df
    for column in table["column_types"][ColumnType.FREE_TEXT]:
        logging.debug(f"Extracting features from free-text column: {column}")
        ta4h = AnalyzeHealthText(
            textCol=column,
            outputCol=f"{column}_Extracted",
            errorCol=f"{column}_Extracted_Errors",
            batchSize=10,
            concurrency=8,
            subscriptionKeyCol=COGNITIVE_KEY_COLUMN,
        ).setLocation(cognitive_location)
        # Run Text Analytics for Health to extract health entities/relationships
        enriched_df = ta4h.transform(enriched_df)
    return enriched_df.drop(COGNITIVE_KEY_COLUMN)


# Get secrets for cognitive services endpoint
cognitive_keys_string = spark_session.conf.get("spark.secret.cognitive-services-keys")
cognitive_location = spark_session.conf.get("spark.secret.cognitive-services-location")

# Split out multiple cognitive keys (if any) from secret (delimited by ';')
cognitive_keys = cognitive_keys_string.split(";")


@udf
def random_key():
    return cognitive_keys[random.randint(0, len(cognitive_keys) - 1)]


for table_name, table_config in TABLE_CONFIG.items():
    # Read from silver
    df_input = read_delta_table(
        spark_session, construct_uri(spark_session, DatalakeZone.SILVER, table_name)
    )
    # Add cognitive subscription keys to the DataFrame
    df_with_keys = df_input.withColumn(COGNITIVE_KEY_COLUMN, random_key())
    # Extract features to new column
    df_output = extract_features(df_with_keys, table_config, cognitive_location)
    # Write outputs to gold zone
    write_delta_table(df_output, construct_uri(spark_session, DatalakeZone.GOLD, table_name))
```
Checking the DataFrame before sending it through the transformer, I can see that the new column has been added successfully with the randomised subscription keys; however, as soon as it's passed through `transform` we get the following exception:
```
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 2460.0 failed 4 times, most recent failure: Lost task 0.3 in stage 2460.0 (TID 2787) (10.0.0.132 executor 0): org.apache.spark.SparkException: [FAILED_EXECUTE_UDF] Failed to execute user defined function (`functions$$$Lambda$11463/414593102`: (struct<cognitive_key:array<string>,CancellationComment:array<string>>) => struct<requestLine:struct<method:string,uri:string,protocolVersion:struct<protocol:string,major:int,minor:int>>,headers:array<struct<name:string,value:string>>,entity:struct<content:binary,contentEncoding:struct<name:string,value:string>,contentLength:bigint,contentType:struct<name:string,value:string>,isChunked:boolean,isRepeatable:boolean,isStreaming:boolean>>).
at org.apache.spark.sql.errors.QueryExecutionErrors$.failedExecuteUserDefinedFunctionError(QueryExecutionErrors.scala:248)
at org.apache.spark.sql.errors.QueryExecutionErrors.failedExecuteUserDefinedFunctionError(QueryExecutionErrors.scala)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage3.serializefromobject_doConsume_0$(Unknown Source)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage3.processNext(Unknown Source)
at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at org.apache.spark.sql.execution.WholeStageCodegenEvaluatorFactory$WholeStageCodegenPartitionEvaluator$$anon$1.hasNext(WholeStageCodegenEvaluatorFactory.scala:43)
at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
at com.microsoft.azure.synapse.ml.io.http.HTTPTransformer.$anonfun$transform$2(HTTPTransformer.scala:128)
at org.apache.spark.sql.execution.MapPartitionsExec.$anonfun$doExecute$3(objects.scala:224)
at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:931)
at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:931)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
[...]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:750)
Caused by: java.lang.ClassCastException: scala.collection.mutable.WrappedArray$ofRef cannot be cast to java.lang.String
at com.microsoft.azure.synapse.ml.cognitive.HasCognitiveServiceInput.addHeaders(CognitiveServiceBase.scala:307)
at com.microsoft.azure.synapse.ml.cognitive.HasCognitiveServiceInput.addHeaders$(CognitiveServiceBase.scala:302)
at com.microsoft.azure.synapse.ml.cognitive.text.TextAnalyticsBaseNoBinding.addHeaders(TextAnalytics.scala:136)
at com.microsoft.azure.synapse.ml.cognitive.text.TextAnalyticsBaseNoBinding.$anonfun$inputFunc$1(TextAnalytics.scala:170)
at com.microsoft.azure.synapse.ml.io.http.CustomInputParser.$anonfun$setNullableUDF$1(Parsers.scala:132)
at org.apache.spark.injections.UDFUtils$$anon$1.call(UDFUtils.scala:23)
at org.apache.spark.sql.functions$.$anonfun$udf$91(functions.scala:8230)
	... 71 more
```
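For reference, inspecting the DataFrame immediately before calling `transform` (a hypothetical check, using the column names from the snippet above) shows the key column as a plain string at that point:

```python
# Hypothetical pre-transform check: the key column is StringType here,
# not the array<string> that appears in the exception
df_with_keys.select(COGNITIVE_KEY_COLUMN).printSchema()
# root
#  |-- cognitive_key: string (nullable = true)
```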
As far as I can tell, when preparing the request to Cognitive Services and extracting the subscription key in `HasCognitiveServiceInput.addHeaders`, it's finding a `WrappedArray` rather than a `String` and thus hitting the above error. Notably, the UDF signature in the `FAILED_EXECUTE_UDF` message shows the input as `struct<cognitive_key:array<string>,...>`, which suggests the mini-batching step is collecting the per-row keys into an array before `addHeaders` reads them.
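If that's the case, it would match how SynapseML's generic mini-batching behaves. As a minimal standalone sketch (my illustration, not taken from the failing pipeline; assumes an active Spark session named `spark`), `FixedMiniBatchTransformer` collects every column, including a per-row key column, into arrays:

```python
from synapse.ml.stages import FixedMiniBatchTransformer

demo_df = spark.createDataFrame(
    [("note a", "key_1"), ("note b", "key_2")],
    ["Text", "cognitive_key"],
)

# After batching, each column holds one array per mini-batch, so the
# per-row string key becomes array<string> - the type in the exception
batched = FixedMiniBatchTransformer(batchSize=2).transform(demo_df)
batched.printSchema()
# root
#  |-- Text: array (nullable = true)
#  |    |-- element: string (containsNull = true)
#  |-- cognitive_key: array (nullable = true)
#  |    |-- element: string (containsNull = true)
```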
Code to reproduce issue

Derived from https://microsoft.github.io/SynapseML/docs/Explore%20Algorithms/AI%20Services/Advanced%20Usage%20-%20Async,%20Batching,%20and%20Multi-Key/#step-5-multi-key

```python
import random

from pyspark.sql.functions import udf
from synapse.ml.cognitive import AnalyzeHealthText

# Create a dataframe
text_df = spark.createDataFrame(
    [
        ("The patient requires 20mg of Citalopram to be taken daily",),
        ("Exhibits symptoms of diabetes including fatigue, frequent urination and poor circulation",),
        ("Patient to be sent to the ER following complications from surgery",),
    ],
    ["Text"],
)

location = "uksouth"
keys = [
    "key_1",
    "key_2",
]


@udf
def random_key():
    return keys[random.randint(0, len(keys) - 1)]


df_with_keys = text_df.withColumn("key", random_key())

ta4h = AnalyzeHealthText(
    textCol="Text",
    outputCol="Text_Extracted",
    errorCol="Text_Extracted_Errors",
).setLocation(location)

results = ta4h.setSubscriptionKeyCol("key").transform(df_with_keys)
display(results)
```
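As a stopgap, one workaround that should sidestep the failing `subscriptionKeyCol` path (a sketch under assumptions, not something from the original report, and untested against the live service) is to shard the DataFrame across accounts and give each transformer instance a scalar `subscriptionKey`:

```python
from functools import reduce

from pyspark.sql.functions import floor, rand

# Hypothetical workaround sketch: randomly bucket rows by account, run one
# transformer per account with a scalar subscriptionKey, then union the results
bucketed = text_df.withColumn("bucket", floor(rand() * len(keys)).cast("int"))

parts = []
for i, key in enumerate(keys):
    ta4h_i = AnalyzeHealthText(
        textCol="Text",
        outputCol="Text_Extracted",
        errorCol="Text_Extracted_Errors",
        subscriptionKey=key,  # scalar key instead of a per-row key column
    ).setLocation(location)
    parts.append(ta4h_i.transform(bucketed.where(f"bucket = {i}")))

results = reduce(lambda a, b: a.unionByName(b), parts).drop("bucket")
```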
Other info / logs
```
File ~/.ipykernel/6404/command--1-2129037883:13
10 del sys
12 with open(filename, "rb") as f:
---> 13 exec(compile(f.read(), filename, 'exec'))
File /tmp/tmpt49vp0kz.py:53
50 df_output = extract_features(df_with_keys, table_config, cognitive_location)
52 # Write outputs to gold zone
---> 53 write_delta_table(df_output, construct_uri(spark_session, DatalakeZone.GOLD, table_name))
File /local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/[REDACTED]_pipeline/datalake.py:43, in write_delta_table(df, path)
35 def write_delta_table(df: DataFrame, path: str) -> None:
36 """Write Datalake Delta table within a specified zone (in overwrite mode).
37
38 Args:
(...)
41 path (str): Path to file inside Azure storage.
42 """
---> 43 df.write.format("delta").mode("overwrite").save(path)
File /databricks/spark/python/pyspark/instrumentation_utils.py:48, in _wrap_function.<locals>.wrapper(*args, **kwargs)
46 start = time.perf_counter()
47 try:
---> 48 res = func(*args, **kwargs)
49 logger.log_success(
50 module_name, class_name, function_name, time.perf_counter() - start, signature
51 )
52 return res
File /databricks/spark/python/pyspark/sql/readwriter.py:1463, in DataFrameWriter.save(self, path, format, mode, partitionBy, **options)
1461 self._jwrite.save()
1462 else:
-> 1463 self._jwrite.save(path)
File /databricks/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/java_gateway.py:1322, in JavaMember.__call__(self, *args)
1316 command = proto.CALL_COMMAND_NAME +\
1317 self.command_header +\
1318 args_command +\
1319 proto.END_COMMAND_PART
1321 answer = self.gateway_client.send_command(command)
-> 1322 return_value = get_return_value(
1323 answer, self.gateway_client, self.target_id, self.name)
1325 for temp_arg in temp_args:
1326 if hasattr(temp_arg, "_detach"):
File /databricks/spark/python/pyspark/errors/exceptions/captured.py:188, in capture_sql_exception.<locals>.deco(*a, **kw)
186 def deco(*a: Any, **kw: Any) -> Any:
187 try:
--> 188 return f(*a, **kw)
189 except Py4JJavaError as e:
190 converted = convert_exception(e.java_exception)
File /databricks/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/protocol.py:326, in get_return_value(answer, gateway_client, target_id, name)
324 value = OUTPUT_CONVERTER[type](answer[2:], gateway_client)
325 if answer[1] == REFERENCE_TYPE:
--> 326 raise Py4JJavaError(
327 "An error occurred while calling {0}{1}{2}.\n".
328 format(target_id, ".", name), value)
329 else:
330 raise Py4JError(
331 "An error occurred while calling {0}{1}{2}. Trace:\n{3}\n".
332 format(target_id, ".", name, value))
Py4JJavaError: An error occurred while calling o545.save.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 2460.0 failed 4 times, most recent failure: Lost task 0.3 in stage 2460.0 (TID 2787) (10.0.0.132 executor 0): org.apache.spark.SparkException: [FAILED_EXECUTE_UDF] Failed to execute user defined function (`functions$$$Lambda$11463/414593102`: (struct<cognitive_key:array<string>,CancellationComment:array<string>>) => struct<requestLine:struct<method:string,uri:string,protocolVersion:struct<protocol:string,major:int,minor:int>>,headers:array<struct<name:string,value:string>>,entity:struct<content:binary,contentEncoding:struct<name:string,value:string>,contentLength:bigint,contentType:struct<name:string,value:string>,isChunked:boolean,isRepeatable:boolean,isStreaming:boolean>>).
at org.apache.spark.sql.errors.QueryExecutionErrors$.failedExecuteUserDefinedFunctionError(QueryExecutionErrors.scala:248)
at org.apache.spark.sql.errors.QueryExecutionErrors.failedExecuteUserDefinedFunctionError(QueryExecutionErrors.scala)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage3.serializefromobject_doConsume_0$(Unknown Source)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage3.processNext(Unknown Source)
at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at org.apache.spark.sql.execution.WholeStageCodegenEvaluatorFactory$WholeStageCodegenPartitionEvaluator$$anon$1.hasNext(WholeStageCodegenEvaluatorFactory.scala:43)
at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
at com.microsoft.azure.synapse.ml.io.http.HTTPTransformer.$anonfun$transform$2(HTTPTransformer.scala:128)
at org.apache.spark.sql.execution.MapPartitionsExec.$anonfun$doExecute$3(objects.scala:224)
at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:931)
at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:931)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.$anonfun$computeOrReadCheckpoint$1(RDD.scala:407)
at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:404)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:371)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.$anonfun$computeOrReadCheckpoint$1(RDD.scala:407)
at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:404)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:371)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.$anonfun$computeOrReadCheckpoint$1(RDD.scala:407)
at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:404)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:371)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.$anonfun$computeOrReadCheckpoint$1(RDD.scala:407)
at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:404)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:371)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.$anonfun$computeOrReadCheckpoint$1(RDD.scala:407)
at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:404)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:371)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.$anonfun$computeOrReadCheckpoint$1(RDD.scala:407)
at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:404)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:371)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.$anonfun$computeOrReadCheckpoint$1(RDD.scala:407)
at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:404)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:371)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.$anonfun$computeOrReadCheckpoint$1(RDD.scala:407)
at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:404)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:371)
at org.apache.spark.scheduler.ResultTask.$anonfun$runTask$3(ResultTask.scala:82)
at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
at org.apache.spark.scheduler.ResultTask.$anonfun$runTask$1(ResultTask.scala:82)
at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62)
at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:196)
at org.apache.spark.scheduler.Task.doRunTask(Task.scala:181)
at org.apache.spark.scheduler.Task.$anonfun$run$5(Task.scala:146)
at com.databricks.unity.EmptyHandle$.runWithAndClose(UCSHandle.scala:125)
at org.apache.spark.scheduler.Task.$anonfun$run$1(Task.scala:146)
at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
at org.apache.spark.scheduler.Task.run(Task.scala:99)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$8(Executor.scala:897)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1681)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:900)
at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:795)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:750)
Caused by: java.lang.ClassCastException: scala.collection.mutable.WrappedArray$ofRef cannot be cast to java.lang.String
at com.microsoft.azure.synapse.ml.cognitive.HasCognitiveServiceInput.addHeaders(CognitiveServiceBase.scala:307)
at com.microsoft.azure.synapse.ml.cognitive.HasCognitiveServiceInput.addHeaders$(CognitiveServiceBase.scala:302)
at com.microsoft.azure.synapse.ml.cognitive.text.TextAnalyticsBaseNoBinding.addHeaders(TextAnalytics.scala:136)
at com.microsoft.azure.synapse.ml.cognitive.text.TextAnalyticsBaseNoBinding.$anonfun$inputFunc$1(TextAnalytics.scala:170)
at com.microsoft.azure.synapse.ml.io.http.CustomInputParser.$anonfun$setNullableUDF$1(Parsers.scala:132)
at org.apache.spark.injections.UDFUtils$$anon$1.call(UDFUtils.scala:23)
at org.apache.spark.sql.functions$.$anonfun$udf$91(functions.scala:8230)
... 71 more
Driver stacktrace:
at org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:3555)
at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:3487)
at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:3476)
at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:3476)
at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1(DAGScheduler.scala:1493)
at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1$adapted(DAGScheduler.scala:1493)
at scala.Option.foreach(Option.scala:407)
at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:1493)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:3801)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:3713)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:3701)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:51)
at org.apache.spark.scheduler.DAGScheduler.$anonfun$runJob$1(DAGScheduler.scala:1217)
at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
at com.databricks.spark.util.FrameProfiler$.record(FrameProfiler.scala:94)
at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:1205)
at org.apache.spark.SparkContext.runJobInternal(SparkContext.scala:2946)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2929)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$.$anonfun$executeWrite$4(FileFormatWriter.scala:399)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$.writeAndCommit(FileFormatWriter.scala:363)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$.executeWrite(FileFormatWriter.scala:396)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$.$anonfun$write$1(FileFormatWriter.scala:281)
at com.databricks.spark.util.FrameProfiler$.record(FrameProfiler.scala:94)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$.write(FileFormatWriter.scala:116)
at com.databricks.sql.transaction.tahoe.commands.WriteIntoDeltaCommand.run(WriteIntoDeltaCommand.scala:109)
at org.apache.spark.sql.execution.command.DataWritingCommandExec.$anonfun$sideEffectResult$3(commands.scala:132)
at com.databricks.spark.util.FrameProfiler$.record(FrameProfiler.scala:94)
at org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult$lzycompute(commands.scala:130)
at org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult(commands.scala:129)
at org.apache.spark.sql.execution.command.DataWritingCommandExec.$anonfun$doExecute$4(commands.scala:156)
at com.databricks.spark.util.FrameProfiler$.record(FrameProfiler.scala:94)
at org.apache.spark.sql.execution.command.DataWritingCommandExec.doExecute(commands.scala:156)
at org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$2(SparkPlan.scala:274)
at com.databricks.spark.util.FrameProfiler$.record(FrameProfiler.scala:94)
at org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:274)
at org.apache.spark.sql.execution.SparkPlan$.org$apache$spark$sql$execution$SparkPlan$$withExecuteQueryLogging(SparkPlan.scala:107)
at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:332)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:165)
at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:328)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:269)
at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:413)
at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:412)
at com.databricks.sql.transaction.tahoe.files.TransactionalWriteEdge.$anonfun$writeFiles$13(TransactionalWriteEdge.scala:662)
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withCustomExecutionEnv$8(SQLExecution.scala:274)
at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:498)
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withCustomExecutionEnv$1(SQLExecution.scala:201)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:1113)
at org.apache.spark.sql.execution.SQLExecution$.withCustomExecutionEnv(SQLExecution.scala:151)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:447)
at com.databricks.sql.transaction.tahoe.files.TransactionalWriteEdge.$anonfun$writeFiles$1(TransactionalWriteEdge.scala:652)
at com.databricks.sql.transaction.tahoe.metering.DeltaLogging.withOperationTypeTag(DeltaLogging.scala:196)
at com.databricks.sql.transaction.tahoe.metering.DeltaLogging.withOperationTypeTag$(DeltaLogging.scala:183)
at com.databricks.sql.transaction.tahoe.OptimisticTransaction.withOperationTypeTag(OptimisticTransaction.scala:155)
at com.databricks.sql.transaction.tahoe.metering.DeltaLogging.$anonfun$recordDeltaOperationInternal$2(DeltaLogging.scala:160)
at com.databricks.spark.util.FrameProfiler$.record(FrameProfiler.scala:94)
at com.databricks.sql.transaction.tahoe.metering.DeltaLogging.recordFrameProfile(DeltaLogging.scala:265)
at com.databricks.sql.transaction.tahoe.metering.DeltaLogging.recordFrameProfile$(DeltaLogging.scala:263)
at com.databricks.sql.transaction.tahoe.OptimisticTransaction.recordFrameProfile(OptimisticTransaction.scala:155)
at com.databricks.sql.transaction.tahoe.metering.DeltaLogging.$anonfun$recordDeltaOperationInternal$1(DeltaLogging.scala:159)
at com.databricks.logging.UsageLogging.$anonfun$recordOperation$1(UsageLogging.scala:571)
at com.databricks.logging.UsageLogging.executeThunkAndCaptureResultTags$1(UsageLogging.scala:666)
at com.databricks.logging.UsageLogging.$anonfun$recordOperationWithResultTags$4(UsageLogging.scala:684)
at com.databricks.logging.UsageLogging.$anonfun$withAttributionContext$1(UsageLogging.scala:426)
at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62)
at com.databricks.logging.AttributionContext$.withValue(AttributionContext.scala:196)
at com.databricks.logging.UsageLogging.withAttributionContext(UsageLogging.scala:424)
at com.databricks.logging.UsageLogging.withAttributionContext$(UsageLogging.scala:418)
at com.databricks.spark.util.PublicDBLogging.withAttributionContext(DatabricksSparkUsageLogger.scala:25)
at com.databricks.logging.UsageLogging.withAttributionTags(UsageLogging.scala:470)
at com.databricks.logging.UsageLogging.withAttributionTags$(UsageLogging.scala:455)
at com.databricks.spark.util.PublicDBLogging.withAttributionTags(DatabricksSparkUsageLogger.scala:25)
at com.databricks.logging.UsageLogging.recordOperationWithResultTags(UsageLogging.scala:661)
at com.databricks.logging.UsageLogging.recordOperationWithResultTags$(UsageLogging.scala:580)
at com.databricks.spark.util.PublicDBLogging.recordOperationWithResultTags(DatabricksSparkUsageLogger.scala:25)
at com.databricks.logging.UsageLogging.recordOperation(UsageLogging.scala:571)
at com.databricks.logging.UsageLogging.recordOperation$(UsageLogging.scala:540)
at com.databricks.spark.util.PublicDBLogging.recordOperation(DatabricksSparkUsageLogger.scala:25)
at com.databricks.spark.util.PublicDBLogging.recordOperation0(DatabricksSparkUsageLogger.scala:66)
at com.databricks.spark.util.DatabricksSparkUsageLogger.recordOperation(DatabricksSparkUsageLogger.scala:148)
at com.databricks.spark.util.UsageLogger.recordOperation(UsageLogger.scala:72)
at com.databricks.spark.util.UsageLogger.recordOperation$(UsageLogger.scala:59)
at com.databricks.spark.util.DatabricksSparkUsageLogger.recordOperation(DatabricksSparkUsageLogger.scala:107)
at com.databricks.spark.util.UsageLogging.recordOperation(UsageLogger.scala:433)
at com.databricks.spark.util.UsageLogging.recordOperation$(UsageLogger.scala:412)
at com.databricks.sql.transaction.tahoe.OptimisticTransaction.recordOperation(OptimisticTransaction.scala:155)
at com.databricks.sql.transaction.tahoe.metering.DeltaLogging.recordDeltaOperationInternal(DeltaLogging.scala:158)
at com.databricks.sql.transaction.tahoe.metering.DeltaLogging.recordDeltaOperation(DeltaLogging.scala:148)
at com.databricks.sql.transaction.tahoe.metering.DeltaLogging.recordDeltaOperation$(DeltaLogging.scala:138)
at com.databricks.sql.transaction.tahoe.OptimisticTransaction.recordDeltaOperation(OptimisticTransaction.scala:155)
at com.databricks.sql.transaction.tahoe.files.TransactionalWriteEdge.$anonfun$recordWriteFilesOperation$1(TransactionalWriteEdge.scala:344)
at com.databricks.sql.acl.CheckPermissions$.trusted(CheckPermissions.scala:2027)
at com.databricks.sql.transaction.tahoe.files.TransactionalWriteEdge.recordWriteFilesOperation(TransactionalWriteEdge.scala:343)
at com.databricks.sql.transaction.tahoe.files.TransactionalWriteEdge.writeFiles(TransactionalWriteEdge.scala:376)
at com.databricks.sql.transaction.tahoe.files.TransactionalWriteEdge.writeFiles$(TransactionalWriteEdge.scala:370)
at com.databricks.sql.transaction.tahoe.OptimisticTransaction.writeFiles(OptimisticTransaction.scala:155)
at com.databricks.sql.transaction.tahoe.files.TransactionalWriteEdge.writeFiles(TransactionalWriteEdge.scala:727)
at com.databricks.sql.transaction.tahoe.files.TransactionalWriteEdge.writeFiles$(TransactionalWriteEdge.scala:717)
at com.databricks.sql.transaction.tahoe.OptimisticTransaction.writeFiles(OptimisticTransaction.scala:155)
at com.databricks.sql.transaction.tahoe.files.TransactionalWrite.writeFiles(TransactionalWrite.scala:242)
at com.databricks.sql.transaction.tahoe.files.TransactionalWrite.writeFiles$(TransactionalWrite.scala:239)
at com.databricks.sql.transaction.tahoe.OptimisticTransaction.writeFiles(OptimisticTransaction.scala:155)
at com.databricks.sql.transaction.tahoe.commands.ClusteredWriter.run(ClusteredWriter.scala:96)
at com.databricks.sql.transaction.tahoe.commands.WriteIntoDelta.writeFiles(WriteIntoDelta.scala:491)
at com.databricks.sql.transaction.tahoe.commands.WriteIntoDelta.write(WriteIntoDelta.scala:413)
at com.databricks.sql.transaction.tahoe.commands.WriteIntoDelta.$anonfun$run$2(WriteIntoDelta.scala:111)
at com.databricks.sql.transaction.tahoe.commands.WriteIntoDelta.$anonfun$run$2$adapted(WriteIntoDelta.scala:106)
at com.databricks.sql.transaction.tahoe.DeltaLog.withNewTransaction(DeltaLog.scala:270)
at com.databricks.sql.transaction.tahoe.commands.WriteIntoDelta.$anonfun$run$1(WriteIntoDelta.scala:106)
at com.databricks.sql.acl.CheckPermissions$.trusted(CheckPermissions.scala:2027)
at com.databricks.sql.transaction.tahoe.commands.WriteIntoDelta.run(WriteIntoDelta.scala:105)
at com.databricks.sql.transaction.tahoe.sources.DeltaDataSource.createRelation(DeltaDataSource.scala:204)
at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:49)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.$anonfun$sideEffectResult$1(commands.scala:82)
at com.databricks.spark.util.FrameProfiler$.record(FrameProfiler.scala:94)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:80)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:79)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:91)
at org.apache.spark.sql.execution.QueryExecution$$anonfun$$nestedInanonfun$eagerlyExecuteCommands$1$1.$anonfun$applyOrElse$3(QueryExecution.scala:272)
at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:166)
at org.apache.spark.sql.execution.QueryExecution$$anonfun$$nestedInanonfun$eagerlyExecuteCommands$1$1.$anonfun$applyOrElse$2(QueryExecution.scala:272)
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withCustomExecutionEnv$8(SQLExecution.scala:274)
at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:498)
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withCustomExecutionEnv$1(SQLExecution.scala:201)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:1113)
at org.apache.spark.sql.execution.SQLExecution$.withCustomExecutionEnv(SQLExecution.scala:151)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:447)
at org.apache.spark.sql.execution.QueryExecution$$anonfun$$nestedInanonfun$eagerlyExecuteCommands$1$1.$anonfun$applyOrElse$1(QueryExecution.scala:271)
at org.apache.spark.sql.execution.QueryExecution.org$apache$spark$sql$execution$QueryExecution$$withMVTagsIfNecessary(QueryExecution.scala:245)
at org.apache.spark.sql.execution.QueryExecution$$anonfun$$nestedInanonfun$eagerlyExecuteCommands$1$1.applyOrElse(QueryExecution.scala:266)
at org.apache.spark.sql.execution.QueryExecution$$anonfun$$nestedInanonfun$eagerlyExecuteCommands$1$1.applyOrElse(QueryExecution.scala:251)
at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:465)
at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(origin.scala:69)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:465)
at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:33)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:316)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:312)
at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:33)
at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:33)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:441)
at org.apache.spark.sql.execution.QueryExecution.$anonfun$eagerlyExecuteCommands$1(QueryExecution.scala:251)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.allowInvokingTransformsInAnalyzer(AnalysisHelper.scala:372)
at org.apache.spark.sql.execution.QueryExecution.eagerlyExecuteCommands(QueryExecution.scala:251)
at org.apache.spark.sql.execution.QueryExecution.commandExecuted$lzycompute(QueryExecution.scala:203)
at org.apache.spark.sql.execution.QueryExecution.commandExecuted(QueryExecution.scala:200)
at org.apache.spark.sql.execution.QueryExecution.assertCommandExecuted(QueryExecution.scala:336)
at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:956)
at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:424)
at org.apache.spark.sql.DataFrameWriter.saveInternal(DataFrameWriter.scala:333)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:250)
at sun.reflect.GeneratedMethodAccessor540.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:397)
at py4j.Gateway.invoke(Gateway.java:306)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:195)
at py4j.ClientServerConnection.run(ClientServerConnection.java:115)
at java.lang.Thread.run(Thread.java:750)
Caused by: org.apache.spark.SparkException: [FAILED_EXECUTE_UDF] Failed to execute user defined function (`functions$$$Lambda$11463/414593102`: (struct<cognitive_key:array<string>,CancellationComment:array<string>>) => struct<requestLine:struct<method:string,uri:string,protocolVersion:struct<protocol:string,major:int,minor:int>>,headers:array<struct<name:string,value:string>>,entity:struct<content:binary,contentEncoding:struct<name:string,value:string>,contentLength:bigint,contentType:struct<name:string,value:string>,isChunked:boolean,isRepeatable:boolean,isStreaming:boolean>>).
at org.apache.spark.sql.errors.QueryExecutionErrors$.failedExecuteUserDefinedFunctionError(QueryExecutionErrors.scala:248)
at org.apache.spark.sql.errors.QueryExecutionErrors.failedExecuteUserDefinedFunctionError(QueryExecutionErrors.scala)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage3.serializefromobject_doConsume_0$(Unknown Source)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage3.processNext(Unknown Source)
at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at org.apache.spark.sql.execution.WholeStageCodegenEvaluatorFactory$WholeStageCodegenPartitionEvaluator$$anon$1.hasNext(WholeStageCodegenEvaluatorFactory.scala:43)
at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
at com.microsoft.azure.synapse.ml.io.http.HTTPTransformer.$anonfun$transform$2(HTTPTransformer.scala:128)
at org.apache.spark.sql.execution.MapPartitionsExec.$anonfun$doExecute$3(objects.scala:224)
at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:931)
at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:931)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.$anonfun$computeOrReadCheckpoint$1(RDD.scala:407)
at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:404)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:371)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.$anonfun$computeOrReadCheckpoint$1(RDD.scala:407)
at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:404)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:371)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.$anonfun$computeOrReadCheckpoint$1(RDD.scala:407)
at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:404)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:371)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.$anonfun$computeOrReadCheckpoint$1(RDD.scala:407)
at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:404)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:371)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.$anonfun$computeOrReadCheckpoint$1(RDD.scala:407)
at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:404)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:371)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.$anonfun$computeOrReadCheckpoint$1(RDD.scala:407)
at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:404)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:371)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.$anonfun$computeOrReadCheckpoint$1(RDD.scala:407)
at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:404)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:371)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.$anonfun$computeOrReadCheckpoint$1(RDD.scala:407)
at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:404)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:371)
at org.apache.spark.scheduler.ResultTask.$anonfun$runTask$3(ResultTask.scala:82)
at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
at org.apache.spark.scheduler.ResultTask.$anonfun$runTask$1(ResultTask.scala:82)
at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62)
at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:196)
at org.apache.spark.scheduler.Task.doRunTask(Task.scala:181)
at org.apache.spark.scheduler.Task.$anonfun$run$5(Task.scala:146)
at com.databricks.unity.EmptyHandle$.runWithAndClose(UCSHandle.scala:125)
at org.apache.spark.scheduler.Task.$anonfun$run$1(Task.scala:146)
at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
at org.apache.spark.scheduler.Task.run(Task.scala:99)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$8(Executor.scala:897)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1681)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:900)
at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:795)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
... 1 more
Caused by: java.lang.ClassCastException: scala.collection.mutable.WrappedArray$ofRef cannot be cast to java.lang.String
at com.microsoft.azure.synapse.ml.cognitive.HasCognitiveServiceInput.addHeaders(CognitiveServiceBase.scala:307)
at com.microsoft.azure.synapse.ml.cognitive.HasCognitiveServiceInput.addHeaders$(CognitiveServiceBase.scala:302)
at com.microsoft.azure.synapse.ml.cognitive.text.TextAnalyticsBaseNoBinding.addHeaders(TextAnalytics.scala:136)
at com.microsoft.azure.synapse.ml.cognitive.text.TextAnalyticsBaseNoBinding.$anonfun$inputFunc$1(TextAnalytics.scala:170)
at com.microsoft.azure.synapse.ml.io.http.CustomInputParser.$anonfun$setNullableUDF$1(Parsers.scala:132)
at org.apache.spark.injections.UDFUtils$$anon$1.call(UDFUtils.scala:23)
at org.apache.spark.sql.functions$.$anonfun$udf$91(functions.scala:8230)
	... 71 more
```
What component(s) does this bug affect?
- [x] area/cognitive: Cognitive project
- [ ] area/core: Core project
- [ ] area/deep-learning: DeepLearning project
- [ ] area/lightgbm: Lightgbm project
- [ ] area/opencv: Opencv project
- [ ] area/vw: VW project
- [ ] area/website: Website
- [ ] area/build: Project build system
- [ ] area/notebooks: Samples under notebooks folder
- [ ] area/docker: Docker usage
- [ ] area/models: models related issue
What language(s) does this bug affect?
- [ ] language/scala: Scala source code
- [x] language/python: Pyspark APIs
- [ ] language/r: R APIs
- [ ] language/csharp: .NET APIs
- [ ] language/new: Proposals for new client languages

What integration(s) does this bug affect?

- [ ] integrations/synapse: Azure Synapse integrations
- [ ] integrations/azureml: Azure ML integrations
- [x] integrations/databricks: Databricks integrations
Hey @jjgriff93 :wave:!
Thank you so much for reporting the issue/feature request :rotating_light:.
Someone from SynapseML Team will be looking to triage this issue soon.
We appreciate your patience.