Open promisinganuj opened 5 months ago
Please refer to https://github.com/microsoft/LightGBM/issues/6492, which was initially raised with LightGBM.
Tagging on to this; I am seeing the same error in the same Fabric Spark runtime. EDIT: on runtime 1.1 it works for me.
I've got the same issue when running the following script on Synapse Studio:
%%configure -f
{
  "name": "synapseml",
  "conf": {
    "spark.jars.packages": "com.microsoft.azure:synapseml_2.12:1.0.4",
    "spark.jars.repositories": "https://mmlspark.azureedge.net/maven",
    "spark.jars.excludes": "org.scala-lang:scala-reflect,org.apache.spark:spark-tags_2.12,org.scalactic:scalactic_2.12,org.scalatest:scalatest_2.12,com.fasterxml.jackson.core:jackson-databind",
    "spark.yarn.user.classpath.first": "true",
    "spark.sql.parquet.enableVectorizedReader": "false"
  }
}
train, test = (
    spark.read.parquet(
        "wasbs://publicwasb@mmlspark.blob.core.windows.net/BookReviewsFromAmazon10K.parquet"
    )
    .limit(1000)
    .cache()
    .randomSplit([0.8, 0.2])
)
display(train)
from pyspark.ml import Pipeline
from synapse.ml.featurize.text import TextFeaturizer
from synapse.ml.lightgbm import LightGBMRegressor
model = Pipeline(
    stages=[
        TextFeaturizer(inputCol="text", outputCol="features"),
        LightGBMRegressor(featuresCol="features", labelCol="rating"),
    ]
).fit(train)
--Updated-- It works if we use Spark 3.3 pools:
%%configure -f
{
  "name": "synapseml",
  "conf": {
    "spark.jars.packages": "com.microsoft.azure:synapseml_2.12:0.11.4-spark3.3",
    "spark.jars.repositories": "https://mmlspark.azureedge.net/maven",
    "spark.jars.excludes": "org.scala-lang:scala-reflect,org.apache.spark:spark-tags_2.12,org.scalactic:scalactic_2.12,org.scalatest:scalatest_2.12,com.fasterxml.jackson.core:jackson-databind",
    "spark.yarn.user.classpath.first": "true",
    "spark.sql.parquet.enableVectorizedReader": "false"
  }
}
We're running into the same issue with this configuration on an Azure Synapse Spark 3.4 pool.
Hi all, thank you for your patience. We have tried to get a simple repro of this and found that something funky is going on with the natives.
...
.setDataTransferMode("bulk")
To work around this for now
Confirmed: this works in a Synapse Spark 3.4 pool. Thank you @mhamilton723 for the mitigation.
Can you elaborate on how to use the workaround? Thank you!
@Jens-automl - it's just a function on the LightGBMClassifier
https://mmlspark.blob.core.windows.net/docs/1.0.4/pyspark/synapse.ml.lightgbm.html
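For anyone else asking how to apply it, here is a minimal sketch of the workaround dropped into the pipeline posted earlier in this thread. The function name build_bulk_mode_pipeline is made up for illustration, and the sketch assumes setDataTransferMode is available on LightGBMRegressor in the same way as on LightGBMClassifier; check the linked docs for your SynapseML version.

```python
def build_bulk_mode_pipeline():
    """Build the text-regression pipeline from this thread with bulk data transfer.

    Hypothetical helper name, used only for illustration.
    """
    # Imports are kept inside the function so the definition loads even before
    # the SynapseML package is attached to the session; calling it requires it.
    from pyspark.ml import Pipeline
    from synapse.ml.featurize.text import TextFeaturizer
    from synapse.ml.lightgbm import LightGBMRegressor

    regressor = LightGBMRegressor(featuresCol="features", labelCol="rating")
    # Workaround suggested in this thread: switch the data transfer mode to "bulk".
    # Assumption: the regressor exposes the same setter as LightGBMClassifier.
    regressor.setDataTransferMode("bulk")

    return Pipeline(
        stages=[
            TextFeaturizer(inputCol="text", outputCol="features"),
            regressor,
        ]
    )
```

In a notebook where `train` is the DataFrame from the script above, this would be used as `model = build_bulk_mode_pipeline().fit(train)`.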
SynapseML version
1.0.4
System information
Runtime 1.2
Describe the problem
I am trying the following tutorial in a Microsoft Fabric notebook: https://learn.microsoft.com/en-us/fabric/data-science/how-to-use-lightgbm-with-synapseml
Step 5 of this sample is failing:
Here is the excerpt of the error:
Code to reproduce issue
Other info / logs
No response
What component(s) does this bug affect?
area/cognitive: Cognitive project
area/core: Core project
area/deep-learning: DeepLearning project
area/lightgbm: Lightgbm project
area/opencv: Opencv project
area/vw: VW project
area/website: Website
area/build: Project build system
area/notebooks: Samples under notebooks folder
area/docker: Docker usage
area/models: models related issue
What language(s) does this bug affect?
language/scala: Scala source code
language/python: Pyspark APIs
language/r: R APIs
language/csharp: .NET APIs
language/new: Proposals for new client languages
What integration(s) does this bug affect?
integrations/synapse: Azure Synapse integrations
integrations/azureml: Azure ML integrations
integrations/databricks: Databricks integrations