microsoft / SynapseML

Simple and Distributed Machine Learning
http://aka.ms/spark
MIT License

SynapseML Databricks Docker - 'JavaPackage' object is not callable #1940

Closed by aman-solanki-kr 1 year ago

aman-solanki-kr commented 1 year ago

SynapseML version

0.10.2

Describe the problem

I have a Docker image attached to a Databricks workflow. One of the tasks in the workflow trains a model using LightGBMClassifier. Everything works if I install SynapseML from Maven through the Databricks user interface, but I would like all the dependencies pre-installed in the Docker container. Any suggestions for making this work are appreciated. Thank you!

Code to reproduce issue

from synapse.ml.lightgbm import LightGBMClassifier
import pyspark

spark = pyspark.sql.SparkSession.builder.appName("MyApp") \
            .config("spark.jars.packages", "com.microsoft.azure:synapseml_2.12:0.10.2") \
            .getOrCreate()
model = LightGBMClassifier()

Other info / logs

----> 1 model = LightGBMClassifier()

/databricks/spark/python/pyspark/__init__.py in wrapper(self, *args, **kwargs)
    112             raise TypeError("Method %s forces keyword arguments." % func.__name__)
    113         self._input_kwargs = kwargs
--> 114         return func(self, **kwargs)
    115     return wrapper
    116 

/databricks/python/lib/python3.8/site-packages/synapse/ml/lightgbm/LightGBMClassifier.py in __init__(self, java_obj, baggingFraction, baggingFreq, baggingSeed, binSampleCount, boostFromAverage, boostingType, catSmooth, categoricalSlotIndexes, categoricalSlotNames, catl2, chunkSize, dataRandomSeed, defaultListenPort, deterministic, driverListenPort, dropRate, dropSeed, earlyStoppingRound, executionMode, extraSeed, featureFraction, featureFractionByNode, featureFractionSeed, featuresCol, featuresShapCol, fobj, improvementTolerance, initScoreCol, isEnableSparse, isProvideTrainingMetric, isUnbalance, labelCol, lambdaL1, lambdaL2, leafPredictionCol, learningRate, matrixType, maxBin, maxBinByFeature, maxCatThreshold, maxCatToOnehot, maxDeltaStep, maxDepth, maxDrop, metric, microBatchSize, minDataInLeaf, minDataPerBin, minDataPerGroup, minGainToSplit, minSumHessianInLeaf, modelString, monotoneConstraints, monotoneConstraintsMethod, monotonePenalty, negBaggingFraction, numBatches, numIterations, numLeaves, numTasks, numThreads, objective, objectiveSeed, otherRate, parallelism, passThroughArgs, posBaggingFraction, predictDisableShapeCheck, predictionCol, probabilityCol, rawPredictionCol, repartitionByGroupingColumn, seed, skipDrop, slotNames, thresholds, timeout, topK, topRate, uniformDrop, useBarrierExecutionMode, useMissing, useSingleDatasetMode, validationIndicatorCol, verbosity, weightCol, xGBoostDartMode, zeroAsMissing)
    387         super(LightGBMClassifier, self).__init__()
    388         if java_obj is None:
--> 389             self._java_obj = self._new_java_obj("com.microsoft.azure.synapse.ml.lightgbm.LightGBMClassifier", self.uid)
    390         else:
    391             self._java_obj = java_obj

/databricks/spark/python/pyspark/ml/wrapper.py in _new_java_obj(java_class, *args)
     64             java_obj = getattr(java_obj, name)
     65         java_args = [_py2java(sc, arg) for arg in args]
---> 66         return java_obj(*java_args)
     67 
     68     @staticmethod

TypeError: 'JavaPackage' object is not callable

github-actions[bot] commented 1 year ago

Hey @aman-solanki-kr! Thank you so much for reporting the issue/feature request. Someone from the SynapseML team will triage this issue soon. We appreciate your patience.

aman-solanki-kr commented 1 year ago

@mhamilton723 Any suggestions?

niehaus59 commented 1 year ago

Hi @aman-solanki-kr -

Please refer to the Dockerfiles in the demo and install directories under https://github.com/microsoft/SynapseML/tree/master/tools/docker

Please report back here on whether they help or not.

aman-solanki-kr commented 1 year ago

@niehaus59 I am using a databricksruntime base image and still experiencing the same issue.

niehaus59 commented 1 year ago

@aman-solanki-kr The SynapseML whl files contain the Python wrappers for the implementation, which lives in the SynapseML jar file. The jar file may not be loading properly. Can you send the logs from the Spark runtime startup?
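
One way to check whether the jar is visible to the driver JVM is sketched below. This is a diagnostic sketch, not an official SynapseML or Databricks procedure. It relies on py4j resolving an unknown dotted name to a JavaPackage placeholder rather than raising, which is what later surfaces as "'JavaPackage' object is not callable". The helper names `jvm_class_is_loaded` and `check_synapseml_jar` are hypothetical, and `spark._jvm` is a private PySpark attribute used here only for inspection.

```python
def jvm_class_is_loaded(jvm, qualified_name):
    """Return True if `qualified_name` resolves to a JVM class.

    `jvm` is expected to be a py4j JVM view such as `spark._jvm`.
    py4j returns a JavaPackage placeholder for names it cannot resolve
    to a class, and a JavaClass when the class is on the classpath.
    """
    obj = jvm
    for part in qualified_name.split("."):
        obj = getattr(obj, part)
    # Compare by type name so this helper does not need to import py4j.
    return type(obj).__name__ == "JavaClass"


def check_synapseml_jar(spark):
    """Print whether the LightGBMClassifier JVM class is reachable,
    along with the jar-related settings the running session reports."""
    name = "com.microsoft.azure.synapse.ml.lightgbm.LightGBMClassifier"
    loaded = jvm_class_is_loaded(spark._jvm, name)
    print(f"{name} loaded: {loaded}")

    conf = spark.sparkContext.getConf()
    for key in ("spark.jars.packages", "spark.jars"):
        print(f"{key} = {conf.get(key, '<not set>')}")
    return loaded
```

Calling `check_synapseml_jar(spark)` in the failing task before constructing `LightGBMClassifier()` should print `loaded: False` when the jar is missing from the classpath, and the two settings show what the running session reports for jar configuration.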

ppruthi commented 1 year ago

@aman-solanki-kr -- awaiting your response.

aman-solanki-kr commented 1 year ago

@ppruthi @niehaus59 I was able to successfully resolve the issue.