microsoft / SynapseML

Simple and Distributed Machine Learning
http://aka.ms/spark
MIT License

[LightGBM] can not run mmlspark in private cluster #962

Open robscc opened 3 years ago

robscc commented 3 years ago

Describe the bug: when I run the sample code below, the following error happens.

triazines = spark.read.format("libsvm").load("hdfs://ai/user/spark/tmp/svmlight.svmlight")
print("records read: " + str(triazines.count()))
print("Schema: ")
triazines.printSchema()

from mmlspark import LightGBMRegressor
train, test = triazines.randomSplit([0.85, 0.15], seed=1)
model = LightGBMRegressor(objective='quantile',
                          alpha=0.2,
                          learningRate=0.3,
                          numLeaves=31).fit(train)

Stacktrace

Traceback (most recent call last):
  File "<stdin>", line 4, in <module>
  File "/app/spark-2.4.5-bin-hadoop2.7/python/pyspark/ml/base.py", line 132, in fit
    return self._fit(dataset)
  File "/app/spark-2.4.5-bin-hadoop2.7/python/pyspark/ml/wrapper.py", line 295, in _fit
    java_model = self._fit_java(dataset)
  File "/app/spark-2.4.5-bin-hadoop2.7/python/pyspark/ml/wrapper.py", line 292, in _fit_java
    return self._java_obj.fit(dataset._jdf)
  File "/app/spark-2.4.5-bin-hadoop2.7/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
  File "/app/spark-2.4.5-bin-hadoop2.7/python/pyspark/sql/utils.py", line 63, in deco
    return f(*a, **kw)
  File "/app/spark-2.4.5-bin-hadoop2.7/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py", line 328, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o81.fit.
: java.lang.NoClassDefFoundError: Lcom/microsoft/ml/lightgbm/SWIGTYPE_p_void;
    at java.lang.Class.getDeclaredFields0(Native Method)
    at java.lang.Class.privateGetDeclaredFields(Class.java:2583)
    at java.lang.Class.getDeclaredField(Class.java:2068)
    at java.io.ObjectStreamClass.getDeclaredSUID(ObjectStreamClass.java:1703)
    at java.io.ObjectStreamClass.access$700(ObjectStreamClass.java:72)
    at java.io.ObjectStreamClass$2.run(ObjectStreamClass.java:484)
    at java.io.ObjectStreamClass$2.run(ObjectStreamClass.java:472)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.io.ObjectStreamClass.<init>(ObjectStreamClass.java:472)
    at java.io.ObjectStreamClass.lookup(ObjectStreamClass.java:369)
    at java.io.ObjectOutputStream.writeClass(ObjectOutputStream.java:1213)
    at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1120)
    at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
    at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
    at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
    at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
    at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
    at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
    at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
    at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
    at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:348)
    at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:43)
    at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:100)
    at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:400)
    at org.apache.spark.util.ClosureCleaner$.org$apache$spark$util$ClosureCleaner$$clean(ClosureCleaner.scala:393)
    at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:162)
    at org.apache.spark.SparkContext.clean(SparkContext.scala:2326)
    at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1.apply(RDD.scala:820)
    at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1.apply(RDD.scala:819)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
    at org.apache.spark.rdd.RDD.withScope(RDD.scala:385)
    at org.apache.spark.rdd.RDD.mapPartitions(RDD.scala:819)
    at org.apache.spark.sql.Dataset.rdd$lzycompute(Dataset.scala:3049)
    at org.apache.spark.sql.Dataset.rdd(Dataset.scala:3047)
    at org.apache.spark.sql.Dataset$$anonfun$reduce$1.apply(Dataset.scala:1649)
    at org.apache.spark.sql.Dataset$$anonfun$withNewRDDExecutionId$1.apply(Dataset.scala:3361)
    at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:80)
    at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:127)
    at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:75)
    at org.apache.spark.sql.Dataset.withNewRDDExecutionId(Dataset.scala:3357)
    at org.apache.spark.sql.Dataset.reduce(Dataset.scala:1648)
    at com.microsoft.ml.spark.LightGBMBase$class.train(LightGBMBase.scala:51)
    at com.microsoft.ml.spark.LightGBMRegressor.train(LightGBMRegressor.scala:35)
    at com.microsoft.ml.spark.LightGBMRegressor.train(LightGBMRegressor.scala:35)
    at org.apache.spark.ml.Predictor.fit(Predictor.scala:118)
    at org.apache.spark.ml.Predictor.fit(Predictor.scala:82)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    at py4j.Gateway.invoke(Gateway.java:282)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:238)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ClassNotFoundException: com.microsoft.ml.lightgbm.SWIGTYPE_p_void
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    ... 58 more

Additional context: how I run mmlspark

/app/spark-2.4.5-bin-hadoop2.7/bin/pyspark --queue root.mlalg --master yarn --deploy-mode client --conf spark.ui.port=7064 --executor-memory 15G --driver-memory 5G --executor-cores 5 --num-executors 5 --jars mmlspark/lightgbmlib-2.2.300.jar --jars mmlspark/mmlspark-0.17.jar --py-files mmlspark/mmlspark-0.17-py2.7.egg
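
A likely cause, assuming standard spark-submit flag handling: `--jars` is passed twice in the command above, and a repeated single-valued option replaces the earlier value rather than appending to it, so `lightgbmlib-2.2.300.jar` would be silently dropped. All extra jars should go in one comma-separated `--jars` list. A corrected sketch of the same command:

```shell
# All extra jars in ONE --jars flag, comma-separated;
# a repeated --jars flag overrides the previous value instead of appending.
/app/spark-2.4.5-bin-hadoop2.7/bin/pyspark \
  --queue root.mlalg --master yarn --deploy-mode client \
  --conf spark.ui.port=7064 \
  --executor-memory 15G --driver-memory 5G \
  --executor-cores 5 --num-executors 5 \
  --jars mmlspark/lightgbmlib-2.2.300.jar,mmlspark/mmlspark-0.17.jar \
  --py-files mmlspark/mmlspark-0.17-py2.7.egg
```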
welcome[bot] commented 3 years ago

šŸ‘‹ Thanks for opening your first issue here! If you're reporting a šŸž bug, please make sure you include steps to reproduce it.

BioQwer commented 3 years ago

I have the same problem.

imatiach-msft commented 3 years ago

hi @BioQwer and @robscc, it looks like there is some issue with loading the native lightgbm jar (`--jars mmlspark/lightgbmlib-2.2.300.jar`). Although you specified it on the command line, the error suggests it somehow didn't get loaded onto the classpath.
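
If spark-submit treats a repeated `--jars` like most single-valued CLI options (the last value wins), the jar would indeed never reach the classpath. A stand-alone sketch of that behavior, with no Spark required (`effective_jars` is a hypothetical helper mimicking the parser, not a real Spark API):

```python
def effective_jars(argv):
    """Hypothetical mimic of single-valued CLI flag parsing:
    each occurrence of --jars overwrites the previous value."""
    jars = None
    i = 0
    while i < len(argv):
        if argv[i] == "--jars":
            jars = argv[i + 1]  # overwrite, do not append
            i += 2
        else:
            i += 1
    return jars

# The two separate --jars flags from the reported command:
argv = ["--jars", "mmlspark/lightgbmlib-2.2.300.jar",
        "--jars", "mmlspark/mmlspark-0.17.jar"]
print(effective_jars(argv))  # only the second jar survives
```

Under this assumption, passing both jars as one comma-separated `--jars` value avoids the problem.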

BioQwer commented 3 years ago

@imatiach-msft how did you get this version? I don't see it in microsoft/LightGBM/tags.

imatiach-msft commented 3 years ago

@BioQwer I publish this jar to Maven after building it from the LightGBM source on GitHub: https://mvnrepository.com/artifact/com.microsoft.ml.lightgbm/lightgbmlib Note that I maintain this jar on Maven; it is not maintained by the LightGBM owners. The releases don't follow the LightGBM PyPI releases exactly, but I try to keep the leading version numbers relatively aligned. In my upgrade PRs I list the commit hashes I built from: https://github.com/Azure/mmlspark/pull/1029

alenma04 commented 3 years ago

hi all, the Colab notebook below shows how to install mmlspark in Google Colab. Thanks to @imatiach-msft: https://colab.research.google.com/drive/1Fh91i442XsiFmxgFj0wZGyyFb9JxF-a9?usp=sharing

BioQwer commented 3 years ago

@alenma04 we're talking about running on private clusters.

BioQwer commented 3 years ago

@imatiach-msft thank you for the explanation.