riley-harper opened 3 hours ago
We'll need to install the `synapseml` Python package, which you can import as `synapse.ml`. `synapse.ml.lightgbm.LightGBMClassifier` seems to be the class that we need for Spark integration. Part of the setup for SynapseML includes downloading additional Spark jars. I added a few lines to `hlink.spark.session` in `set_conf()`:
```diff
         if os.path.isfile(jar_path):
             conf = conf.set("spark.jars", jar_path)
+
+        conf.set("spark.jars.packages", "com.microsoft.azure:synapseml_2.12:1.0.8")
+        conf.set("spark.jars.repositories", "https://mmlspark.azureedge.net/maven")
+
         return conf
     def local(self, cores=1, executor_memory="10G"):
```
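One thing to keep in mind: `spark.jars.packages` (and likewise `spark.jars.repositories`) takes a comma-separated list, and `SparkConf.set` replaces whatever value was there before. If anything else ever sets packages, the lines above would silently drop them. A minimal sketch of a hypothetical helper, `add_package`, that appends a coordinate instead of overwriting; it is not part of hlink or PySpark.

```python
def add_package(existing: str, coordinate: str) -> str:
    """Append a Maven coordinate to a comma-separated spark.jars.packages value.

    Hypothetical helper: keeps packages that are already configured and does
    not add the same coordinate twice.
    """
    packages = [p.strip() for p in existing.split(",") if p.strip()]
    if coordinate not in packages:
        packages.append(coordinate)
    return ",".join(packages)


# Usage inside set_conf(), where conf is a pyspark.SparkConf:
#     current = conf.get("spark.jars.packages", "")
#     conf.set(
#         "spark.jars.packages",
#         add_package(current, "com.microsoft.azure:synapseml_2.12:1.0.8"),
#     )
```

The same helper works for `spark.jars.repositories`, since it is also a comma-separated list.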
At first this caused an error when I tried to create the Spark context, but after searching around for a solution, I cleaned out `~/.ivy2` and `~/.m2` and it ran without issues. These additional configurations should probably depend on `synapse.ml` being installed, so that users who aren't using LightGBM don't have to download the jars.
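```python
# At module level: record whether the SynapseML Python package is installed.
try:
    import synapse.ml  # noqa: F401  (imported only to check availability)
except ModuleNotFoundError:
    _synapse_ml_available = False
else:
    _synapse_ml_available = True

...

# Inside set_conf(): only request the SynapseML jars when the package is present.
if _synapse_ml_available:
    conf.set("spark.jars.packages", "com.microsoft.azure:synapseml_2.12:1.0.8")
    conf.set("spark.jars.repositories", "https://mmlspark.azureedge.net/maven")
```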
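A possible variation on the opt-in check, sketched here rather than proposed as hlink's actual code: detect the package with `importlib.util.find_spec` instead of importing `synapse.ml` itself, and return the settings as a dict so the decision can be tested without starting a Spark session. The names `synapse_ml_installed` and `synapseml_conf_entries` are hypothetical.

```python
import importlib.util
from typing import Dict, Optional

# Maven coordinate and repository copied from the set_conf() snippet above.
SYNAPSEML_PACKAGE = "com.microsoft.azure:synapseml_2.12:1.0.8"
SYNAPSEML_REPOSITORY = "https://mmlspark.azureedge.net/maven"


def synapse_ml_installed() -> bool:
    """Return True if synapse.ml can be found, without executing synapse.ml itself."""
    try:
        return importlib.util.find_spec("synapse.ml") is not None
    except ModuleNotFoundError:
        # find_spec() imports the parent package, so a missing "synapse" raises.
        return False


def synapseml_conf_entries(available: Optional[bool] = None) -> Dict[str, str]:
    """Spark settings to add only when SynapseML is installed (hypothetical helper)."""
    if available is None:
        available = synapse_ml_installed()
    if not available:
        return {}
    return {
        "spark.jars.packages": SYNAPSEML_PACKAGE,
        "spark.jars.repositories": SYNAPSEML_REPOSITORY,
    }


# Usage inside set_conf(), where conf is a pyspark.SparkConf:
#     for key, value in synapseml_conf_entries().items():
#         conf.set(key, value)
```

Accepting `available` as an optional argument lets a unit test exercise both branches regardless of what is installed in the test environment.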
To get feature importances from the trained model, see `getFeatureImportances()`: https://mmlspark.blob.core.windows.net/docs/1.0.8/pyspark/synapse.ml.lightgbm.html#synapse.ml.lightgbm.mixin.LightGBMModelMixin.getFeatureImportances
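As we read the linked docs, `getFeatureImportances()` on a fitted LightGBM model returns a plain list with one value per entry in the features vector, and takes an importance type such as `"split"` or `"gain"`. To report these we would need to pair them back up with column names. A minimal sketch, assuming the names are in the same order as the columns assembled into the features vector; `rank_feature_importances` and `feature_cols` are hypothetical names.

```python
from typing import List, Sequence, Tuple


def rank_feature_importances(
    feature_names: Sequence[str], importances: Sequence[float]
) -> List[Tuple[str, float]]:
    """Pair feature names with importances, sorted from most to least important.

    Assumes importances[i] belongs to feature_names[i], i.e. the names are in
    the same order as the columns in the model's features vector.
    """
    if len(feature_names) != len(importances):
        raise ValueError(
            f"got {len(feature_names)} feature names but {len(importances)} importances"
        )
    return sorted(zip(feature_names, importances), key=lambda pair: pair[1], reverse=True)


# Usage with a fitted model (not run here):
#     importances = model.getFeatureImportances("gain")
#     for name, score in rank_feature_importances(feature_cols, importances):
#         print(name, score)
```

The length check guards against the most likely mistake, a feature list that drifted out of sync with the assembled vector.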
In addition to XGBoost (#161), we would also like to add support for LightGBM. Like XGBoost, LightGBM should be opt-in, so the integration can work in a similar way. From the documentation, it sounds like we'll need the SynapseML package to be able to run LightGBM on Spark.
To Do List