microsoft / SynapseML

Simple and Distributed Machine Learning
http://aka.ms/spark
MIT License

Parameter name equivalency with LightGBM -- snake_case vs camelCase #686

Open tbenthompson opened 5 years ago

tbenthompson commented 5 years ago

The parameters for mmlspark are written in camelCase, whereas the LightGBM parameters are written in snake_case (see https://lightgbm.readthedocs.io/en/latest/Parameters.html). In addition, the parameter aliases listed on that page are not supported.
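To make the mismatch concrete, here is a minimal sketch of the renaming step a migration currently needs. The helper `snake_to_camel` is hypothetical and not part of mmlspark or LightGBM; it only assumes that the mmlspark name is the plain camelCase form of the LightGBM name (for example `num_leaves` and `numLeaves`), which may not hold for every parameter.

```python
def snake_to_camel(name: str) -> str:
    """Convert a snake_case name to camelCase (hypothetical helper)."""
    head, *rest = name.split("_")
    # Keep the first word as-is and capitalize each following word.
    return head + "".join(part.capitalize() for part in rest)


# LightGBM-style parameters as they might appear in an existing project.
lightgbm_params = {"num_leaves": 31, "learning_rate": 0.1}

# The same parameters renamed by hand into the camelCase style.
camel_params = {snake_to_camel(k): v for k, v in lightgbm_params.items()}
print(camel_params)
```

Aliases such as `eta` for `learning_rate` would still need a separate lookup table, since they cannot be derived by a case conversion.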

This adds an annoying step when migrating a project from LightGBM to mmlspark. I understand the motivation to stay consistent with typical Scala/Java conventions, but I don't think it's worth it here.

Fixing this would help adoption of this project a lot, moving the mmlspark API one step closer to being a drop-in replacement for the non-spark LightGBM API.

imatiach-msft commented 5 years ago

This seems like a really big change, but it's a great suggestion. I would really like to hear the thoughts of other users before making this change, so it would be great if people could upvote or downvote. Also, are you using the pyspark API or the Scala API? Perhaps this makes sense for pyspark, while the params should remain camelCase in Scala?

imatiach-msft commented 5 years ago

Specifically, for pyspark the motivation was to be consistent with pyspark conventions rather than Python conventions.

tbenthompson commented 5 years ago

I'm using the pyspark API. I'd advocate for allowing snake_case in both pyspark and Scala, but I care less about the Scala API. In the pyspark API, for backward compatibility with what mmlspark already does, you could just support either version of the parameter name in a parameters dictionary like the LightGBM API.
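A rough sketch of what accepting either spelling in a parameters dictionary could look like. Everything here is an assumption for illustration: `normalize_params`, `camel_to_snake`, and the small `ALIASES` table are hypothetical names, not part of mmlspark, and the alias table lists only two of the aliases documented on the LightGBM parameters page.

```python
import re

# Hypothetical, deliberately incomplete alias table (alias -> canonical name).
ALIASES = {"eta": "learning_rate", "n_estimators": "num_iterations"}


def camel_to_snake(name: str) -> str:
    """Convert camelCase to snake_case; snake_case input is returned unchanged."""
    return re.sub(r"(?<=[a-z0-9])([A-Z])", r"_\1", name).lower()


def normalize_params(params: dict) -> dict:
    """Map camelCase names, snake_case names, and known aliases to one canonical key."""
    canonical = {}
    for key, value in params.items():
        name = camel_to_snake(key)
        name = ALIASES.get(name, name)
        # Reject the same parameter given twice with different values.
        if name in canonical and canonical[name] != value:
            raise ValueError(f"conflicting values for parameter {name!r}")
        canonical[name] = value
    return canonical


# Both naming styles and an alias can be mixed in one dictionary.
print(normalize_params({"numLeaves": 31, "learning_rate": 0.1, "n_estimators": 200}))
```

Normalizing to a single canonical key keeps the existing camelCase names working while letting LightGBM-style dictionaries pass through, and the conflict check catches a parameter that is set twice under different spellings.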

Thanks for being open to this!