salesforce / TransmogrifAI

TransmogrifAI (pronounced trăns-mŏgˈrə-fī) is an AutoML library for building modular, reusable, strongly typed machine learning workflows on Apache Spark with minimal hand-tuning
https://transmogrif.ai
BSD 3-Clause "New" or "Revised" License

Python interface #393

Open tovbinm opened 5 years ago

tovbinm commented 5 years ago

Problem: TransmogrifAI is currently usable only from Scala with Spark. It would be great if one could:

  1. Load TransmogrifAI models in Python, display model insights, and compare/evaluate them against other ML libraries
  2. Define TransmogrifAI workflows from Python, train and save the model
  3. Define new stages through UDFs from Python
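
To make items (2) and (3) concrete, here is a minimal pure-Python sketch of what defining a workflow out of UDF-backed stages could look like. Everything in it is an assumption for illustration: `UdfStage`, `Workflow`, `add_stage`, `transform`, and `describe` are hypothetical names, not part of TransmogrifAI or any existing Python binding, and the sketch runs on plain dictionaries rather than Spark.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


@dataclass(frozen=True)
class UdfStage:
    """Hypothetical stage: applies a Python function to one input column."""
    name: str
    input_col: str
    output_col: str
    fn: Callable[[Any], Any]

    def transform(self, row: Row) -> Row:
        # Copy the row so the caller's data is never mutated.
        out = dict(row)
        out[self.output_col] = self.fn(row[self.input_col])
        return out


@dataclass
class Workflow:
    """Hypothetical workflow: an ordered list of stages applied row by row."""
    stages: List[UdfStage] = field(default_factory=list)

    def add_stage(self, stage: UdfStage) -> "Workflow":
        self.stages.append(stage)
        return self  # returning self allows chained calls

    def transform(self, rows: List[Row]) -> List[Row]:
        result = []
        for row in rows:
            for stage in self.stages:
                row = stage.transform(row)
            result.append(row)
        return result

    def describe(self) -> List[Dict[str, str]]:
        # A plain-data description of the stages, e.g. for handing to another runtime.
        return [
            {"name": s.name, "input": s.input_col, "output": s.output_col}
            for s in self.stages
        ]


# Usage: two UDF stages chained into one workflow.
workflow = (
    Workflow()
    .add_stage(UdfStage("lowercase", "name", "name_lower", str.lower))
    .add_stage(UdfStage("length", "name_lower", "name_len", len))
)
print(workflow.transform([{"name": "Ada"}]))
print(workflow.describe())
```

A real binding would have to delegate to the Scala stages running on Spark instead of executing Python per row; the sketch only illustrates the shape of a stage-plus-workflow API.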

Solution: Let's discuss how we would like to prioritize and implement this.

Additional Context: We would need to think about how we standardize the model format (perhaps double down on MLeap?) for (1). For (2) we would need to pick which stages we want to expose first and

KobaKhit commented 4 years ago

Is this still on the roadmap? I would love to contribute. Just need a starting point.

tovbinm commented 4 years ago

Yeah, we were planning to add it: #394. Most likely nobody is working on it yet. @wsuchy @leahmcguire please confirm.

UnixJunkie commented 4 years ago

It would be amazing if the whole thing, including model training, would be usable from Python.

PS: if you make it usable from OCaml, I would also be happy, but that would probably make fewer new users happy than Python wrappers would.

tovbinm commented 4 years ago

@leahmcguire @gerashegalov correct me if I am wrong, but I don't think anyone is working on either a Python or an OCaml interface at this point. That's mainly because TransmogrifAI is used internally from Scala codebases and notebooks.