mage-ai / mage-ai

🧙 Build, run, and manage data pipelines for integrating and transforming data.
https://www.mage.ai/
Apache License 2.0

[BUG] Databricks integration is not working as documented #5240

Open richiesgr opened 3 weeks ago

richiesgr commented 3 weeks ago

Mage version

0.9.72

Describe the bug

Following the documentation, I tried to set the pipeline type in metadata.yaml to databricks, but it cannot be set: it always reverts to python.

As a result, there is no support for converting a Spark DataFrame to a pandas DataFrame:

df_spark = spark.sql("select * from table") returns a Spark DataFrame, which generates a pickle error.

To make it work, you need to convert explicitly to pandas: df_spark = spark.sql("select * from table").toPandas() works.
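The workaround above can be wrapped in a small helper so that every block returns something picklable. This is only a sketch: `to_pandas_if_spark` is a hypothetical name, not part of Mage or Databricks, and it assumes the object returned by `spark.sql(...)` exposes PySpark's `toPandas()` method.

```python
from typing import Any


def to_pandas_if_spark(df: Any) -> Any:
    """Return a pandas DataFrame for a Spark DataFrame, else the input unchanged.

    Hypothetical helper: it duck-types on `toPandas`, so it does not need
    to import pyspark itself.
    """
    to_pandas = getattr(df, "toPandas", None)
    if callable(to_pandas):
        # toPandas() collects the full result onto the driver, so this is
        # only suitable for results that fit in memory.
        return to_pandas()
    # Anything without toPandas() (for example a pandas DataFrame) passes through.
    return df
```

In a block this would be used as `return to_pandas_if_spark(spark.sql(query))`, where `query` stands for whatever SQL string the block runs. It does not fix the underlying pipeline type problem; it only avoids returning a Spark DataFrame directly.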

To reproduce

Expected behavior

Support the databricks pipeline type as documented.

Handle a Spark DataFrame as a valid return value.

Screenshots

Screenshot 2024-07-01 at 13 42 14

Operating system

Docker on macOS

Additional context

No response

wangxiaoyou1993 commented 2 weeks ago

Databricks has updated their library, so the guide is outdated. We'll update it when we get time.