hi-primus / optimus

:truck: Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark
https://hi-optimus.com
Apache License 2.0
1.48k stars, 232 forks

Spark #1213

Closed: guptaat closed this issue 2 years ago

guptaat commented 3 years ago

Existing Spark session not recognized by Optimus


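To reproduce, run the following code:

from pyspark.sql import SparkSession
from optimus import Optimus

spark1 = SparkSession.builder.appName('abcs').getOrCreate()
df = spark1.read.csv(r'C:\Users\Downloads\foo.csv', header=True)

# Passing the existing SparkSession directly raises an error on the 21.9 branch
op = Optimus(spark1)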

Expected behavior: existing Spark DataFrames should get Optimus functionality. Instead, Optimus throws an error saying that the engine must be the string "spark", not a SparkSession object.


Environment: latest version from the GitHub 21.9 branch.

Additional context: this is a follow-up to an issue that was closed some time back: https://github.com/hi-primus/optimus/issues/645

argenisleon commented 3 years ago

Hi @guptaat,

Can you please try version 2.2.32 using 'pip install optimuspyspark'? The current branch went through a heavy refactoring, and some functions need extra work to make them compatible with 2.2.23. You can find the documentation in the README of branch-2.2: https://github.com/hi-primus/optimus/tree/branch-2.2

Please let me know if it helps.

Amangoel998 commented 2 years ago

Hi @argenisleon, can you tell me whether the issue is fixed in the new release, or whether the 2.2.32 code has been merged into the latest release?

argenisleon commented 2 years ago

Hi @Amangoel998,

In the latest release, you can:

Optimus("spark", session=your_spark_session)
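Why `Optimus(spark1)` fails while `Optimus("spark", session=your_spark_session)` works comes down to argument order. The toy factory below is a hypothetical model, not Optimus's actual source: its first parameter is an engine name, so a session passed positionally lands in the `engine` slot and fails a string check, while a session passed as a keyword is simply reused. `make_engine`, `FakeSession`, and the error text are all illustrative assumptions.

```python
from typing import Any, Optional


class FakeSession:
    """Hypothetical stand-in for pyspark.sql.SparkSession so the sketch runs without Spark."""

    def __init__(self, app_name: str) -> None:
        self.app_name = app_name


def make_engine(engine: str, session: Optional[Any] = None) -> dict:
    """Hypothetical factory mirroring the call shape Optimus("spark", session=...)."""
    # The first positional argument is the engine *name*, so it must be a string.
    if not isinstance(engine, str):
        raise TypeError(
            f"engine must be a string such as 'spark', not {type(engine).__name__}"
        )
    # Reuse the caller's session when one is given; a real engine would otherwise build its own.
    return {
        "engine": engine.lower(),
        "session": session,
        "reused_session": session is not None,
    }


spark1 = FakeSession("abcs")

# Mirrors the failing call from the report: the session lands in the `engine` slot.
try:
    make_engine(spark1)
except TypeError as err:
    print(err)  # engine must be a string such as 'spark', not FakeSession

# Mirrors the suggested call: engine name first, existing session passed by keyword.
op = make_engine("spark", session=spark1)
print(op["reused_session"])  # True
```

In the real library the same rule of thumb should apply: pass the engine name first and hand over the existing SparkSession through the `session` keyword, as shown in the comment above.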

Please let me know if it helps.