Spark - Githubissues

guptaat commented 3 years ago

Existing Spark session not recognized by optimus

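To Reproduce: run the following code.

from pyspark.sql import SparkSession
from optimus import Optimus

# Create (or reuse) a Spark session and load a CSV with it
spark1 = SparkSession.builder.appName('abcs').getOrCreate()
df = spark1.read.csv(r'C:\Users\Downloads\foo.csv', header=True)

# Passing the session object as the first argument triggers the error
op = Optimus(spark1)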

Expected behavior: existing Spark DataFrames should have Optimus functionality. Instead, Optimus throws an error indicating that the engine must be the string "spark" rather than a Spark session object.

Environment: latest version from the GitHub 21.9 branch.

Additional context: this is a follow-up to an issue that was closed some time ago: https://github.com/hi-primus/optimus/issues/645

argenisleon commented 3 years ago

Hi @guptaat ,

Can you please try version 2.2.32 using 'pip install optimuspyspark'? The current branch went through a heavy refactoring, and some functions need extra work to be compatible with 2.2.32. The README at https://github.com/hi-primus/optimus/tree/branch-2.2 has the documentation. Please let me know if it helps.

Amangoel998 commented 2 years ago

Hi @argenisleon, can you tell me whether this issue is fixed in the new release, or whether the 2.2.32 code has been merged into the latest release?

argenisleon commented 2 years ago

Hi @Amangoel998,

In the latest release, you can pass the engine name as a string and hand over your existing session with the session keyword:

Optimus("spark", session=your_spark_session)

Please let me know if it helps.
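
For readers landing on this thread, here is a minimal sketch of how the suggested call might be wrapped so that an existing session is reused. The helper name optimus_from_session and its optimus_cls parameter are hypothetical and not part of Optimus; the only call taken from this thread is Optimus("spark", session=...), and the import path is assumed to be from optimus import Optimus.

```python
def optimus_from_session(session, optimus_cls=None):
    """Wrap an already-running Spark session in an Optimus instance.

    Hypothetical helper for illustration; not part of the Optimus API.
    `optimus_cls` exists only so the class can be swapped out in tests.
    """
    if optimus_cls is None:
        # Imported lazily so the helper can be defined without Optimus installed.
        # Assumption: the class is importable as `from optimus import Optimus`.
        from optimus import Optimus as optimus_cls

    # The engine is given as the string "spark"; the session object is passed
    # by keyword instead of as the first positional argument.
    return optimus_cls("spark", session=session)
```

With the session from the original report, the call would be op = optimus_from_session(spark1). This only covers creating the Optimus object from an existing session; whether DataFrames already loaded through that session gain Optimus methods is not confirmed in this thread.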

hi-primus / optimus

Spark #1213