mrpowers-io / quinn

pyspark methods to enhance developer productivity 📣 👯 🎉
https://mrpowers-io.github.io/quinn/
Apache License 2.0

Remove import * from codebase and documentation #260

Closed: paulooctavio closed this PR 1 month ago

paulooctavio commented 2 months ago

Proposed changes

This PR tackles issue #75 by eliminating wildcard imports (import *) from both the codebase and the documentation.
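
For illustration, this is the kind of substitution the PR makes. The snippets below are a sketch rather than the actual diff; quinn.validate_schema is used only as an example function, and the call shown is hypothetical:

# Before: a wildcard import pulls every public name into scope,
# so the origin of validate_schema is invisible to readers and linters
from quinn import *

df = validate_schema(df, required_schema)

# After: an explicit import makes the dependency obvious
import quinn

df = quinn.validate_schema(df, required_schema)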


SemyonSinchenko commented 2 months ago

@paulooctavio Thank you for the contribution! Could you please reopen this PR targeting planning-1.0-release instead of main? It is a breaking change, and I think it should be part of the planned 1.0 release rather than pushed to the current main.

MrPowers commented 2 months ago

LGTM!

@SemyonSinchenko - what part of this PR is breaking?

SemyonSinchenko commented 2 months ago

Quoting @MrPowers:

> LGTM!
>
> @SemyonSinchenko - what part of this PR is breaking?

It seems to me that extensions won't work with the change.

paulooctavio commented 2 months ago

Which ones exactly? Also, shouldn't we have some kind of test to account for breaking changes?

SemyonSinchenko commented 2 months ago

Quoting @paulooctavio:

> Which ones exactly? Also, shouldn't we have some kind of test to account for breaking changes?

It seems to me that because you removed import *, the line DataFrame.transform = getattr(DataFrame, "transform", _ext_function) won't actually execute, so the DataFrame class won't be patched.
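
For context, the line in question is a module-level statement, so the underlying question is whether the module body still executes once the wildcard imports are gone. A minimal sketch of the pattern, with a simplified wrapper body (only the getattr line is taken verbatim from the comment above):

from pyspark.sql import DataFrame

def _ext_function(self, f):
    # simplified stand-in for the wrapper in dataframe_ext.py
    return f(self)

# Module-level statement: it runs whenever the module body is executed,
# i.e. on the first import of the module in any form. getattr keeps a
# pre-existing DataFrame.transform (native since PySpark 3.0) and only
# falls back to _ext_function when the attribute is absent.
DataFrame.transform = getattr(DataFrame, "transform", _ext_function)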

paulooctavio commented 2 months ago

@SemyonSinchenko How exactly can I check that? I tested it using the code below and it seems to work.

Test code:

from pyspark.sql import SparkSession, DataFrame
import warnings

# Define _ext_function (as from the original dataframe_ext.py file)
def _ext_function(spark, f):
    warnings.warn(
        "Extensions may be removed in future versions. Please use explicit functions instead.",
        category=DeprecationWarning,
        stacklevel=2,
    )
    return f(spark)

# Check if DataFrame already has a transform method
if hasattr(DataFrame, 'transform'):
    print("Removing existing 'transform' method from DataFrame for testing purposes.")
    delattr(DataFrame, 'transform')

# Now patch the DataFrame class: since 'transform' was just removed,
# getattr falls back to _ext_function and the assignment installs it
DataFrame.transform = getattr(DataFrame, "transform", _ext_function)

# Initialize Spark session
spark = SparkSession.builder.master("local").appName("TestSession").getOrCreate()

# Create a simple DataFrame
data = [(1, "Alice"), (2, "Bob")]
columns = ["id", "name"]
df = spark.createDataFrame(data, columns)

# Check if transform method exists after patch
if hasattr(df, 'transform'):
    print("After patch: 'transform' method exists on DataFrame")

    # Test invoking the patched transform method: df.transform(f) calls
    # _ext_function(df, f), which in turn returns f(df)
    def dummy_transform(spark):
        print("Transform function works!")
        return df

    df.transform(dummy_transform)
else:
    print("Patch failed, transform method not found on DataFrame.")

Output:

Removing existing 'transform' method from DataFrame for testing purposes.
24/09/16 20:39:29 WARN Utils: Your hostname, ... resolves to a loopback address: ...; using ... instead (on interface lo)
24/09/16 20:39:29 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
24/09/16 20:39:29 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
After patch: 'transform' method exists on DataFrame
/home/paulooctavio/projects/quinn/try.py:38: DeprecationWarning: Extensions may be removed in future versions. Please use explicit functions instead.
  df.transform(dummy_transform)
Transform function works!

SemyonSinchenko commented 2 months ago

@paulooctavio Sorry, my fault. I thought the patching wouldn't work with that change, but I checked and it works fine. Thank you for your patience!
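
A note on why this works: Python executes a module's entire body the first time the module is imported, regardless of the import form, so replacing import * with explicit imports still triggers the module-level patch. A standalone sketch with a hypothetical two-file layout (ext_module.py and main.py, not the actual quinn layout):

# ext_module.py -- hypothetical module mirroring the dataframe_ext pattern
from pyspark.sql import DataFrame

def _ext_function(self, f):
    return f(self)

# Runs on ANY import of this module, wildcard or explicit
DataFrame.transform = getattr(DataFrame, "transform", _ext_function)

# main.py -- an explicit (non-wildcard) import still executes the whole
# module body of ext_module, so the patch is applied
from ext_module import _ext_function
from pyspark.sql import DataFrame

print(hasattr(DataFrame, "transform"))  # True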