Closed paulooctavio closed 1 month ago
@paulooctavio Thank you for the contribution! Can you, please, reopen thir PR with a target planning-1.0-release
, not main? Because it is a breaking change and I think we should make it part of the planning 1.0 release instead of pushing to the current main
LGTM!
@SemyonSinchenko - what part of this PR is breaking?
LGTM!
@SemyonSinchenko - what part of this PR is breaking? @MrPowers
It seems to me that extensions won't work with the change.
Which ones exactly? Also, shouldn't we have some kind of test to account for breaking changes?
Which ones exactly? Also, shouldn't we have some kind of test to account for breaking changes?
It seems to me that because you removed import *
it won't actually execute the line DataFrame.transform = getattr(DataFrame, "transform", _ext_function)
and won't patch a DataFrame
class.
@SemyonSinchenko How exactly can I check that? I tested it using the code below and it seems to work.
Test code:
from pyspark.sql import SparkSession, DataFrame
import warnings
# Define _ext_function (as from the original dataframe_ext.py file)
def _ext_function(spark, f):
warnings.warn(
"Extensions may be removed in future versions. Please use explicit functions instead.",
category=DeprecationWarning,
stacklevel=2,
)
return f(spark)
# Check if DataFrame already has a transform method
if hasattr(DataFrame, 'transform'):
print("Removing existing 'transform' method from DataFrame for testing purposes.")
delattr(DataFrame, 'transform')
# Now patch the DataFrame class with the transform method using _ext_function
DataFrame.transform = getattr(DataFrame, "transform", _ext_function)
# Initialize Spark session
spark = SparkSession.builder.master("local").appName("TestSession").getOrCreate()
# Create a simple DataFrame
data = [(1, "Alice"), (2, "Bob")]
columns = ["id", "name"]
df = spark.createDataFrame(data, columns)
# Check if transform method exists after patch
if hasattr(df, 'transform'):
print(f"After patch: 'transform' method exists on DataFrame")
# Test invoking the patched transform method
def dummy_transform(spark):
print("Transform function works!")
return df
df.transform(dummy_transform)
else:
print("Patch failed, transform method not found on DataFrame.")
Output:
Removing existing 'transform' method from DataFrame for testing purposes.
24/09/16 20:39:29 WARN Utils: Your hostname, ... resolves to a loopback address: ...; using ... instead (on interface lo)
24/09/16 20:39:29 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
24/09/16 20:39:29 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
After patch: 'transform' method exists on DataFrame
/home/paulooctavio/projects/quinn/try.py:38: DeprecationWarning: Extensions may be removed in future versions. Please use explicit functions instead.
df.transform(dummy_transform)
Transform function works!
@paulooctavio Sorry, my fault. I was thinking that the patching won't work with that change. I checked, it works fine. Thank you for your patience!
Proposed changes
This PR tackles issue #75 by eliminating wildcard imports (
import *
) from both codebase and documentation.Types of changes
What types of changes does your code introduce to quinn? Put an
x
in the boxes that apply