palantir / pyspark-style-guide

This is a guide to PySpark code style presenting common situations and the associated best practices based on the most frequent recurring topics across the PySpark repos we've encountered.
MIT License

Conventions for PySpark dataframe typehints #12

Open harrietrs opened 1 year ago

harrietrs commented 1 year ago

When defining a function, it would be useful to follow a convention for PySpark DataFrame type hints, e.g.

from pyspark.sql import DataFrame
import pyspark.pandas as ps

def my_function(my_dataframe: DataFrame) -> ps.DataFrame:
    # pandas_api() returns a pandas-on-Spark (ps.DataFrame) frame;
    # toPandas() would return a plain pandas.DataFrame instead,
    # which would not match the declared return type
    return my_dataframe.pandas_api()

However, the above doesn't clearly distinguish between the different data types. Perhaps an alias for pyspark.sql.DataFrame is required, although I'm not sure how to make it distinct from ps.DataFrame (an established alias).

fzhem commented 1 year ago

I have encountered this before, and I do the following:

from pyspark.sql import DataFrame as SparkDataFrame

Perhaps a bit verbose, but it works.
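
To illustrate, a function signature using that alias might look like the following. This is only a sketch of the convention: the alias names (SparkDataFrame) and the helper function (to_pandas) are illustrative, not part of the style guide. The imports are guarded with TYPE_CHECKING and postponed annotation evaluation so the module can be imported even where pyspark is not installed:

```python
from __future__ import annotations

from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Aliasing at import time keeps the signatures unambiguous:
    # SparkDataFrame vs. pd.DataFrame, instead of two bare "DataFrame"s.
    import pandas as pd
    from pyspark.sql import DataFrame as SparkDataFrame


def to_pandas(sdf: SparkDataFrame) -> pd.DataFrame:
    """Collect a Spark DataFrame to the driver as a pandas DataFrame."""
    # toPandas() materializes the full result on the driver, so this is
    # only appropriate for data that fits in driver memory.
    return sdf.toPandas()
```

With `from __future__ import annotations`, the hints are stored as strings and only resolved by type checkers, so the guarded imports are sufficient.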