palantir / pyspark-style-guide

This is a guide to PySpark code style, presenting common situations and the associated best practices based on the most frequently recurring topics across the PySpark repos we've encountered.
MIT License

Conventions for PySpark dataframe typehints #12

Open harrietrs opened 2 years ago

harrietrs commented 2 years ago

When defining a function, it would be useful to follow a convention for PySpark DataFrame typehints, e.g.

import pandas as pd
import pyspark.pandas as ps  # pandas-on-Spark: ps.DataFrame is a third, distinct DataFrame type
from pyspark.sql import DataFrame

def my_function(my_dataframe: DataFrame) -> pd.DataFrame:
    # toPandas() collects to the driver and returns a plain pandas DataFrame,
    # not a ps.DataFrame
    return my_dataframe.toPandas()

However, the above doesn't clearly distinguish between the different DataFrame types. Perhaps an alias for pyspark.sql.DataFrame is needed, although I'm not sure how to make it distinct from ps.DataFrame (an established alias).

fzhem commented 2 years ago

I have encountered this before and do the following:

from pyspark.sql import DataFrame as SparkDataFrame

Perhaps a bit verbose, but it works.
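
One way to apply this alias without paying the pyspark import cost at runtime is to combine it with a typing.TYPE_CHECKING guard, so the annotations remain unambiguous while pyspark and pandas are only imported by type checkers. A minimal sketch (the function name spark_to_pandas is illustrative, not from the style guide):

```python
from __future__ import annotations  # annotations stay as strings at runtime

from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Imported only for static type checking; never executed at runtime.
    import pandas as pd
    from pyspark.sql import DataFrame as SparkDataFrame

def spark_to_pandas(sdf: SparkDataFrame) -> pd.DataFrame:
    """Collect a Spark DataFrame to the driver as a plain pandas DataFrame."""
    return sdf.toPandas()
```

Because of the `from __future__ import annotations` line, the annotations are not evaluated when the module loads, so the guarded imports are safe even in environments without pyspark installed.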