palantir / pyspark-style-guide

This is a guide to PySpark code style presenting common situations and the associated best practices based on the most frequent recurring topics across the PySpark repos we've encountered.
MIT License

Recommend single filter expressions rather than chaining #10

Open blakehawkins opened 2 years ago

blakehawkins commented 2 years ago

@asmello as discussed, it's better style to write a complex filter as a single expression:

df.where(F.col('pokemon').isNull() & ~F.col('cards').isNull())

rather than chaining separate filter calls:

df.where(F.col('pokemon').isNull()).filter(~F.col('cards').isNull())

(Taking into account https://github.com/palantir/pyspark-style-guide#refactor-complex-logical-operations)
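For illustration, here is a minimal sketch of how the single-expression form might be combined with the named-condition approach from the linked section. The helper `all_of`, the function `cards_without_pokemon`, and the variable names `has_no_pokemon` and `has_cards` are hypothetical and not part of the style guide or of PySpark. The only assumption is that `df` has `pokemon` and `cards` columns, as in the snippets above.

```python
from functools import reduce
import operator


def all_of(*conditions):
    """Combine conditions with `&` into one expression (hypothetical helper)."""
    if not conditions:
        raise ValueError("all_of() needs at least one condition")
    return reduce(operator.and_, conditions)


def cards_without_pokemon(df):
    """Keep rows where 'pokemon' is null and 'cards' is not, in one filter."""
    # Imported lazily so the helper above stays usable without Spark installed.
    from pyspark.sql import functions as F

    # Name each condition so the final filter reads as a single sentence.
    has_no_pokemon = F.col('pokemon').isNull()
    has_cards = ~F.col('cards').isNull()

    # One where() call holding the whole predicate, instead of chained calls.
    return df.where(all_of(has_no_pokemon, has_cards))
```

For two conditions, writing `has_no_pokemon & has_cards` directly is just as readable. A helper like this probably only pays off when the list of conditions grows or is built dynamically.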