@marcosdotme Hi. This question might be better off on StackOverflow. Anyhow.
But it also doesn't work and doesn't make sense to me. Where's the dataframe reference for the columns "gender" and "state"?
I cannot comment on whether it makes sense, but it definitely works.
from pyspark.sql import functions as F
is_men = (F.col("gender") == "M")
from_utah = (F.col("state") == "Utah")
df.withColumn("marker", F.when(is_men & from_utah, "Yes").otherwise("No")).show()
results in
> +--------+------+------+------+
> |    name|gender| state|marker|
> +--------+------+------+------+
> |   James|     M|  Utah|   Yes|
> | Michael|     M|Oregon|    No|
> |   Maria|     F|  Utah|    No|
> |Jennifer|     F|Oregon|    No|
> |  Robert|     M|  Utah|   Yes|
> +--------+------+------+------+
Check the type of is_men:
type(is_men)
> pyspark.sql.column.Column
So is_men is a Column expression, not actual data, whereas your first formulation of the two objects returned actual DataFrames:
is_men = df.filter(F.col("gender") == "M")
type(is_men)
> pyspark.sql.dataframe.DataFrame
Thank you so much for the reply @cheTesta! I really was doing it wrong due to my lack of PySpark knowledge.
I will close this issue.
I'm trying to reproduce similar code to reduce the complexity of some logical clauses in my own code, but I didn't understand it very well. Can someone help me?
Based on this dataset, I need to create a column called "target". Our target is all men from Utah.
What I tried to do:
The code above didn't work and raised an error:
Following exactly the example in the "Refactor complex logical operations" section, the code should look something like this:
But it also doesn't work and doesn't make sense to me. Where's the dataframe reference for the columns "gender" and "state"?