isdsucph / isds2021

Introduction to Social Data Science 2021 - a summer school course https://isdsucph.github.io/isds2021/
MIT License

Boolean row selection #18

Closed by johankll 3 years ago

johankll commented 3 years ago

How does the boolean row selection work? I can't figure out why I end up with rows 0 and 3 in the cells below:

image

joachimkrasmussen commented 3 years ago

Hi Johan,

The short answer is that this is simply how pandas works with []. In general, when you write df2 = df[condition], df2 will contain the rows of the dataframe for which the condition was True.
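As a minimal sketch of the df2 = df[condition] pattern described above: the dataframe, its column names, and the condition below are made up for illustration and are not the ones from the screenshot.

```python
import pandas as pd

# Hypothetical example data (assumption: not the dataframe from the question)
df = pd.DataFrame({"name": ["a", "b", "c", "d"], "value": [10, -5, 0, 7]})

# Comparing a column with a number gives one boolean per row
condition = df["value"] > 0

# Only the rows where the condition is True are kept
df2 = df[condition]

print(condition.tolist())
print(df2)
```

Here the condition is True for the first and last row only, so df2 keeps the rows with index 0 and 3 and the original index labels are preserved.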

For a more complete answer, consider reading the section "Boolean Indexing" in the PDA textbook that we use in this course.

Was this useful?

Best, Joachim

johankll commented 3 years ago

Hi Joachim,

Partly. What I meant was: Why are the rows [0,3] kept while rows [1,2] are not kept?

I understand that the strings in the [] serve as a sort of filter, but I don't understand how this particular filter works. I don't understand why some rows "obey" the requirement [True, False, False, True] while others don't. In my understanding string(x) will be True for any x != 0 and False for any x = 0.

joachimkrasmussen commented 3 years ago

Hi Johan,

Think about it this way: What is inside [] is not a set of strings. It is basically a list with 4 boolean values, one for each row of the dataframe. Boolean variables/data can only have two different values: True or False. Hence, they are binary. True fundamentally means 'switch on' while False means 'switch off'. The values are matched to the rows by position: the first value belongs to row 0, the second to row 1, and so on. When you write df[[True, False, False, True]], you simply tell pandas to drop the rows with index 1 and 2, and keep those with index 0 and 3. This is just how pandas works.
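To make the 'switch on / switch off' idea concrete, here is a small sketch. The four-row dataframe and its column name are invented for the example; only the list [True, False, False, True] comes from the discussion above. The list needs one entry per row.

```python
import pandas as pd

# Hypothetical four-row dataframe (assumption: stands in for the one in the question)
df = pd.DataFrame({"letter": ["a", "b", "c", "d"]})

# One boolean per row, matched by position: row 0 on, rows 1 and 2 off, row 3 on
mask = [True, False, False, True]

kept = df[mask]        # rows 0 and 3 remain
same = df.loc[mask]    # .loc accepts the same boolean list

print(kept)
```

The values in the rows play no role here: a row is kept only because the boolean in the same position of the list is True.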

Does this non-technical explanation make sense?

Best, Joachim

johankll commented 3 years ago

Thanks, Joachim. It makes sense now.