pkremp / polls

Function proposals[which(proposals[, trump_states] > .5)] does not assign NA to correct indices. #13

Open saeranv opened 4 years ago

saeranv commented 4 years ago

@pkremp

I am an R newbie, so apologies in advance if this is not really an error. But I believe this operation, which attempts to assign NA to every trump_states value greater than 0.5, is not working as intended.

https://github.com/pkremp/polls/blob/3c7b9287cd83092ddb2d90f65006d853e626524d/update_prob.R#L47

Specifically, R's which returns a flat vector of positions counted down each column of the subset proposals[, trump_states], with no row or column information, and those positions are then used to index the full proposals table. As a result, the indices do not correspond to the columns named in trump_states.
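To see the indexing behaviour in isolation, here is a minimal sketch on a toy 3x2 matrix (the name sub and its values are made up for illustration, they are not taken from update_prob.R). It compares the default output of which with which(..., arr.ind = TRUE), which reports row and column positions, still relative to the subset:

```r
# Toy stand-in for proposals[, trump_states]: 3 draws x 2 states.
sub <- matrix(c(0.7, 0.3, 0.6,   # column "NV"
                0.3, 0.2, 0.6),  # column "AZ"
              nrow = 3, dimnames = list(NULL, c("NV", "AZ")))

# Default: positions counted down the columns of `sub` only.
linear_idx <- which(sub > 0.5)
print(linear_idx)
# 1 3 6

# arr.ind = TRUE: one (row, col) pair per match, still relative to `sub`.
pair_idx <- which(sub > 0.5, arr.ind = TRUE)
print(pair_idx)
```

Either form describes positions inside the two-column subset, so neither can be applied to the full table without first translating the column positions.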

Here's a quick reproducible example:

trump_states <- c('NV', 'AZ')
states <- c('GA', 'CA', 'NV', 'AZ', 'FL')
proposals <- data.frame(cbind(c(0.9, 0.3, 0.6), 
                              c(0.6, 0.3, 0.4), 
                              c(0.7, 0.3, 0.6), 
                              c(0.3, 0.2, 0.6),
                              c(0.3, 0.2, 0.6)))

colnames(proposals) <- states
print(proposals)
# >>
#   GA  CA  NV  AZ  FL
# 1 0.9 0.6 0.7 0.3 0.3
# 2 0.3 0.3 0.3 0.2 0.2
# 3 0.6 0.4 0.6 0.6 0.6

# Check the indices returned by which()
idx <- which(proposals[, trump_states] > 0.5)
print(idx)
# >> 1 3 6

# Same as in update_prob.R. Should change 3 items in NV and AZ cols to NA
proposals[which(proposals[, trump_states] > .5)] <- NA

# Instead, columns 1, 3 and 6 are set to NA, not the trump_states columns.
print(proposals)
#   GA  CA NV  AZ  FL V6
# 1 NA 0.6 NA 0.3 0.3 NA
# 2 NA 0.3 NA 0.2 0.2 NA
# 3 NA 0.4 NA 0.6 0.6 NA

pkremp commented 4 years ago

Good catch! Not sure what I was thinking... this is definitely a bug.

saeranv commented 4 years ago

Thanks! Mapping indices from a sliced dataframe back onto the unsliced dataframe is always tricky. For what it's worth, here's how I solved it in Python:

import numpy as np

biden_idx = [proposals.columns.get_loc(n) for n in biden_states]
row_idx, col_idx = np.where(proposals.iloc[:, biden_idx] < 0.5)
# Blanks every Biden column in each matching row; col_idx is not used.
proposals.iloc[row_idx, biden_idx] = np.nan

But I'm sure you'll have a better solution in R.
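For reference, one possible R version is sketched below. It is only a sketch on the toy data from the reproducible example, not the fix adopted in the repository, and the name too_high is made up for illustration. The idea is to build the logical mask on the subset and assign through that same subset, so no positions have to be translated back to the full table:

```r
# Same toy data as the reproducible example in this issue.
trump_states <- c("NV", "AZ")
proposals <- data.frame(GA = c(0.9, 0.3, 0.6),
                        CA = c(0.6, 0.3, 0.4),
                        NV = c(0.7, 0.3, 0.6),
                        AZ = c(0.3, 0.2, 0.6),
                        FL = c(0.3, 0.2, 0.6))

# Logical mask with the same shape as the trump_states subset.
too_high <- proposals[, trump_states] > 0.5

# Assign through the subset: only NV and AZ cells above 0.5 become NA.
proposals[, trump_states][too_high] <- NA

print(proposals)
```

The same two lines also work when proposals is a plain matrix with column names, since logical-matrix indexing is supported for both matrices and data frames.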

Cheers,

S