Function proposals[which(proposals[, trump_states] > .5)] does not assign NA to correct indices.

@pkremp

I am an R newbie, so apologies in advance if this is not really an error. But I believe this particular operation, which attempts to assign NA to all trump_states values greater than 0.5, is not working as intended.

https://github.com/pkremp/polls/blob/3c7b9287cd83092ddb2d90f65006d853e626524d/update_prob.R#L47

Specifically, when applied to a subset of a data frame, R's which() returns a flat vector of linear indices (counted column by column within the subset), not row and column positions in the full data frame. Passed back to proposals as a single index, those numbers are treated as column numbers, so they don't correspond to the actual columns defined in trump_states.

Here's a quick reproducible example:

trump_states <- c('NV', 'AZ')
states <- c('GA', 'CA', 'NV', 'AZ', 'FL')
proposals <- data.frame(cbind(c(0.9, 0.3, 0.6), 
                              c(0.6, 0.3, 0.4), 
                              c(0.7, 0.3, 0.6), 
                              c(0.3, 0.2, 0.6),
                              c(0.3, 0.2, 0.6)))

colnames(proposals) <- states
print(proposals)
# >>
#   GA  CA  NV  AZ  FL
# 1 0.9 0.6 0.7 0.3 0.3
# 2 0.3 0.3 0.3 0.2 0.2
# 3 0.6 0.4 0.6 0.6 0.6

# Check indices returned from which fx
idx <- which(proposals[, trump_states] > 0.5)
print(idx)
# >> 1 3 6

# Same as in update_prob.R. Should change 3 items in NV and AZ cols to NA
proposals[which(proposals[, trump_states] > .5)] <- NA

# Show that instead columns 1, 3, 6 are assigned NA. NOT trump_states.
print(proposals)
#   GA  CA NV  AZ  FL V6
# 1 NA 0.6 NA 0.3 0.3 NA
# 2 NA 0.3 NA 0.2 0.2 NA
# 3 NA 0.4 NA 0.6 0.6 NA
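
To see where 1, 3, 6 come from, it may help to look at the logical mask directly. This is only a sketch on the same toy data as above; the names mask, lin_idx and pos are made up for illustration and do not appear in update_prob.R.

```r
# Same toy data as in the example above
trump_states <- c("NV", "AZ")
proposals <- data.frame(GA = c(0.9, 0.3, 0.6),
                        CA = c(0.6, 0.3, 0.4),
                        NV = c(0.7, 0.3, 0.6),
                        AZ = c(0.3, 0.2, 0.6),
                        FL = c(0.3, 0.2, 0.6))

# Comparing a data frame with a number yields a logical matrix (3 rows x 2 cols here)
mask <- proposals[, trump_states] > 0.5

# Linear positions, counted column by column inside the 3 x 2 subset: 1 3 6
lin_idx <- which(mask)

# The same cells as (row, col) pairs within the subset: (1,1), (3,1), (3,2)
pos <- which(mask, arr.ind = TRUE)

print(lin_idx)
print(pos)
```

The linear indices only make sense relative to the two-column subset. Used as proposals[lin_idx], they are read as column numbers of the full data frame, which would explain the V6 column appearing.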

Good catch! Not sure what I was thinking... this is definitely a bug.
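
One possible fix, sketched on the toy data from the report rather than on the real simulation draws: build the logical mask on the trump_states columns and assign within that same subset, so the mask and the target always have matching shape. This is an assumption about what the line is meant to do, not necessarily the change that ends up in update_prob.R.

```r
# Toy data from the report
trump_states <- c("NV", "AZ")
proposals <- data.frame(GA = c(0.9, 0.3, 0.6),
                        CA = c(0.6, 0.3, 0.4),
                        NV = c(0.7, 0.3, 0.6),
                        AZ = c(0.3, 0.2, 0.6),
                        FL = c(0.3, 0.2, 0.6))

# Logical matrix with the same dimensions as the trump_states subset
mask <- proposals[, trump_states] > 0.5

# Assign NA inside the subset; R writes the modified columns back into proposals
proposals[, trump_states][mask] <- NA

# NV is now NA 0.3 NA, AZ is 0.3 0.2 NA; GA, CA, FL are untouched
print(proposals)
```

Because no index ever leaves the subset it was computed on, there is nothing to map back to the full data frame.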


Thanks! Mapping sliced dataframe indices back into an unsliced dataframe is always tricky. For what it's worth, here's how I solved it in Python:

biden_idx = [proposals.columns.get_loc(n) for n in biden_states]
row_idx, col_idx = np.where(proposals.iloc[:, biden_idx] < 0.5)
proposals.iloc[row_idx, biden_idx] = np.nan

But I'm sure you'll have a better solution in R.
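
For comparison, the same index-mapping idea might look roughly like this in R. It is a sketch on the toy trump_states data from the report; col_pos, hits and full_idx are illustrative names, and the round trip through as.matrix() assumes all columns are numeric.

```r
trump_states <- c("NV", "AZ")
proposals <- data.frame(GA = c(0.9, 0.3, 0.6),
                        CA = c(0.6, 0.3, 0.4),
                        NV = c(0.7, 0.3, 0.6),
                        AZ = c(0.3, 0.2, 0.6),
                        FL = c(0.3, 0.2, 0.6))

# Column positions of the selected states in the full data frame (3 and 4 here)
col_pos <- match(trump_states, colnames(proposals))

# (row, col) pairs of offending cells, with col counted inside the subset
hits <- which(proposals[, trump_states] > 0.5, arr.ind = TRUE)

# Translate subset column numbers into full-frame column numbers
full_idx <- cbind(hits[, "row"], col_pos[hits[, "col"]])

# A two-column index matrix picks individual cells of a matrix
m <- as.matrix(proposals)
m[full_idx] <- NA
proposals <- as.data.frame(m)

print(proposals)
```

Here only the individual cells above 0.5 are blanked, via the explicit (row, column) pairs.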

Cheers,
