Open saeranv opened 4 years ago
Good catch! Not sure what I was thinking... this is definitely a bug.
On Thu, Nov 5, 2020 at 10:49 PM Saeran Vasanthakumar < notifications@github.com> wrote:
@pkremp https://github.com/pkremp
I am an R newbie, so apologies in advance if this is not really an error. But I believe this operation, which attempts to assign NA to every trump_states value greater than 0.5, is not working as intended.
https://github.com/pkremp/polls/blob/3c7b9287cd83092ddb2d90f65006d853e626524d/update_prob.R#L47
Specifically, R's which returns a flat vector of linear (column-major) indices into the logical matrix produced by the comparison; it does not return separate row and column indices. When that vector is then used to index the data frame directly, its values are treated as column numbers of proposals, so they don't correspond to the columns named in trump_states.
Here's a quick reproducible example:
trump_states <- c('NV', 'AZ')
states <- c('GA', 'CA', 'NV', 'AZ', 'FL')
proposals <- data.frame(cbind(c(0.9, 0.3, 0.6), c(0.6, 0.3, 0.4), c(0.7, 0.3, 0.6), c(0.3, 0.2, 0.6), c(0.3, 0.2, 0.6)))
colnames(proposals) <- states
print(proposals)
# >>
#    GA  CA  NV  AZ  FL
# 1 0.9 0.6 0.7 0.3 0.3
# 2 0.3 0.3 0.3 0.2 0.2
# 3 0.6 0.4 0.6 0.6 0.6

# Check the indices returned by which()
idx <- which(proposals[, trump_states] > 0.5)
print(idx)
# >> 1 3 6

# Same as in update_prob.R. Should change 3 items in the NV and AZ columns to NA
proposals[which(proposals[, trump_states] > .5)] <- NA

# Show that columns 1, 3, and 6 are assigned NA instead, NOT the trump_states cells.
print(proposals)
# >>
#   GA  CA NV  AZ  FL V6
# 1 NA 0.6 NA 0.3 0.3 NA
# 2 NA 0.3 NA 0.2 0.2 NA
# 3 NA 0.4 NA 0.6 0.6 NA
Thanks! Mapping sliced dataframe indices back into an unsliced dataframe is always tricky. For what it's worth here's how I solved it in Python:
biden_idx = [proposals.columns.get_loc(n) for n in biden_states]
row_idx, col_idx = np.where(proposals.iloc[:, biden_idx] < 0.5)
proposals.iloc[row_idx, biden_idx] = np.nan
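One caveat on the snippet above: col_idx is never used, and .iloc[row_idx, biden_idx] with two lists selects every listed row across every listed column, so whole rows of the Biden columns get blanked rather than only the offending cells. That may be fine if any NaN later disqualifies the row, but if the intent is strictly cell-wise, a minimal sketch could look like the following. The toy data mirrors the R example, and the choice of biden_states here is purely an assumption for illustration.

```python
import numpy as np
import pandas as pd

# Toy data mirroring the R example; biden_states is a hypothetical choice.
states = ["GA", "CA", "NV", "AZ", "FL"]
proposals = pd.DataFrame(
    [[0.9, 0.6, 0.7, 0.3, 0.3],
     [0.3, 0.3, 0.3, 0.2, 0.2],
     [0.6, 0.4, 0.6, 0.6, 0.6]],
    columns=states,
)
biden_states = ["NV", "AZ"]

# Label-based, cell-wise: blank only entries below 0.5 in the selected columns.
subset = proposals[biden_states]
cellwise = proposals.copy()
cellwise[biden_states] = subset.mask(subset < 0.5)

# Positional equivalent: np.where returns column positions *within the subset*,
# so translate them back to positions in the full frame before assigning.
biden_idx = np.array([proposals.columns.get_loc(n) for n in biden_states])
row_idx, col_idx = np.where(subset.to_numpy() < 0.5)
values = proposals.to_numpy(copy=True)
values[row_idx, biden_idx[col_idx]] = np.nan
positional = pd.DataFrame(values, columns=proposals.columns)

print(cellwise)
```

The label-based version avoids index bookkeeping entirely; the positional version is closer to the original snippet and shows where col_idx would come into play.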
But I'm sure you'll have a better solution in R.
Cheers,
S