renozao / FAQR

Frequently Asked Questions on R: my personal Ask Just Once system for my friends' R problems...

Partial matching when subsetting a dataframe #15

Open Rachelly opened 8 years ago

Rachelly commented 8 years ago

This really struck me. When subsetting a data.frame by row name with the [] operator, partially matched row names are picked! Is this how it's supposed to work? I vaguely remember discussing this with Renaud once, but I don't remember the conclusion we reached. Thanks! Rachelly.

> x=data.frame("A"=c(1,2,3,4), "B"=c(1,2,3,4), "C"=c(1,2,3,4), row.names = c("123","345","1201","22"))
> x
     A B C
123  1 1 1
345  2 2 2
1201 3 3 3
22   4 4 4

> x["123",]
    A B C
123 1 1 1

> x["120",]
     A B C
1201 3 3 3

> x["12",] # A row is not found, because there are 2 possible matches
    A  B  C
NA NA NA NA

> x[rownames(x) == "120",] # At least this works!!
[1] A B C
<0 rows> (or 0-length row.names)

renozao commented 8 years ago

Yes, this is how it is meant to work, although I agree it is a dangerous default behaviour. Check the help page for [.data.frame, which recommends using match when exact matching on row names is desired. But you have to be careful with names that are in the query and not in the data.frame: match returns NA for them, so look at what x[NA, ] returns:

> x[NA, ]
      A  B  C
NA   NA NA NA
NA.1 NA NA NA
NA.2 NA NA NA
NA.3 NA NA NA

No comment on the disastrous potential side effect of this.
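
To make the difference concrete, here is a small sketch comparing the two matching rules on the same row names. It assumes that character row indices are resolved with the partial-matching rules of pmatch, as the [.data.frame help page describes; pmatch is called directly here only for illustration. It also shows that a bare NA (which is logical and gets recycled over all rows) behaves differently from the integer NA that match returns.

```r
rn <- c("123", "345", "1201", "22")

# Partial matching, mirroring what x["120", ] and x["12", ] did above
pmatch("120", rn)   # unique partial match: position of "1201"
pmatch("12", rn)    # ambiguous ("123" and "1201"): NA

# Exact matching
match("120", rn)    # no exact match: NA
match("123", rn)    # exact match: 1

# A bare NA is logical and is recycled over all rows;
# the integer NA returned by match() selects a single NA row
x <- data.frame(A = 1:4, B = 1:4, C = 1:4, row.names = rn)
nrow(x[NA, ])            # 4
nrow(x[NA_integer_, ])   # 1
```

So a query built with match produces one NA row per missing name, not a full table of NAs, but those NA rows still need to be handled explicitly.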

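One option is a small wrapper around match that returns one row per queried name, filled with NA when the name is absent: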

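x <- data.frame("A"=c(1,2,3,4), "B"=c(1,2,3,4), "C"=c(1,2,3,4), row.names = c("123","345","1201","22"))
index <- c('123', '120', '12')
# Exact row-name lookup: names absent from x yield a row of NAs
subset_row <- function(x, index){
   # match() is exact; unmatched names map to an out-of-range row, which [ fills with NA
   # drop = FALSE keeps a data.frame even when x has a single column
   res <- x[match(index, rownames(x), nomatch = nrow(x) + 1), , drop = FALSE]
   # label the rows with the queried names (index must not contain duplicates)
   rownames(res) <- index
   res
}
subset_row(x, index)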
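
With the x and index above, subset_row should return three rows named "123", "120" and "12", the last two filled with NA. If you would rather drop the names that are not found than carry NA rows around, a variant could filter the matched positions first. This is only a sketch: subset_row_strict is a hypothetical name, not something from base R.

```r
x <- data.frame(A = c(1, 2, 3, 4), B = c(1, 2, 3, 4), C = c(1, 2, 3, 4),
                row.names = c("123", "345", "1201", "22"))
index <- c("123", "120", "12")

# Hypothetical helper: exact row-name lookup that drops names not found in x
subset_row_strict <- function(x, index) {
  pos <- match(index, rownames(x))       # NA where there is no exact match
  x[pos[!is.na(pos)], , drop = FALSE]    # keep only the rows that exist
}

res <- subset_row_strict(x, index)
rownames(res)   # only "123" is an exact row name
```

Which variant fits depends on whether the caller needs the result aligned with the query (keep the NA rows) or only the rows that really exist (drop them).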