markmfredrickson / optmatch

Functions for optimal matching in R
https://markmfredrickson.github.io/optmatch

Support `labelled` class treatment vectors #159

Closed josherrickson closed 5 years ago

josherrickson commented 6 years ago

Tibbles can contain "labelled" vectors, which haven creates when importing data from Stata and SPSS. We currently fail when one of these is used as the treatment variable:

```
> data <- haven::read_stata('tmp.dta')
> data$match <- fullmatch(foreign ~ mpg, data = data, min = 1, max = 2)
Error in (function (classes, fdef, mtable)  : 
  unable to find an inherited method for function ‘toZ’ for signature ‘"labelled"’
```

We stopped supporting factor-type treatment vectors, but I'm not sure whether labelled objects are closer to factors or to numerics. Once we decide that, we should either add an informative warning à la factors or implement a toZ method for labelled.

benthestatistician commented 6 years ago

I would have guessed that

  1. Stata has a type analogous to R's logical
  2. The haven package converts that type to logical, not labelled.

If both of these are correct, then I think we'd want to decline to recognize labelled objects as treatment vectors, instead raising an error that suggests the user convert to logical. The intent of the haven package does seem to be to get people to convert at the earliest possible moment.

The goal of haven is not to provide a labelled vector that you can use everywhere in your analysis. The goal is to provide an intermediate datastructure that you can convert into a regular R data frame. You can do this by either converting to a factor or stripping the labels....

(From haven's "Conversion semantics" vignette.) But perhaps I'm mistaken about (1) or (2) above?

josherrickson commented 6 years ago

Stata doesn't have a logical type. All it has (basically) are numeric and string, but numerics can have labels attached to them (they're still numerics, but you can think of them as factors to a certain extent).

So this arises not because we're importing something non-logical, but because Stata has a label attached to an otherwise completely normal numeric variable.

I believe the labelled type follows the same concept: it's a number with a label, as opposed to a factor. (I guess the main difference I'm noting here is that in a factor, the underlying "number" can be meaningless, whereas in a labelled vector the number is quite meaningful.)
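To make the factor/labelled distinction concrete, here is a small base-R sketch (haven is not needed). The labelled vector is built by hand with `structure()` as a stand-in for what haven produces; the class name "labelled" matches the error message in this issue, though more recent haven releases use the class name "haven_labelled".

```r
# Treatment coded 0 = control, 1 = treated
z <- c(0, 1, 1, 0)

# As a factor, the levels "0" and "1" are stored as integer codes 1 and 2,
# so the underlying numbers no longer match the original 0/1 coding
z_factor <- factor(z)
codes <- as.integer(z_factor)

# Hand-built stand-in for a haven labelled vector: the numeric values are
# kept as-is and the value labels live in a "labels" attribute
z_labelled <- structure(z, labels = c(Control = 0, Treated = 1),
                        class = "labelled")

# Dropping the class and attributes recovers the original 0/1 values
values <- as.numeric(unclass(z_labelled))

print(codes)   # 1 2 2 1
print(values)  # 0 1 1 0
```

Casting the factor gives its internal codes rather than the original coding, while the labelled stand-in keeps its 0/1 values, which is why casting a labelled treatment to numeric is safe where casting a factor is not.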

I think the best approach is just to let toZ.labelled cast the input to numeric and let toZ.numeric handle checking the input. I can add something about it to the documentation.
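A minimal sketch of that approach, using S4 dispatch since the error message suggests `toZ` is an S4 generic. The generic and both methods below are hypothetical stand-ins defined locally, not optmatch's actual code, and the 0/1 check in the numeric method is an assumption about what the numeric method verifies.

```r
library(methods)

# Hypothetical stand-in for optmatch's internal toZ() generic; the real
# generic and its methods live inside optmatch and may differ in detail
setGeneric("toZ", function(x) standardGeneric("toZ"))

# Numeric treatment (assumed rule): accept only 0/1 (and NA),
# return a logical treatment indicator
setMethod("toZ", "numeric", function(x) {
  vals <- unique(x[!is.na(x)])
  if (!all(vals %in% c(0, 1))) {
    stop("numeric treatment vectors must be coded 0/1")
  }
  as.logical(x)
})

# Register the S3 class "labelled" so S4 dispatch can find methods for it
setOldClass("labelled")

# labelled: strip the class and labels, then defer to the numeric method,
# which does all of the validity checking
setMethod("toZ", "labelled", function(x) {
  toZ(as.numeric(unclass(x)))
})

# A hand-built labelled 0/1 treatment, for illustration only
z <- structure(c(0, 1, 1, 0), labels = c(Control = 0, Treated = 1),
               class = "labelled")
toZ(z)  # FALSE TRUE TRUE FALSE
```

Keeping every check in the numeric method means the labelled method only has to remove the labels: a labelled vector coded 1/2 fails with exactly the same error as a plain numeric vector coded 1/2.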

benthestatistician commented 6 years ago

I'd be happy to go with Josh's solution of casting labelled numerics as numeric.

(Replying to Josh Errickson's comment of Mon, Nov 5, 2018 at 3:04 PM.)

josherrickson commented 5 years ago

This has been implemented (hoping I'm not tempting fate by closing this issue before Travis finishes its final run and the branch gets merged in...).