hadley / r4ds

R for data science: a book
http://r4ds.hadley.nz

Ordered factors #1122

Closed elinw closed 2 years ago

elinw commented 2 years ago

This line https://github.com/hadley/r4ds/blob/main/factors.qmd#L416 says that ordered factors imply "equal distance between levels", but that is not the case. If they were equidistant, that would be interval data and could be represented by a number. What makes ordered factors distinct is that, while ordered, they have no underlying common unit of measurement. E.g. Strongly agree could be a lot farther from Agree than Agree is from Neutral.

Also it has two periods at the end.

hadley commented 2 years ago

Can you provide some evidence for that claim? Ordered factors use polygonal contrasts, which if I understand correctly, imply an equal spacing of the levels.

XiangyunHuang commented 2 years ago

Can you provide some evidence for that claim?

Take likert and the Likert scale as an example.

hadley commented 2 years ago

@XiangyunHuang how is that related to the ordered factors class in R?

hadley commented 2 years ago

I'm happy to continue this discussion, but I'm going to close this issue for now since it's not clear to me that anything needs to change in R4DS.

XiangyunHuang commented 2 years ago

@hadley

  1. ?factor says in its Details section:

    Ordered factors differ from factors only in their class, but methods and the model-fitting functions treat the two classes quite differently.

    This implies many differences in statistical models; the linear model is just one small example.

  2. R4DS says in section 16.6, Ordered factors:

    Ordered factors, created with ordered(), imply a strict ordering and equal distance between levels

    In fact, ordered factors don't imply equal distance between levels. Therefore, linear models use “polygonal contrasts” and other methods to handle ordered factors. In survey research, the Likert scale is widely used for scaling responses, and it collects ordered categorical data. For more details on how to process such data, please see Agresti, Alan. 2013. Categorical Data Analysis. 3rd ed. Hoboken, New Jersey: John Wiley & Sons, Inc. (highly recommended)
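The quoted ?factor sentence can be checked directly. The following is a minimal sketch, not taken from the thread: the Likert-style labels and responses are made up for illustration, and it assumes R's default `options("contrasts")` (treatment contrasts for unordered factors, polynomial contrasts for ordered ones).

```r
# Hypothetical Likert-style responses, for illustration only
lv <- c("Disagree", "Neutral", "Agree")
x  <- c("Agree", "Disagree", "Neutral", "Agree")

f <- factor(x, levels = lv)   # unordered factor
o <- ordered(x, levels = lv)  # ordered factor

class(f)  # "factor"
class(o)  # "ordered" "factor"

# The underlying integer codes and the levels are the same
identical(as.integer(f), as.integer(o))
identical(levels(f), levels(o))

# Ordered factors support order comparisons between levels
o < "Agree"

# What model-fitting functions see differs: the default contrasts
contrasts(f)  # treatment (dummy) coding under the default options
contrasts(o)  # orthogonal polynomial coding under the default options
```

Nothing in the stored object records a distance between levels; the difference only shows up in the class attribute and in how methods and modelling functions code the factor.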

leorjorge commented 2 months ago

@hadley I would like to chip in here as well, even if this is a closed issue. Ordered factors most definitely do not have equal distances between levels; otherwise they'd behave exactly like a numeric linear variable. Their application, and all the methods developed to deal with ordered factors both as response and as predictor, are based on the idea that they have an order but that the distances between levels can vary (potentially arbitrarily).

Most methods of ordinal regression assume a monotonic effect, but even that is not imposed by the polynomial contrasts. Fitting a simple model on an ordered or an unordered factor will yield exactly the same predictions; they are just coded differently. R uses orthogonal polynomial (not polygonal) contrasts for ordered factors. Given that this is equivalent to fitting a polynomial of degree (number of factor levels minus 1) to a linear variable, you can get any relationship between the levels: you have the same number of degrees of freedom as with the standard contr.treatment contrasts, and can even fit a non-monotonic effect (as you'd get from a high-order polynomial with a numeric variable).
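The claim about identical predictions can be tried out with a small simulation. This is a sketch under stated assumptions: the level names, group means, and noise level below are invented for illustration, and the group means are deliberately non-monotonic and unequally spaced.

```r
set.seed(1)

# Made-up data: four ordered levels with non-monotonic group means
lv <- c("low", "mid", "high", "top")
g  <- rep(lv, each = 5)
mu <- c(low = 0, mid = 5, high = 1, top = 6)
y  <- unname(mu[g]) + rnorm(length(g), sd = 0.5)

# Same model, once with an unordered and once with an ordered factor
fit_f <- lm(y ~ factor(g, levels = lv))   # treatment contrasts by default
fit_o <- lm(y ~ ordered(g, levels = lv))  # polynomial contrasts by default

# Fitted values agree up to numerical tolerance
all.equal(unname(fitted(fit_f)), unname(fitted(fit_o)))

# Both fits reproduce the per-level sample means
means <- tapply(y, factor(g, levels = lv), mean)
all.equal(unname(fitted(fit_o)), as.numeric(means[g]))

# The default polynomial contrasts are built from equally spaced scores 1:n
all.equal(contr.poly(4), contr.poly(4, scores = 1:4))
```

The last line may explain where the "equal spacing" reading comes from: `contr.poly()` defaults to equally spaced scores, which affects how the individual `.L`, `.Q`, `.C` coefficients are interpreted. With all n - 1 contrast columns in the model, though, the fitted values should not depend on that choice, as the comparisons above suggest.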

hadley commented 2 months ago

I don't think it's very productive to argue about this in the abstract. Could you instead propose some alternative wording in the form of a PR?

leorjorge commented 2 months ago

That was my first contribution here, so I hope it meets the standards. Please feel free to adjust the wording to fit the overall voice of the book.