harrelfe / Hmisc

Harrell Miscellaneous

Describe Function Error Associated with Factors #104

Closed xxmissingnoxx closed 3 years ago

xxmissingnoxx commented 5 years ago

Problem: It looks as though it's possible to crash the describe function by passing it a character column that includes specific values. This originated from analyzing the dataset here: http://www.stat.columbia.edu/~gelman/arm/examples/radon/srrs2.dat

How do I reproduce?: library(Hmisc);x=data.frame(crash=rep(c(".","1","2","3","4","5"),2),y=1:12); describe(x)

This yields the error:

Error in if (print.ext) { : missing value where TRUE/FALSE needed
In addition: Warning message:
In all.is.numeric(names(weights), "vector") : NAs introduced by coercion

What's the cause?: The problem seems to originate from an attempt to format the output of the describe object (lines 386 and 388 of describe.s): https://github.com/harrelfe/Hmisc/blob/44f15e5587d0d88acbfd8969ab3e47706e1e24be/R/describe.s#L386. We try to decide whether the frequency table will fit on the screen via a heuristic (200 and 20 char window), but we can't calculate how wide the output might be. One of the levels was converted to NA in the all.is.numeric function called by wtd.table, so the nchar function yields an NA value for that level, and summing over a vector containing NA yields NA unless the na.rm flag is set. The resulting NA then reaches if (print.ext), which produces the error above.
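
The chain described above can be sketched in base R without Hmisc. This is only an illustration of how the NA propagates, not the code in describe.s: the variable names (labels, widths, total_width, fits) are hypothetical, and the 200-character cutoff is borrowed from the heuristic mentioned above as an assumed threshold.

```r
# Level labels from the reproduction, after "." has been coerced to a
# missing value (assumption: this is what the width check receives).
labels <- c(NA_character_, "1", "2", "3", "4", "5")

# nchar() returns NA for a missing string, so one width is NA.
widths <- nchar(labels)

# Without na.rm, a single NA makes the whole sum NA.
total_width <- sum(widths)

# Comparing NA with the (assumed) 200-character cutoff is still NA.
fits <- total_width < 200

# Using NA as an if() condition raises the error seen in the report.
result <- tryCatch(
  if (fits) "wide layout" else "long layout",
  error = function(e) conditionMessage(e)
)
print(result)

# With na.rm = TRUE the sum is defined and the condition is usable.
total_width_ok <- sum(widths, na.rm = TRUE)
fits_ok <- total_width_ok < 200
print(fits_ok)
```

Dropping the NA widths is only one way to make the check well defined, and it silently ignores the affected level, so it may not be the behavior the package wants.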

Possible Solution: Given that this is really about guessing whether text will fit on screen in a reasonable manner, there seem to be several options on the table:

I really like your book (I have only checked out the first edition and will have to look at the second). I'm not sure if this is enough information or follows proper etiquette, but I think one of the solutions above will work and should be an easy fix.

chamaoskurumi commented 3 years ago

The problem still persists; I ran into it today, too.

couthcommander commented 3 years ago

I agree with xxmissingnoxx's assessment, and have created a pull request (using the second approach). Will give FEH final say.