posit-dev / positron

Positron, a next-generation data science IDE
Other
2.52k stars 79 forks source link

R / Ark crashes for factor response in `lm()` #5000

Open strengejacke opened 1 day ago

strengejacke commented 1 day ago

System details:

Positron and OS details:

Positron Version: 2024.10.0 (system setup) build 14 Code - OSS Version: 1.93.0 Commit: 901ab5d11d694212d32e53b97f771c5d601e428e Date: 2024-10-09T15:24:43.348Z Electron: 30.4.0 Chromium: 124.0.6367.243 Node.js: 20.15.1 V8: 12.4.254.20-electron.0 OS: Windows_NT x64 10.0.22631

Interpreter details:

R 4.4.1

Describe the issue:

Running a linear model with a factor as dependent variable crashes R / Ark.

Steps to reproduce the issue:

m <- lm(Species ~ Sepal.Width, data = iris)

I agree the model is nonsense, but still, the session should probably not crash?

jonvanausdeln commented 1 day ago

@strengejacke , thank you for the report! I am able to repro the crash, and I agree it shouldn't crash. I'll pass it along to the triage process.

jennybc commented 1 day ago

To add a bit more details, here's how it looks on the way down:

> m <- lm(Species ~ Sepal.Width, data = iris)
Warning messages:
1: In model.response(mf, "numeric") :
  using type = "numeric" with a factor response will be ignored
2: In Ops.factor(y, z$residuals) : '-' not meaningful for factors
R 4.4.1 exited unexpectedly: killed with signal 11 (SIGSEGV)
DavisVaughan commented 1 day ago

lm() does something stupid and creates a malformed factor. Factors are supposed to be integers under the hood, this one is a double vector containing fractional values (the residuals i think).

> m <- lm(Species ~ Sepal.Width, data = iris)
Warning messages:
1: In model.response(mf, "numeric") :
  using type = "numeric" with a factor response will be ignored
2: In Ops.factor(y, z$residuals) : ‘-’ not meaningful for factors

> unclass(m)
$coefficients
(Intercept) Sepal.Width
  4.4517480  -0.8019237

$residuals
Error in as.character.factor(x) : malformed factor
> unclass(m)$residuals
Error in as.character.factor(x) : malformed factor

> .Internal(inspect(unclass(m)$residuals))
@13950aa00 14 REALSXP g0c7 [OBJ,REF(17),ATT] (len=150, tl=0) -0.645015,-1.04598,-0.885592,-0.965785,-0.564823,...
ATTRIB:
  @139503cc8 02 LISTSXP g0c0 [REF(1)]
    TAG: @15000f370 01 SYMSXP g0c0 [MARK,REF(65535),LCK,gp=0x4000] "levels" (has value)
    @1288258d8 16 STRSXP g0c3 [REF(65535)] (len=3, tl=0)
      @128827dd0 09 CHARSXP g0c1 [REF(3),gp=0x60] [ASCII] [cached] "setosa"
      @14835ba08 09 CHARSXP g0c2 [REF(3),gp=0x60] [ASCII] [cached] "versicolor"
      @14835ba88 09 CHARSXP g0c2 [REF(3),gp=0x60] [ASCII] [cached] "virginica"
    TAG: @15000f7d0 01 SYMSXP g0c0 [MARK,REF(13902),LCK,gp=0x4000] "class" (has value)
    @128827d98 16 STRSXP g0c1 [REF(65535)] (len=1, tl=0)
      @15008a080 09 CHARSXP g0c1 [MARK,REF(264),gp=0x61] [ASCII] [cached] "factor"
    TAG: @15000f220 01 SYMSXP g0c0 [MARK,REF(6390),LCK,gp=0x4000] "names" (has value)
    @1394b8728 16 STRSXP g0c0 [REF(3)]   <deferred string conversion>
      @1394b8ca0 13 INTSXP g0c0 [REF(65535)]  1 : 150 (compact)

We blindly call INTEGER_ELT() on factors because, well, that's what they are supposed to be. That causes a crash with this dumb malformed factor. Our factor handling should check the type and throw an error if its not a INTSXP internally.