magnusdv / pedtools

Tools for working with pedigrees in R
GNU General Public License v3.0

Problem with differing data types of ID vectors in `ped()` #52

Closed deepchocolate closed 5 months ago

deepchocolate commented 6 months ago

I ran into a rather strange error that made me scratch my head: `Error: fid entry does not appear in id vector: 123`. When I checked, this number was present in the id vector. I think the problem originates in these lines: https://github.com/magnusdv/pedtools/blob/e6b1a30aef4ea42c53adb01f424e62319b90fcaa/R/ped.R#L112-L115

Such errors are easy for the user to fix, but the error message makes you look in the wrong place. Perhaps you could add a type check of the ID columns, e.g. `typeof(id) == typeof(fid)`?
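A minimal sketch of what such a check could look like, using base R only. The helper name `check_id_types()` is hypothetical and not part of pedtools; it only illustrates the suggestion above and is not how `ped()` validates its input.

```r
# Hypothetical helper (not part of pedtools): warn when the ID vectors
# passed as id, fid and mid do not all share the same storage type.
check_id_types <- function(id, fid, mid) {
  types <- c(id = typeof(id), fid = typeof(fid), mid = typeof(mid))
  same <- length(unique(types)) == 1
  if (!same) {
    warning("ID vectors have different types: ",
            paste(names(types), types, sep = " = ", collapse = ", "),
            call. = FALSE)
  }
  invisible(same)
}

# Same data as in the failing illustration: integer id, double f and m.
dt <- data.frame(id = c(1, 2000000, 3), f = c(2000000, 0, 0), m = c(3, 0, 0))
dt$id <- as.integer(dt$id)

# Warnings are suppressed here only to capture the return value.
ok <- suppressWarnings(check_id_types(dt$id, dt$f, dt$m))
```

Here `ok` is `FALSE` because `id` is integer while `f` and `m` are double. A warning rather than an error seems the safer choice, since mixed types are harmless for small IDs.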

Illustration

library(pedtools)

# Works: for small IDs, integer and double values convert to the same strings
dt <- data.frame(id=c(1,2,3), f=c(2, 0, 0), m=c(3,0,0), sex=c(1,1,2))
dt$id <- as.integer(dt$id)
with(dt, ped(id=id, fid=f, mid=m, sex=sex))

# Error: integer id 2000000 converts to "2000000", but double f converts to "2e+06"
dt <- data.frame(id=c(1,2000000,3), f=c(2000000, 0, 0), m=c(3,0,0), sex=c(1,1,2))
dt$id <- as.integer(dt$id)
with(dt, ped(id=id, fid=f, mid=m, sex=sex))

Great package btw!

magnusdv commented 6 months ago

Hi, many thanks for a clear description.

As you already figured out, the problem occurs because id, fid and mid are converted to character vectors, and the result of that conversion depends on the original type:

as.character(as.integer(1000000))
#> [1] "1000000"
as.character(as.numeric(1000000))
#> [1] "1e+06"
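One possible user-side workaround is to convert all ID columns to character in the same way before building the pedigree. The sketch below uses base R only; `id_to_char()` is a hypothetical helper, not a pedtools function, and it assumes the IDs are whole numbers.

```r
# Hypothetical helper (not part of pedtools): convert IDs to character
# without scientific notation, regardless of their numeric type.
# Assumes whole-number IDs; non-numeric input is passed to as.character().
id_to_char <- function(x) {
  if (is.numeric(x)) {
    format(x, scientific = FALSE, trim = TRUE)
  } else {
    as.character(x)
  }
}

dt <- data.frame(id = c(1, 2000000, 3), f = c(2000000, 0, 0), m = c(3, 0, 0))
dt$id <- as.integer(dt$id)

id_chr  <- id_to_char(dt$id)  # from integer
fid_chr <- id_to_char(dt$f)   # from double
mid_chr <- id_to_char(dt$m)   # from double

# Every non-missing parent ID (anything other than "0") is now found in id_chr.
fathers_found <- all(fid_chr[fid_chr != "0"] %in% id_chr)
mothers_found <- all(mid_chr[mid_chr != "0"] %in% id_chr)
```

With the strings made consistent up front, the type-dependent conversion no longer affects the matching. Coercing every column with `as.integer()` would be an alternative, as long as the IDs fit in the integer range.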

I agree that the error message you got is a tad confusing in your case, but I'm leaning towards leaving it as is for now. In the vast majority of cases, this error actually points to missing individuals. (I'll keep it on my watch list, though.)

Anyway, I have brushed up the documentation of ped(), specifically mentioning the type conversion.