Closed thoree closed 4 years ago
Thanks for reporting this - it revealed a couple of intricate bugs in the handling of marker names when converting from data.frame
into ped
objects.
Your example works fine now, so I'm closing this issue. But there may be other surprises here, more testing is needed.
Great new functionality!
Previous example is now fine.
With one column for each allele,
my understanding is that what goes before
"." becomes the marker name. Fine!
readPed
seems to accept identical marker names sometimes, but sometimes
make sensible change:
for sep = "/" marker names are changed to obtain difference:
setwd("C:\\git\\exclusion\\magnus")
library(pedtools)
x = readPed("pedfile2.ped")
x
#> id fid mid sex M1 M1
#> 1 * * 1 1/1 2/3
#> 2 * * 2 -/- -/-
#> 3 1 2 2 -/- -/-
x = readPed("pedfile3.ped", sep = "/")
x
#> id fid mid sex M1 M1.1
#> 1 * * 1 1/1 2/3
#> 2 * * 2 -/- -/-
#> 3 1 2 2 -/- -/-
Created on 2019-08-10 by the reprex package (v0.3.0)
Hmm, perhaps we should be stricter about this.
When marker names are included in the pedfile, I cannot imagine a scenario where duplicated names would be OK. I suggest to give an error in this case.
Agree?
Agree!
Here's an Argentina example that works nicely. I had to change marker names 'Penta D' and 'Penta _E' to 'Penta.D' and 'Penta.E' in the uploaded files. Below data is read and a mutation model introduced
setwd("C:\\git\\exclusion\\magnus")
library(pedprobr,quietly = TRUE)
x = readPed("x.ped", sep="/")
x = setFrequencyDatabase(x, database = "db.txt")
likelihood(x,2) # = 0 because of likely mutation
#> [1] 0
x = setLocusAttributes(x, markers = 2, loc = list(mutmod = "proportional", rate = 0.05))
likelihood(x,2)
#> [1] 1.298405e-07
Created on 2019-08-10 by the reprex package (v0.3.0)
Ok, I've done some changes now, and have a few clarifications.
1. Marker names when alleles are in separate columns The rule is as follows: The names of odd-numbered columns are used, and everything from the last period is deleted.
Example:
library(pedtools)
df = cbind(as.data.frame(singleton(1)), Penta.D.1 = 1, Penta.D.2 = 2)
df
#> id fid mid sex Penta.D.1 Penta.D.2
#> 1 1 0 0 1 1 2
as.ped(df)
#> id fid mid sex Penta.D
#> 1 * * 1 1/2
2. "Fixing" marker names in readPed()
Nothing should ever be changed with the input, For example, the space in "Penta E" should remain (although such names should of course be avoided in general).
I have now added check.names = F
in the call to read.table
to ensure this.
3. Duplicated marker names
This should give an error, and is now checked for in readPed()
:
library(pedtools)
readPed("https://raw.githubusercontent.com/thoree/exclusion/master/magnus/pedfile3.ped", sep = "/")
#> Error: Duplicated marker name(s): M1
I did not manage to get the variable names right (below files are in https://github.com/thoree/exclusion/tree/master/magnus and I used reprex_addin() for the below):
library(pedtools) x = readPed("pedfile.ped", sep ="/") x = setFrequencyDatabase(x, database = "dbfile.txt")
> M1 M2
> 1 0.5 NA
> 2 0.5 0.8
> 3 NA 0.2
> Error: Unknown marker name: M1, M2
Created on 2019-08-09 by the reprex package (v0.3.0)