thibautjombart / adegenet

adegenet: a R package for the multivariate analysis of genetic markers

Does read.genepop really need `ncode`? #326

Open courtiol opened 2 years ago

courtiol commented 2 years ago

Hi all,

I am not fully familiar with the genepop format and its possible variants, but I am wondering whether it is really necessary to ask the user to input ncode. Since the function only handles the diploid case, each genotype is made of two alleles of ncode characters each, so it should be straightforward to determine ncode automatically as half the number of characters in a genotype.

For example, just before https://github.com/thibautjombart/adegenet/blob/b8b7b3b1cf3081dc43f85b8d6794f89f161fa084/R/import.R#L707-L711

we could thus have something like this:

if (is.null(ncode)) ncode <- nchar(strsplit(vec.genot[1], split = " ")[[1]][1])/2
NA.char <- paste(rep("0", ncode), collapse = "")

Or, if the number of digits can vary within a file (I don't think so, but I have no idea), we could check for that first:

if (is.null(ncode)) {
  ncodes <- unique(unlist(lapply(strsplit(vec.genot, split = " "), nchar)))
  if (length(ncodes) > 1) stop("ncode must be defined for this dataset")
  ncode <- ncodes/2
}
NA.char <- paste(rep("0", ncode), collapse = "")
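
To make the idea concrete outside of read.genepop, here is a minimal, self-contained sketch run on made-up data. The helper name `infer_ncode` and the toy genotype vectors are hypothetical and not part of adegenet. It assumes that `vec.genot` holds one string of whitespace-separated diploid genotypes per individual. Unlike the snippets above, it splits on any run of whitespace and also refuses odd genotype widths.

```r
# Hypothetical helper, not part of adegenet: infer the number of characters
# per allele from diploid genotype strings such as "0101 0203".
infer_ncode <- function(vec.genot) {
  # Split each individual's string into its per-locus genotypes.
  genotypes <- unlist(strsplit(trimws(vec.genot), split = "[[:space:]]+"))
  widths <- unique(nchar(genotypes))
  # Refuse to guess if the width varies across genotypes or is odd.
  if (length(widths) != 1L || widths %% 2L != 0L) {
    stop("ncode must be defined for this dataset")
  }
  widths %/% 2L
}

# Toy data (made up): 2-digit and 3-digit allele coding.
two_digit   <- c("0101 0203 0000", "0102 0303 0201")
three_digit <- c("001001 002003", "000000 003003")

ncode <- infer_ncode(two_digit)
NA.char <- paste(rep("0", ncode), collapse = "")
```

With this helper, a file mixing 2-digit and 3-digit genotypes triggers the error, so the user would still have to supply ncode in that case.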

I used if (is.null(ncode)) for backward compatibility, but we could also drop the argument altogether.

If that sounds useful, I am happy to write a PR that includes tests and documentation too, but I would need information or some datasets other than nancycats.gen in case genepop formats vary in ways other than the number of digits.

I would also need to know whether dropping the argument altogether is an option. I could check whether the user supplies it and emit a message saying that it is now ignored.
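
A rough sketch of that check, using base R's `missing()`, could look like the following. The function `read_genepop_sketch` is a hypothetical stand-in that only shows the argument handling. The `ncode = 2L` default in its signature is an assumption about the current interface, and the message wording is only a suggestion.

```r
# Hypothetical stand-in for read.genepop(), showing only the argument
# handling; the signature and default value here are assumptions.
read_genepop_sketch <- function(file, ncode = 2L) {
  # missing() is TRUE only when the caller did not supply the argument,
  # so existing calls that pass ncode keep working but get a notice.
  if (!missing(ncode)) {
    message("'ncode' is now ignored: it is inferred from the data.")
  }
  invisible(file)
}

read_genepop_sketch("nancycats.gen")              # silent
read_genepop_sketch("nancycats.gen", ncode = 2L)  # emits the message
```

Using `missing()` rather than an `is.null()` test means the default value in the signature would not have to change.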

++