Closed jmanitz closed 8 years ago
Better to include rownames -> the read_geno function will be adapted to also read ID numbers. That geno and pheno have the same ID order is already checked when creating a GWASdata object. "Does pheno have an indicator for being a case/control (1/0)?" -> Yes, in column 'pheno'.
Currently, rownames are rather non-informative. Is the reason to include rownames only that the matching of geno and pheno can be checked?
At the time of construction (of GWASdata) I think it would be ok to not have rownames but set these internally to 1:nrow(data) for both geno and pheno. If either geno or pheno has rownames, the other should perhaps have rownames as well, and if we drop data or reorder data, rownames should be considered to match geno and pheno.
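The rule proposed above could be sketched roughly as follows. This is only an illustration in base R, not the actual GWASdata constructor: the helper names has_ids and align_geno_pheno are hypothetical, and it assumes geno is a matrix and pheno is a data.frame.

```r
# Hypothetical helper: TRUE if x carries user-supplied (non-automatic) rownames.
has_ids <- function(x) {
  if (is.data.frame(x)) .row_names_info(x) > 0L else !is.null(rownames(x))
}

# Hypothetical helper: give geno and pheno matching rownames in the same order.
align_geno_pheno <- function(geno, pheno) {
  stopifnot(nrow(geno) == nrow(pheno))
  g <- has_ids(geno)
  p <- has_ids(pheno)
  if (!g && !p) {
    # Neither object has IDs: fall back to 1:nrow(data) for both.
    ids <- as.character(seq_len(nrow(geno)))
    rownames(geno) <- ids
    rownames(pheno) <- ids
  } else if (g != p) {
    stop("only one of geno and pheno has rownames")
  } else {
    if (anyDuplicated(rownames(geno)) || anyDuplicated(rownames(pheno))) {
      stop("rownames must be unique")
    }
    if (!setequal(rownames(geno), rownames(pheno))) {
      stop("geno and pheno contain different IDs")
    }
    # Reorder pheno so that its rows follow the order of geno.
    pheno <- pheno[rownames(geno), , drop = FALSE]
  }
  list(geno = geno, pheno = pheno)
}

# Toy data with made-up IDs; pheno is deliberately in a different order.
geno <- matrix(c(0, 1, 2, 1, 0, 2), nrow = 3,
               dimnames = list(c("id7", "id3", "id9"), c("snp1", "snp2")))
pheno <- data.frame(pheno = c(1, 0, 1), row.names = c("id3", "id9", "id7"))
aligned <- align_geno_pheno(geno, pheno)
```

Matching by rowname rather than by position means that dropping or reordering rows in one object cannot silently misalign the other.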
Checking that geno and pheno match is one reason, yes. However, I would like to give the user the possibility to identify the individuals at any time when using kangar00. Of course you could keep the order unchanged and save an "individual identification" file listing the original ID numbers, but I think this can easily lead to mistakes, as the identification list might get lost, be out of date, etc. When we are using IDs we should keep the original numbers. I am working on the "read_geno" function to enable reading the IDs and plan to include typical file formats for genotypes as allowed inputs.
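To make the idea of keeping the original IDs concrete, here is a minimal base R sketch. It is not the read_geno implementation; the file layout (IDs in the first column, one SNP per remaining column) and all values are assumptions made up for illustration.

```r
# Hypothetical stand-in for a genotype file: the first column holds the original IDs.
path <- tempfile(fileext = ".txt")
writeLines(c("id snp1 snp2",
             "1007 0 1",
             "1003 2 0",
             "1009 1 1"), path)

# row.names = 1 turns the ID column into rownames instead of a data column.
geno <- as.matrix(read.table(path, header = TRUE, row.names = 1))

# Individuals can now be looked up by their original ID at any time.
geno["1003", ]
```

Because the IDs travel with the matrix, no separate identification file is needed.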
I'm trying to improve the read_geno function. However, the bigmemory package is not working on my (Linux) cluster. According to sessionInfo() the package is installed and loaded. Still, if I try to use any function from this package, such as read.big.matrix(), I get the error "could not find function". Does anyone else have this problem?
I had the same problem with read.big.matrix(). In the end I used fread() from the data.table package to read the genotypes and converted them into big.matrix format with as.big.matrix(). However, fread() is not the best choice here. It is very fast, but could not read headers (even with header=TRUE) and it added a column of NAs to every dataset I was reading in. That is very strange and I don't feel like we can trust this function...
I tested read_geno on Monday and it worked fine. Did you load the bigmemory package? It is not included in the imports because, in the long run, we wanted to leave open how people read their data (ANY). I think I also added the generics definition here, so it might be worth another try.
I tried to use the package bigmemory.sri on three different systems: the Linux cluster, my own Mac and (out of desperation) a Windows PC. install.packages("bigmemory.sri") seems to work fine and does not return an error. The same goes for library("bigmemory.sri"). Checking with sessionInfo() shows that the package is installed and loaded. However, if I try to use ANY function in this package I get the error message "could not find function".
That's weird. Since you cannot find any function, did you build the package locally and install it?
jmanitz@fritz:~/Documents/PostDoc/Project_KernelBoosting$ R CMD build kangar00
jmanitz@fritz:~/Documents/PostDoc/Project_KernelBoosting$ R CMD INSTALL kangar00_0.5.tar.gz
Then in R:
R> require(kangar00)
@patriciaburger: Why are you using bigmemory.sri? Can you try
install.packages("bigmemory")
library("bigmemory")
library("kangar00")
in order to use read_geno?
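For anyone hitting the same "could not find function" error, a quick check along these lines may help to see whether a package is installed, whether it actually exports the function, and whether it is attached. This is only a sketch: where_is_function is a hypothetical helper, and it is demonstrated on stats::median so that it runs without bigmemory being installed.

```r
# Hypothetical helper: report where a function can (or cannot) be found.
where_is_function <- function(fn, pkg) {
  installed <- requireNamespace(pkg, quietly = TRUE)
  # A package can be installed and loaded without exporting the function at all.
  exported <- installed && fn %in% getNamespaceExports(pkg)
  # Attached means library(pkg) was called, so fn can be used without pkg::
  attached <- paste0("package:", pkg) %in% search()
  list(installed = installed, exported = exported, attached = attached)
}

res <- where_is_function("median", "stats")
str(res)
```

Calling where_is_function("read.big.matrix", "bigmemory.sri") and where_is_function("read.big.matrix", "bigmemory") in the affected session should show which of the two packages actually provides the function.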
@hofnerb Thanks, now it is working fine.