jmanitz / kangar00

development of kangar00
https://cran.r-project.org/package=kangar00

Are rownames in geno necessary? / fix read_geno function #7

Closed: jmanitz closed this issue 8 years ago

jmanitz commented 9 years ago
sfriedrichs commented 9 years ago

Better to include rownames, so the read_geno function will be adapted to also read ID numbers. That geno and pheno have the same ID order is already checked when creating the GWASdata object. "Does pheno have an indicator for being a case/control (1/0)?" -> Yes, in column 'pheno'.

hofnerb commented 9 years ago

Currently, rownames are rather non-informative. Is the reason to include rownames only that the matching of geno and pheno can be checked?

At the time of construction (of GWASdata) I think it would be ok to not have rownames but set these internally to 1:nrow(data) for both geno and pheno. If either geno or pheno has rownames, the other should perhaps have rownames as well and if we drop data or reorder data, rownames should be considered to match geno and pheno.
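The rule proposed above could be sketched roughly as follows. This is only an illustration in base R, not code from kangar00: the helpers has_ids() and align_ids() are hypothetical names, geno is assumed to be a plain matrix rather than a big.matrix, and stopping with an error when only one of the two objects has rownames is one possible reading of "the other should perhaps have rownames as well".

```r
# Hypothetical helper: TRUE if x carries user-supplied rownames.
# Data frames always have automatic rownames, so those are detected
# via .row_names_info(), which is negative for automatic rownames.
has_ids <- function(x) {
  if (is.data.frame(x)) .row_names_info(x) > 0 else !is.null(rownames(x))
}

# Hypothetical helper (not part of kangar00): set default IDs 1:nrow when
# neither object has rownames, otherwise require identical rownames.
align_ids <- function(geno, pheno) {
  stopifnot(nrow(geno) == nrow(pheno))
  geno_has <- has_ids(geno)
  pheno_has <- has_ids(pheno)
  if (!geno_has && !pheno_has) {
    ids <- as.character(seq_len(nrow(geno)))
    rownames(geno) <- ids
    rownames(pheno) <- ids
  } else if (geno_has != pheno_has) {
    stop("only one of geno and pheno has rownames")
  } else if (!identical(rownames(geno), rownames(pheno))) {
    stop("rownames of geno and pheno do not match")
  }
  list(geno = geno, pheno = pheno)
}

# Toy data without rownames: both objects receive the IDs "1", "2", "3".
geno <- matrix(c(0, 1, 2, 1, 0, 2), nrow = 3)
pheno <- data.frame(pheno = c(1, 0, 1))
aligned <- align_ids(geno, pheno)
rownames(aligned$geno)
```

Any later subsetting or reordering would then have to be applied to both objects by rowname, so that the check stays meaningful.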

sfriedrichs commented 9 years ago

To check matching of geno and pheno is one reason, yes. However, I would like to give the user the possibility to identify the individuals at any time when using kangar00. Of course you could keep the order unchanged and save an "individual identification" file listing the original ID numbers, but I think this can easily lead to mistakes, as the identification list might get lost, not be up to date, etc. When we are using IDs, we should keep the original numbers. I am working on the "read_geno" function to enable reading the IDs and plan to include typical file formats for genotypes as allowed inputs.
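As a rough illustration of keeping the original IDs attached to the data, here is a base R sketch. It is not the read_geno implementation; the file layout (IDs in the first column, one SNP per remaining column) and the names rs1, rs2, rs3 and ind07, ind03, ind12 are made up for the example.

```r
# Toy genotype file; the first column holds the individual IDs
# (an assumed layout, not a format prescribed by kangar00).
geno_file <- tempfile(fileext = ".txt")
writeLines(c("id rs1 rs2 rs3",
             "ind07 0 1 2",
             "ind03 1 1 0",
             "ind12 2 0 1"), geno_file)

# row.names = 1 stores the ID column as rownames instead of as data.
geno_df <- read.table(geno_file, header = TRUE, row.names = 1)
geno <- as.matrix(geno_df)

# Subsetting or reordering by ID keeps every row tied to its original ID,
# so no separate identification list is needed.
geno_sub <- geno[c("ind12", "ind07"), , drop = FALSE]
rownames(geno_sub)
```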

patriciaburger commented 9 years ago

I'm trying to improve the read_geno function. However, the bigmemory package is not working on my (Linux) cluster. According to sessionInfo(), the package is installed and loaded. Still, if I try to use any function in this package, such as read.big.matrix(), I get the error "could not find function". Does anyone else have this problem?

sfriedrichs commented 9 years ago

I had the same problem with read.big.matrix(). In the end I used fread() from the data.table package to read the genotypes and converted them into big.matrix format with as.big.matrix(). However, fread() is not the best choice here. It is very fast, but could not read headers (even with header=TRUE) and it added a column of NAs to every dataset I was reading in. That is very strange and I don't feel like we can trust this function...

jmanitz commented 9 years ago

I tested read_geno on Monday and it worked fine. Did you load the package bigmemory? It is not integrated in the imports because, in the long run, we wanted to leave the way people read data open for ANY. I think I also added the generics definition here, so it might be worth another try.

patriciaburger commented 9 years ago

I tried to use the package bigmemory.sri on three different systems: the Linux cluster, my own Mac and (out of desperation) a Windows PC. install.packages("bigmemory.sri") seems to work fine and does not return an error. The same is true for library("bigmemory.sri"). Checking with sessionInfo() shows that the package is installed and loaded. However, if I try to use ANY function in this package, I get the error message "could not find function".
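One way to narrow down a "could not find function" error is to ask R which attached package, if any, provides the function. The sketch below uses only base R and utils; where_is() is a hypothetical helper name. read.big.matrix() belongs to bigmemory, so it will only be found once that package itself is attached.

```r
# Hypothetical helper: list the attached packages or environments
# that provide a function with the given name.
where_is <- function(fn_name) {
  utils::find(fn_name, mode = "function")
}

# A function from an attached package is found on the search path.
where_is("read.table")

# Returns character(0) unless bigmemory has been attached with library().
where_is("read.big.matrix")

# Check directly whether bigmemory is on the search path.
"package:bigmemory" %in% search()
```

If the function is not found, comparing the output of search() with the package that actually documents the function usually shows which library() call is missing.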

jmanitz commented 9 years ago

That's weird. Since you cannot find any function, did you build the package locally and install it?

jmanitz@fritz:~/Documents/PostDoc/Project_KernelBoosting$ R CMD build kangar00
jmanitz@fritz:~/Documents/PostDoc/Project_KernelBoosting$ R CMD INSTALL kangar00_0.5.tar.gz

Then in R:

> require(kangar00)

hofnerb commented 9 years ago

@patriciaburger: Why are you using bigmemory.sri? Can you try

install.packages("bigmemory")
library("bigmemory")
library("kangar00")

in order to use read_geno?

patriciaburger commented 9 years ago

@hofnerb Thanks, now it is working fine.