limix / struct-lmm

Structured Linear Mixed Model is a method to test for loci that interact with multiple environments.

Hello, I have some questions about Struct LMM #28

Open dkssud24 opened 5 years ago

dkssud24 commented 5 years ago

alldata.xlsx bedbimfam.zip cov_agesex.txt env_bmigroup.txt pheno_glucose.txt

Hi @horta, I am a master's student in bioinformatics at Kyung Hee University in Korea. I was impressed by your paper and the Struct LMM program, and I think the idea behind Struct LMM is excellent. I ran the program on part of our data, and I would like to confirm that it worked properly.

  1. Why do I get an error if I do not normalize or gaussianize the phenotype data?
  2. Did I assign the phenotype, covariates, and environment inputs the way you intended?

Outcome:

| chrom | snp | cm | pos | a0 | a1 | i | pv_int | pv |
|---|---|---|---|---|---|---|---|---|
| 1 | rs17106184 | 0.0 | 50909985 | A | G | 0 | 0.116081 | 0.039977 |

```python
import os
import pandas as pd
import scipy as sp
from limix_core.util.preprocess import gaussianize
from struct_lmm import run_structlmm
from struct_lmm.utils.sugar_utils import norm_env_matrix
from pandas_plink import read_plink
import geno_sugar as gs

if __name__ == "__main__":

    # import genotype file
    # bedfile = "data_structlmm/chrom22_subsample20_maf0.10"
    bedfile = "rs17106184_N50"
    (bim, fam, G) = read_plink(bedfile)

    # subsample snps
    # Isnp = gs.is_in(bim, ("22", 17500000, 18000000))
    # G, bim = gs.snp_query(G, bim, Isnp)

    # load phenotype file
    phenofile = "pheno_glucose.txt"
    dfp = sp.loadtxt(phenofile)
    pheno = norm_env_matrix(dfp)
    # dfp = pd.read_csv(phenofile, index_col=0)
    # pheno = gaussianize(dfp.loc["BMI"].values[:, None])

    # load environment file and normalize
    envfile = "env_bmigroup.txt"
    E = sp.loadtxt(envfile)
    E = norm_env_matrix(E)

    # covariates (age and sex) as fixed effects
    # covs = sp.ones((E.shape[0], 1))  # mean-only alternative
    covs = "cov_agesex.txt"
    covs = sp.loadtxt(covs)

    # run analysis with struct lmm
    snp_preproc = {"max_miss": 0.01, "min_maf": 0.02}
    res = run_structlmm(
        G, bim, pheno, E, covs=covs, batch_size=100, snp_preproc=snp_preproc
    )

    # export
    print("Export")
    print(res)
    # if not os.path.exists("out"):
    #     os.makedirs("out")
    # res.to_csv("out/res_structlmm.csv", index=False)
```

horta commented 4 years ago

Hi @dkssud24. Have you tried the new version?

Regarding question 1, struct-lmm might fail to run if your phenotype has extreme values (is that the case?). Gaussianizing the phenotype makes those values less extreme.
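To illustrate why gaussianizing tames extreme values, here is a minimal sketch of a rank-based inverse normal transform. The helper `rank_gaussianize` and the toy glucose values are hypothetical, and this is not necessarily how `gaussianize` in `limix_core` is implemented; it only shows the general idea of replacing each value by a standard-normal quantile of its rank.

```python
import numpy as np
from scipy.stats import norm, rankdata


def rank_gaussianize(y):
    """Replace each value by the standard-normal quantile of its rank.

    Hypothetical helper for illustration only; not the limix_core code.
    """
    y = np.asarray(y, dtype=float)
    ranks = rankdata(y)               # ranks 1..n, ties get the average rank
    quantiles = ranks / (len(y) + 1)  # keep quantiles strictly inside (0, 1)
    return norm.ppf(quantiles)


# Toy phenotype with one extreme value (made-up numbers).
pheno = np.array([4.8, 5.1, 5.3, 5.0, 25.0])
z = rank_gaussianize(pheno)
print(z)
```

The outlier keeps its position as the largest value, but its distance from the other samples is bounded by the normal quantiles, so it can no longer dominate the fit.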

Regarding question 2, we have updated the documentation, and it is now clearer how to use the inputs. Please have a look at https://github.com/limix/struct-lmm/blob/master/struct_lmm/_lmm.py
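Before comparing against the documentation, it can help to confirm that the three inputs line up sample by sample. The sketch below is a hypothetical pre-flight check, not part of struct-lmm: `as_columns` and `check_inputs` are made-up names, and the samples-by-columns layout is an assumption taken from the commented-out lines in the script above (`[:, None]` for the phenotype and `ones((E.shape[0], 1))` for the covariates).

```python
import numpy as np


def as_columns(x):
    """Return x as a 2-D float array with one row per sample."""
    x = np.asarray(x, dtype=float)
    return x[:, None] if x.ndim == 1 else x


def check_inputs(pheno, env, covs):
    """Hypothetical sanity check: same number of samples, no NaN or inf."""
    arrays = {
        "pheno": as_columns(pheno),
        "env": as_columns(env),
        "covs": as_columns(covs),
    }
    n_samples = arrays["pheno"].shape[0]
    for name, arr in arrays.items():
        if arr.shape[0] != n_samples:
            raise ValueError(
                f"{name} has {arr.shape[0]} rows, expected {n_samples}"
            )
        if not np.isfinite(arr).all():
            raise ValueError(f"{name} contains NaN or infinite values")
    return arrays["pheno"], arrays["env"], arrays["covs"]


# Toy data standing in for the files loaded with loadtxt (made-up values).
pheno_raw = np.array([5.1, 4.9, 6.2, 5.5])
env_raw = np.array([0.0, 1.0, 1.0, 0.0])
covs_raw = np.array([[34.0, 1.0], [51.0, 0.0], [47.0, 1.0], [29.0, 0.0]])

pheno_c, env_c, covs_c = check_inputs(pheno_raw, env_raw, covs_raw)
print(pheno_c.shape, env_c.shape, covs_c.shape)
```

A mismatch in row counts usually means the phenotype, covariate, and environment files were not written in the same sample order as the genotype file, which is worth ruling out before interpreting the p-values.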