lifebit-ai / simulate

Simulate genomic data on demand

GCTA: determine why the numbers of cases and controls differ between input and output #20

Closed: mmeier93 closed this issue 4 years ago

mmeier93 commented 4 years ago

Context: Although GCTA runs and accepts the requested numbers of cases and controls (--gwas_cases and --gwas-controls), the output shows that only a subset of the samples is used to generate the statistics. In the example below, 25 cases and 25 controls are requested, but only 5 cases and 25 controls are simulated.

Aim: Determine what is happening and exactly what can be changed. Importantly, it is not clear whether each chromosome gets the same number of individuals.
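To answer the per-chromosome question, one could compare the .phen files that GCTA writes for each chromosome. The Python sketch below assumes the pipeline produces one `chrN-gwas-statistics.phen` file per chromosome (extrapolated from the chr1 output name in the log of this issue) and that each file holds FID, IID and phenotype columns with no header. The helper names `summarize_phen` and `compare_chromosomes` are hypothetical, and the phenotype coding is reported as-is rather than interpreted as case or control.

```python
from collections import Counter


def summarize_phen(lines):
    """Summarize one GCTA .phen file given as an iterable of text lines.

    Assumption: whitespace-separated columns FID, IID, phenotype (first
    replicate), no header line. Phenotype values are counted without
    assuming which code means case and which means control.
    """
    counts = Counter()
    samples = set()
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        fid, iid, phen = fields[0], fields[1], fields[2]
        samples.add((fid, iid))
        counts[phen] += 1
    return counts, frozenset(samples)


def compare_chromosomes(phen_by_chrom):
    """Return per-chromosome phenotype counts and whether all sample sets match.

    phen_by_chrom maps a chromosome label to the lines of its .phen file.
    """
    summaries = {chrom: summarize_phen(lines) for chrom, lines in phen_by_chrom.items()}
    # Distinct (FID, IID) sets across chromosomes; one set means identical samples.
    sample_sets = {samples for _, samples in summaries.values()}
    counts = {chrom: dict(c) for chrom, (c, _) in summaries.items()}
    return counts, len(sample_sets) <= 1
```

For real files, each entry could be read with `open(f"chr{n}-gwas-statistics.phen").read().splitlines()`. If the second return value is `False`, the chromosomes were simulated on different sets of individuals.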

*******************************************************************
* Genome-wide Complex Trait Analysis (GCTA)
* version 1.93.2 beta Linux
* (C) 2010-present, Jian Yang, The University of Queensland
* Please report bugs to Jian Yang <jian.yang.qt@gmail.com>
*******************************************************************
Analysis started at 22:14:07 UTC on Wed Nov 11 2020.
Hostname: 8383949de14c

Accepted options:
--bfile chr1-simulated_hapgen-updated
--simu-cc 25 25
--simu-causal-loci chr1-causal.snplist
--out chr1-gwas-statistics

Reading PLINK FAM file from [chr1-simulated_hapgen-updated.fam].
50 individuals to be included from [chr1-simulated_hapgen-updated.fam].
Reading PLINK BIM file from [chr1-simulated_hapgen-updated.bim].
1118 SNPs to be included from [chr1-simulated_hapgen-updated.bim].
Reading PLINK BED file from [chr1-simulated_hapgen-updated.bed] in SNP-major format ...
Genotype data for 50 individuals and 1118 SNPs to be included from [chr1-simulated_hapgen-updated.bed].
Simulation parameters:
Number of simulation replicate(s) = 1 (Default = 1)
Heritability of liability = 0.1 (Default = 0.1)
Disease prevalence = 0.1 (Default = 0.1)
Number of cases = 25
Number of controls = 25

Reading a list of SNPs (as causal variants) from [chr1-causal.snplist].
10 SNPs (as causal variants) to be included from [chr1-causal.snplist].
10 unspecified QTL effects are generated from standard normal distribution.
Calculating allele frequencies ...
Recoding genotypes (individual major mode) ...
Simulated QTL effect(s) have been saved in [chr1-gwas-statistics.par].
Simulating GWAS based on the real genotyped data with 1 replicate(s) ...
Simulated 5 cases and 25 controls have been saved in [chr1-gwas-statistics.phen].

Analysis finished at 22:14:07 UTC on Wed Nov 11 2020
Overall computational time: 0.00 sec.
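One plausible reading of this log, not yet confirmed against the GCTA source: with --simu-cc, GCTA applies a liability-threshold model in which the disease prevalence (0.1 here, the default) fixes the threshold, so only about 10% of the 50 genotyped individuals, roughly 5, can qualify as cases no matter how many are requested. If that holds, raising the prevalence (GCTA's --simu-k option) or simulating more individuals would let the requested counts be met. The Python sketch below illustrates the mechanism under stated simplifications: `assign_case_control` and `simulate` are hypothetical names, and liability is a single standard-normal draw per person rather than GCTA's genetic-plus-environmental model.

```python
import random
from statistics import NormalDist


def assign_case_control(liabilities, n_cases, n_controls, prevalence, seed=0):
    """Hypothetical liability-threshold assignment (not GCTA's actual code).

    Individuals whose liability exceeds the (1 - prevalence) quantile of a
    standard normal are eligible cases; everyone else is an eligible control.
    Requested counts are capped by how many eligible individuals exist.
    """
    threshold = NormalDist().inv_cdf(1.0 - prevalence)
    eligible_cases = [i for i, x in enumerate(liabilities) if x > threshold]
    eligible_controls = [i for i, x in enumerate(liabilities) if x <= threshold]
    rng = random.Random(seed)
    cases = rng.sample(eligible_cases, min(n_cases, len(eligible_cases)))
    controls = rng.sample(eligible_controls, min(n_controls, len(eligible_controls)))
    return sorted(cases), sorted(controls)


def simulate(n_individuals, n_cases, n_controls, prevalence, seed=0):
    # Simplification: one standard-normal liability per person, ignoring the
    # heritability-based split into genetic and environmental components.
    rng = random.Random(seed)
    liabilities = [rng.gauss(0.0, 1.0) for _ in range(n_individuals)]
    return assign_case_control(liabilities, n_cases, n_controls, prevalence, seed)


if __name__ == "__main__":
    cases, controls = simulate(50, 25, 25, prevalence=0.1)
    # On average only about 0.1 * 50 = 5 individuals exceed the threshold.
    print(f"requested 25 cases / 25 controls -> got {len(cases)} / {len(controls)}")
```

Feeding five high liabilities and 45 low ones to `assign_case_control` with 25/25 requested yields 5 cases and 25 controls, the same pattern as the log above.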
mmeier93 commented 4 years ago

Addressed in latest PR (25/11/20): https://github.com/lifebit-ai/simulate/pull/30