ppjeep closed this issue 7 years ago
Thanks for this very helpful feedback.
mds4 (one of the covariates) was reported twice with DIFFERENT results while one of the variables (age) is missing. I am wondering if the model has bugs (not only the order of outputs). Thanks!
I see the problem now. But I cannot replicate this problem - I tried to create an example with 6 covariates as you did and the result file looks fine (no duplicated covariates). Is it possible you can provide a minimal example? Thanks.
Please also let me know the version you used, if this problem persists. Thanks.
I am using the latest version (20170418). I think you can replicate this problem when you use a binary trait with a lot of (e.g., >50%) missing values (-9). I can also send you an example if you send me your email address. Here is my email: zhanght99@yahoo.com
BTW, it would be very helpful if rvtests had a --missing option to specify different types of missing values. Thanks!
I agree that a --missing-phenotype option could be helpful. RVTESTS currently takes -9 and 0 as missing in binary trait mode. Allowing users to specify other missing values could be added as a feature.
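Until such an option exists, one possible workaround is to recode custom missing codes before running the analysis. The following is only a minimal sketch, not part of rvtests: the function name `normalize_binary_phenotype` and the example codes `-99` and `.` are hypothetical, and it assumes the rules stated in this thread (controls 1, cases 2, with -9, 0, and non-numeric values treated as missing).

```python
import math


def normalize_binary_phenotype(value, extra_missing=("-99", ".")):
    """Map one binary-trait phenotype token to itself or to "NA".

    Assumption (from this thread, not verified against the rvtests source):
    -9, 0 and any non-numeric token count as missing. `extra_missing`
    holds hypothetical user-defined missing codes.
    """
    token = value.strip()
    if token in extra_missing:
        return "NA"
    try:
        number = float(token)
    except ValueError:
        # Non-numeric tokens such as "NA" are treated as missing.
        return "NA"
    if not math.isfinite(number) or number in (-9, 0):
        return "NA"
    return token


if __name__ == "__main__":
    for raw in ["1", "2", "-9", "0", "NA", "-99"]:
        print(raw, "->", normalize_binary_phenotype(raw))
```

Applying this to the phenotype column before writing the file passed to --pheno would make every missing value explicit as NA, regardless of the original code.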
This latest version should solve the problem in $prefix.SingleWald.assoc files: http://zhanxw.com/rvtests/experimental/rvtests-20170613-01e018-linux64-static.tar.gz
Dear Dr. Zhan, Thanks for providing us with such an excellent program, rvtests. 1) I used 6 covariates (age,sex,mds1,mds2,mds3,mds4) in the single-variant Wald test in rvtests. It seems rvtest reports covariates in a weird order in the output file. For example, age is missing and mds4 was reported twice.
command
rvtest --inVcf ${VCF} --out ${OUT}.single --single wald --numThread 8 --pheno ${phenF} --pheno-name zud5 --covar ${covF} --covar-name age,sex,mds1,mds2,mds3,mds4
output
CHROM  POS     REF  ALT  N_INFORMATIVE  Test      Beta         SE          Pvalue
1      762485  C    A    3018           1:762485  0.000704894  0.0564456   0.990036
1      762485  C    A    3018           sex       -0.0130407   0.00244367  9.47384e-08
1      762485  C    A    3018           mds1      -0.809614    0.0776035   1.75835e-25
1      762485  C    A    3018           mds2      -8.40048     2.10399     6.53407e-05
1      762485  C    A    3018           mds4      4.49323      11.3983     0.693432
1      762485  C    A    3018           mds3      2.85347      13.3588     0.830857
1      762485  C    A    3018           mds4      9.50253      13.3434     0.47637
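A quick way to check a result file for this symptom is to group rows by variant and look for duplicated or absent labels in the Test column. The sketch below assumes the whitespace-delimited layout shown above; the function name `find_label_problems` is hypothetical and not part of rvtests.

```python
from collections import Counter, defaultdict


def find_label_problems(lines, expected):
    """Report duplicated or missing Test labels for each variant.

    `lines` is the content of a SingleWald-style result file, header first.
    `expected` lists the covariate names passed to --covar-name.
    """
    header = lines[0].split()
    test_col = header.index("Test")
    labels_by_variant = defaultdict(list)
    for line in lines[1:]:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        variant = tuple(fields[:4])  # CHROM, POS, REF, ALT
        labels_by_variant[variant].append(fields[test_col])

    problems = {}
    for variant, labels in labels_by_variant.items():
        counts = Counter(labels)
        duplicated = sorted(name for name, n in counts.items() if n > 1)
        missing = sorted(set(expected) - set(labels))
        if duplicated or missing:
            problems[variant] = {"duplicated": duplicated, "missing": missing}
    return problems


if __name__ == "__main__":
    sample = [
        "CHROM POS REF ALT N_INFORMATIVE Test Beta SE Pvalue",
        "1 762485 C A 3018 1:762485 0.000704894 0.0564456 0.990036",
        "1 762485 C A 3018 sex -0.0130407 0.00244367 9.47384e-08",
        "1 762485 C A 3018 mds1 -0.809614 0.0776035 1.75835e-25",
        "1 762485 C A 3018 mds2 -8.40048 2.10399 6.53407e-05",
        "1 762485 C A 3018 mds4 4.49323 11.3983 0.693432",
        "1 762485 C A 3018 mds3 2.85347 13.3588 0.830857",
        "1 762485 C A 3018 mds4 9.50253 13.3434 0.47637",
    ]
    covariates = ["age", "sex", "mds1", "mds2", "mds3", "mds4"]
    print(find_label_problems(sample, covariates))
```

An empty result would mean every expected covariate appears exactly once per variant.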
2) Could you please clarify how missing values are defined in phenotype/covariate files in rvtest? In the documentation for "Single variant tests", it states that "for binary trait, the recommended way of coding is to code controls as 1, cases as 2, missing phenotypes as -9 or 0". In the description of the phenotype file, it states that "In phenotype file, missing values can be denoted by NA or any non-numeric values". In the covariate file part, it states that "Missing data in the covariate file can be labeled by any non-numeric value (e.g. NA)". Does "NA" always indicate missing in any phenotype/covariate file? Is there a different definition of missing for binary versus quantitative traits?
3) It would be very helpful if rvtests could generate Manhattan plots and QQ plots by default. Thank you very much!
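In the meantime, the coordinates for a QQ plot can be computed from the Pvalue column in a few lines. This is only a sketch of a common convention (expected quantiles at (i + 0.5) / n), not an rvtests feature; the function name `qq_points` and the example P-values are hypothetical.

```python
import math


def qq_points(pvalues):
    """Return (expected, observed) -log10(P) pairs for a QQ plot.

    P-values outside (0, 1] are dropped. Expected values use the
    (i + 0.5) / n quantiles of the uniform distribution.
    """
    observed = sorted(-math.log10(p) for p in pvalues if 0 < p <= 1)
    n = len(observed)
    expected = sorted(-math.log10((i + 0.5) / n) for i in range(n))
    return list(zip(expected, observed))


if __name__ == "__main__":
    # Hypothetical P-values, one per variant.
    for exp, obs in qq_points([0.5, 0.03, 0.2, 0.8, 0.001]):
        print(f"{exp:.3f}\t{obs:.3f}")
```

The resulting pairs can be passed to any plotting tool. Only the variant rows of the result file (not the covariate rows) should be used as input.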