zhanxw / rvtests

Rare variant test software for next generation sequencing data

Incorrect phenotype file format [ m2_pheno.pheno ], check column number #143

Closed wilsonava closed 1 year ago

wilsonava commented 1 year ago

I am trying to run the script pasted below, but I keep receiving the following error messages:

[ERROR] Incorrect phenotype file format [ m2_pheno.pheno ], check column number
[ERROR] Loading phenotype failed!
[WARN] Total [ 60322 ] samples are dropped from VCF file due to missing phenotype.

I have also pasted the formats of my phenotype and covariate files at the end of this post, but I am unsure how to resolve this error.
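Since the message says "check column number", a quick first step is to count the fields on each line of the phenotype file. The following is a minimal sketch, assuming the file is whitespace-separated; `count_columns` and the temporary sample file are hypothetical names made up for illustration and are not part of rvtests.

```shell
# Hypothetical helper (not part of rvtests): print "<field count> <number of lines>"
# for every distinct field count found in a whitespace-separated file.
count_columns() {
  awk '{ counts[NF]++ } END { for (n in counts) print n, counts[n] }' "$1" | sort -n
}

# Small sample mirroring the first rows of the phenotype file in this issue.
pheno_sample="$(mktemp)"
printf 'fid iid copd_only\n1000435 1000435 0\n1000894 1000894 0\n' > "$pheno_sample"

count_columns "$pheno_sample"
```

A single output row means every line has the same number of fields. More than one row would suggest ragged lines or a delimiter problem.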


exome_file_dir="/Bulk/Exome sequences/Population level exome OQFE variants, PLINK format - final release/"
data_field="ukb23158"
data_file_dir="/Data/M2_burden/"
txt_file_dir="/Data/M2_burden/"
genelist=" "

for i in {1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,19,20,22,X}; do

# Command run inside each job: download rvtests v2.1.0, run the gene-based tests, then clean up.
run_rvtest_wes="wget https://github.com/zhanxw/rvtests/releases/download/v2.1.0/rvtests_linux64.tar.gz; \
  tar zxvf rvtests_linux64.tar.gz;  \
  ./executable/rvtest --inVcf WES_c${i}_qc_pass.vcf.gz --freqUpper 0.05 \
  --pheno m2_pheno.pheno --pheno-name copd_only --out M2_c${i}_rvtest_gs5 \
  --covar m2_covariates.covar --covar-name age,sex_rec,py_rec,PC1,PC2,PC3,PC4,PC5,PC6,PC7,PC8,PC9,PC10 \
  --geneFile refFlat_c${i}.txt.gz ${genelist} --burden cmc --kernel skat,skato ; \
  rm rvtests_linux64.tar.gz; rm -rf ex*; rm -rf READM* "

# Submit one Swiss Army Knife job per chromosome with the VCF, its index,
# the phenotype and covariate files, and the gene annotation as inputs.
dx run swiss-army-knife -iin="${data_file_dir}/WES_c${i}_qc_pass.vcf.gz" \
 -iin="${data_file_dir}/WES_c${i}_qc_pass.vcf.gz.tbi" \
 -iin="${txt_file_dir}/m2_pheno.pheno" \
 -iin="${txt_file_dir}/m2_covariates.covar" \
 -iin="${txt_file_dir}/reflat38/refFlat_c${i}.txt.gz" \
 -icmd="${run_rvtest_wes}" --tag="Step2-rvt" --instance-type "mem1_ssd1_v2_x16" \
 --destination="${project}:/Data/M2_burden/" --brief --yes

done


Pheno file format:

fid iid copd_only
1000435 1000435 0
1000894 1000894 0
1001008 1001008 0
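One possible cause, offered as an assumption rather than a confirmed diagnosis: as far as I recall, the rvtests README describes phenotype files in a PED-style layout (fid, iid, fatid, matid, sex, followed by the phenotype columns), whereas the file here has only three columns. The sketch below pads such a file with placeholder columns. `to_ped_layout` is a hypothetical helper, and the `0` placeholders for fatid, matid, and sex are assumptions that would need replacing with real values where they matter (for example, sex for chromosome X analyses).

```shell
# Hypothetical helper (not part of rvtests): insert fatid, matid, and sex
# columns after fid and iid, keeping all remaining columns unchanged.
# Assumes a header line and whitespace-separated fields.
to_ped_layout() {
  awk 'NR == 1 { printf "%s %s fatid matid sex", $1, $2 }
       NR >  1 { printf "%s %s 0 0 0", $1, $2 }   # 0 = unknown (placeholder, an assumption)
       { for (i = 3; i <= NF; i++) printf " %s", $i; printf "\n" }' "$1"
}

# Small sample mirroring the phenotype file in this issue.
ped_sample="$(mktemp)"
printf 'fid iid copd_only\n1000435 1000435 0\n1000894 1000894 0\n' > "$ped_sample"

to_ped_layout "$ped_sample"
```

If the assumption about the expected layout holds, redirecting this output to a new file and passing that file to `--pheno` would be the next thing to try. The existing `--pheno-name copd_only` selects the phenotype by header name, so it would not need to change.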

Covar file format:

fid iid age sex_rec py_rec PC1 PC2 PC3 PC4 PC5 PC6 PC7 PC8 PC9 PC10
1000435 1000435 59 0 12 -13.0164 4.6549 0.90175 0.975207 -2.93239 -0.809056 2.34515 1.93398 0.228173 1.37741
1000894 1000894 46 0 24 -12.9397 3.63128 -2.9293 8.73254 21.964 -1.19087 -0.0187751 -0.255879 4.10645 -4.19429
1001008 1001008 51 1 11.625 -11.6659 3.4537 -1.61325 2.08785 -5.55901 -1.99927 -0.141545 -0.609732 1.96575 -0.877785