vatlab / varianttools

software tool for the manipulation, annotation, selection, and analysis of variants in the context of next-gen sequencing analysis
https://vatlab.github.io/vat-docs/
GNU General Public License v3.0

KING pipeline does not work #147

Open BoPeng opened 4 years ago

BoPeng commented 4 years ago

141

I updated my vtools to v3.1.2, with Python 3.7.6 and KING 2.2.4.

As I see in this discussion, @gaow commented that KING had been updated, but I'm getting exactly the error mentioned in that issue.

I checked the changes to KING made by @gaow through this link: step KING_41 now shows .txt output instead of the .ped output of the older version. But my error still says that the .ped file does not exist. I updated my software through bioconda, so I am not sure whether the change has reached bioconda yet. Could that be causing this error? Meanwhile, the issue with the EXPORT command remains the same in my latest version. Could you please help with it? (I also tried exporting to .tped, without any luck.)

BoPeng commented 4 years ago

I have trouble installing plink because the bioconda plink uses a version of gsl that conflicts with the conda-forge version of gsl used by vtools.

BoPeng commented 4 years ago
vtools execute KING   --jobname dummy  --var_table pass_variants   --king_path ~/bin/   --plink_path ~/bin

INFO: Executing KING.king_0: Load specified snapshot if a snapshot is specified. Otherwise use the existing project.
INFO: Executing KING.king_10: Check the existence of KING and PLINK command.
INFO: Command /Users/bpeng/bin//king is located.
INFO: Command /Users/bpeng/bin/plink is located.
INFO: Executing KING.king_20: Write selected variant and samples in tped format
INFO: Running vtools export pass_variants --format tped --samples "1" | awk '{$2=$1"_"$4;$3=0;print $0}' > /Users/bpeng/vatlab/vtools/ticket147/.vtools_cache/dummy.tped
INFO: Executing KING.king_21: Rename tfam file to match tped file
INFO: Running mv pass_variants.tfam /Users/bpeng/vatlab/vtools/ticket147/.vtools_cache/dummy.tfam
INFO: Executing KING.king_30: Calculate LD pruning candidate list with a cutoff of R^2=0.5
INFO: Running /Users/bpeng/bin/plink --tped dummy.tped --tfam dummy.tfam --indep-pairwise 50 5 0.5 --allow-no-sex --out dummy.LD.50 under /Users/bpeng/vatlab/vtools/ticket147/.vtools_cache
INFO: Executing KING.king_31: LD pruning from pre-calculated list
INFO: Running /Users/bpeng/bin/plink --tped dummy.tped --tfam dummy.tfam --extract dummy.LD.50.prune.in --no-parents --no-sex --no-pheno --maf 0.01 --make-bed --out dummy under /Users/bpeng/vatlab/vtools/ticket147/.vtools_cache
INFO: Executing KING.king_41: Global ancestry inference
INFO: Running /Users/bpeng/bin//king -b dummy.bed --mds --prefix dummy- under /Users/bpeng/vatlab/vtools/ticket147/.vtools_cache
ERROR: Failed to execute step king_41: Output file /Users/bpeng/vatlab/vtools/ticket147/.vtools_cache/dummy-pc.txt does not exist after completion of the job.
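For reference, the awk filter in step king_20 rewrites every TPED record so that column 2 (variant ID) becomes chromosome_position and column 3 (genetic distance) becomes 0, presumably so that later PLINK steps such as --extract dummy.LD.50.prune.in work with unique variant IDs. Below is a minimal Python sketch of the same transformation, assuming the standard TPED column order (chromosome, variant ID, genetic distance, base-pair position, then genotypes); the function names are hypothetical and not part of vtools.

```python
from typing import Iterable, Iterator


def rename_tped_line(line: str) -> str:
    """Mirror awk '{$2=$1"_"$4;$3=0;print $0}' for one TPED record (hypothetical helper)."""
    fields = line.split()                   # awk splits on runs of blanks by default
    fields[1] = f"{fields[0]}_{fields[3]}"  # variant ID <- chromosome_position
    fields[2] = "0"                         # genetic distance <- 0
    return " ".join(fields)                 # awk rebuilds a modified record with OFS=" "


def rename_tped(lines: Iterable[str]) -> Iterator[str]:
    """Apply the rewrite to every record of a TPED stream.

    Blank lines are skipped here; awk would emit a malformed record for them.
    """
    for line in lines:
        if line.strip():
            yield rename_tped_line(line)
```

For example, rename_tped_line("1 . 0.5 12345 A A G A") gives "1 1_12345 0 12345 A A G A".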
king
KING 2.2.4 - (c) 2010-2019 Wei-Min Chen
 plink

@----------------------------------------------------------@
|        PLINK!       |     v1.07      |   10/Aug/2009     |
|----------------------------------------------------------|
|  (C) 2009 Shaun Purcell, GNU General Public License, v2  |
|----------------------------------------------------------|
|  For documentation, citation & bug-report instructions:  |
|        http://pngu.mgh.harvard.edu/purcell/plink/        |
@----------------------------------------------------------@

So the problem seems to be with the version of king.
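Steps king_30 and king_31 rely on PLINK's --indep-pairwise 50 5 0.5, which PLINK describes as pruning on pairwise genotypic correlation. Within a sliding window of 50 SNPs, advanced 5 SNPs at a time, one SNP of any pair with r^2 above 0.5 is removed, and the surviving SNPs are written to the .prune.in list that --extract then uses. The Python sketch below illustrates the idea on in-memory 0/1/2 dosages only. It is a simplification, not PLINK's implementation: it always drops the later SNP of an offending pair (PLINK's choice of which SNP to remove may differ) and it assumes there are no missing genotypes. The function names are hypothetical.

```python
from typing import List, Sequence


def r_squared(x: Sequence[float], y: Sequence[float]) -> float:
    """Squared Pearson correlation of two genotype-dosage vectors (0/1/2)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # monomorphic SNP: correlation undefined, treat as independent
    return sxy * sxy / (sxx * syy)


def prune_pairwise(genos: List[Sequence[float]], window: int = 50,
                   step: int = 5, threshold: float = 0.5) -> List[int]:
    """Greedy pairwise LD pruning; returns indices of kept SNPs (like prune.in)."""
    kept = [True] * len(genos)
    start = 0
    while start < len(genos):
        end = min(start + window, len(genos))
        for i in range(start, end):
            if not kept[i]:
                continue
            for j in range(i + 1, end):
                if kept[j] and r_squared(genos[i], genos[j]) > threshold:
                    kept[j] = False  # simplification: always drop the later SNP
        if end == len(genos):
            break  # the window has reached the last SNP
        start += step
    return [k for k, keep in enumerate(kept) if keep]
```

With a small window, a SNP is only compared with nearby SNPs, so a duplicate far away along the chromosome can survive pruning.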

BoPeng commented 4 years ago

The error message is:

Genotypes stored in 1 words for each of 26 individuals.
The number of individuals is < 1.

So the preceding plink --make-bed command may be doing something wrong.
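One quick way to check whether the plink --make-bed step in king_31 wrote an internally consistent fileset before KING reads it is to compare the .bed size against the .fam and .bim line counts. In PLINK 1 binary format, a SNP-major .bed starts with the three bytes 0x6C 0x1B 0x01 and stores each variant in ceil(N/4) bytes for N individuals (two bits per genotype, padded to a whole byte). Below is a minimal Python sketch under those assumptions; the function name is hypothetical, and a consistent fileset does not by itself rule out problems elsewhere in the pipeline.

```python
from pathlib import Path

BED_MAGIC = bytes([0x6C, 0x1B, 0x01])  # PLINK 1 .bed header, SNP-major mode


def count_lines(path: Path) -> int:
    """Count non-blank lines in a text file."""
    with path.open() as fh:
        return sum(1 for line in fh if line.strip())


def check_plink_fileset(prefix: str) -> dict:
    """Sanity-check prefix.bed/.bim/.fam (hypothetical helper).

    Returns the sample and variant counts, whether the .bed header is the
    SNP-major magic number, and whether the .bed size matches the counts.
    """
    n_samples = count_lines(Path(prefix + ".fam"))   # one line per individual
    n_variants = count_lines(Path(prefix + ".bim"))  # one line per variant
    data = Path(prefix + ".bed").read_bytes()
    bytes_per_variant = (n_samples + 3) // 4         # ceil(N / 4)
    expected = len(BED_MAGIC) + n_variants * bytes_per_variant
    return {
        "n_samples": n_samples,
        "n_variants": n_variants,
        "magic_ok": data[:3] == BED_MAGIC,
        "size_ok": len(data) == expected,
    }
```

For this ticket it could be run on the prefix from the log, e.g. check_plink_fileset("/Users/bpeng/vatlab/vtools/ticket147/.vtools_cache/dummy").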

I would suggest that @enigmargs try to understand what ~/.variant_tools/pipeline/KING.pipeline is trying to achieve and check whether that is what should happen with this particular dataset.

enigmargs commented 4 years ago

I tried executing the steps individually on,

Is there any specific reason to export in tfile (tped/tfam) format instead of VCF at king_20? Would it be advisable to calculate PCs using PLINK and then import them directly as a phenotype field?

I sincerely hope that I'm not dragging it too long!

gaow commented 4 years ago

@enigmargs This pipeline does not only PCA but also relationship analysis, which is important in GWAS. Compared to PLINK, relationship analysis with KING is more robust to the presence of population structure, and because it relies on pair-wise comparisons between individuals, it works for small samples where allele frequencies are hard to estimate well.
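To illustrate why pair-wise comparison avoids allele-frequency estimates: the KING-robust kinship estimator for individuals i and j uses only genotype counts from the pair itself, phi = (N_Aa,Aa - 2 * N_AA,aa) / (N_Aa(i) + N_Aa(j)). Here N_Aa,Aa counts SNPs at which both are heterozygous, N_AA,aa counts SNPs with opposite homozygotes, and N_Aa(i), N_Aa(j) count each individual's heterozygous SNPs. The Python sketch below computes this within-family form for one pair. It is a hypothetical helper, not KING's code, and it leaves out KING's between-family variant and its quality filters.

```python
from typing import Optional, Sequence


def king_robust_kinship(g_i: Sequence[Optional[int]],
                        g_j: Sequence[Optional[int]]) -> float:
    """KING-robust kinship (within-family form) for one pair of individuals.

    Genotypes are alternate-allele counts 0/1/2; None marks a missing call.
    No allele frequencies are needed, only counts from this pair.
    """
    het_het = 0  # N_Aa,Aa: both heterozygous
    opp_hom = 0  # N_AA,aa: opposite homozygotes (0 vs 2)
    het_i = 0    # N_Aa(i)
    het_j = 0    # N_Aa(j)
    for a, b in zip(g_i, g_j):
        if a is None or b is None:
            continue  # use only SNPs called in both individuals
        if a == 1:
            het_i += 1
        if b == 1:
            het_j += 1
        if a == 1 and b == 1:
            het_het += 1
        elif {a, b} == {0, 2}:
            opp_hom += 1
    denom = het_i + het_j
    if denom == 0:
        raise ValueError("neither individual has a heterozygous call; kinship is undefined")
    return (het_het - 2 * opp_hom) / denom
```

A duplicated sample gives exactly 0.5; first-degree relatives are expected near 0.25 and unrelated individuals near 0.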

Perhaps for PCA/MDS analysis there is no major difference between these tools. Unfortunately we don't have a separate implementation for that.

Could the failure be related to the LAPACK installation?