statgen / ruth

Robust Unified Hardy-Weinberg Equilibrium Test
Apache License 2.0

Fatal error with sm-list #1

Open jjfarrell opened 5 years ago

jjfarrell commented 5 years ago

The program is being run with an --sm-list whose samples are all present in the evec table (4789 samples). The VCF has slightly more samples (4815), and the run stops with a FATAL error. I expected ruth to subset the VCF to the --sm-list samples first and then check that the selected samples have PCs. This is easy enough to work around by subsetting with bcftools view -S gcad_samples_4789.txt, but it would be nice to be able to select the subset within ruth.

ruth --vcf adsp5k.manta.indels.norm.vcf.gz --evec adsp5k.evec  --field  PL --out adsp5k.manta.indels.norm.ruth.4789.vcf.gz --sm-list gcad_samples_4789.txt

Available Options

The following parameters are available. Ones with "[]" are in effect:
                              Input Options : --evec [adsp5k.evec],
                                              --vcf [adsp5k.manta.indels.norm.vcf.gz],
                                              --thin [1.00], --seed,
                                              --num-pc [4], --field [PL],
                                              --gt-error [5.0e-03],
                                              --lambda [1.00]
                             Output Options : --out [adsp5k.manta.indels.norm.ruth.4789.vcf.gz],
                                              --skip-if, --skip-info,
                                              --site-only, --nelder-mead,
                                              --lrt-test, --lrt-em
                        Samples to focus on : --sm-list [gcad_samples_4789.txt]
             Parameters for sex chromosomes : --sex-map, --x-label [X],
                                              --y-label [Y], --mt-label [MT],
                                              --x-start [2699520],
                                              --x-stop [154931044]
   Options to specify when chunking is used : --ref, --unit [2147483647],
                                              --interval, --region

Run with --help for more detailed help messages of each argument.

NOTICE [2019/11/05 21:34:32] - Analysis Started
NOTICE [2019/11/05 21:34:32] - Reading sample eigenvectors
NOTICE [2019/11/05 21:34:32] - Identifying sample columns to extract..
NOTICE [2019/11/05 21:34:32] - Reading in BCFs...
NOTICE [2019/11/05 21:34:32] - Finished identifying 4789 samples to load from VCF/BCF

FATAL ERROR -
[E:/share/pkg.7/ruth/git778d784/src/ruth/frequency_estimator.cpp:121 bool frequency_estimator::set_variant(bcf1_t*, int8_t*, int32_t*)] nsamples 4815 != 4789 in the EigenVector
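
The error compares the VCF's total sample count (4815) with the number of eigenvectors loaded (4789), even though only 4789 columns were selected. The ordering the reporter expected (restrict VCF columns to the --sm-list first, then require PCs only for the retained samples) can be sketched in Python as below. This is not ruth's actual C++ code: the function name select_sample_columns and the representation of the eigenvector table as a dict from sample ID to PC values are hypothetical, chosen only to illustrate the check order.

```python
from typing import Dict, List, Sequence


def select_sample_columns(
    vcf_samples: Sequence[str],
    sm_list: Sequence[str],
    evec: Dict[str, List[float]],
) -> List[int]:
    """Hypothetical helper: pick VCF columns by --sm-list, then verify PCs.

    Extra VCF samples that are not in sm_list are ignored instead of
    triggering a sample-count mismatch.
    """
    wanted = set(sm_list)
    # Step 1: keep only VCF columns whose sample ID appears in the sample list,
    # preserving the VCF's column order.
    cols = [i for i, sm in enumerate(vcf_samples) if sm in wanted]
    # Step 2: every retained sample must have an eigenvector entry.
    missing = [vcf_samples[i] for i in cols if vcf_samples[i] not in evec]
    if missing:
        raise ValueError(
            f"{len(missing)} selected samples lack eigenvectors, e.g. {missing[0]}"
        )
    return cols


if __name__ == "__main__":
    vcf = ["S1", "S2", "S3", "S4"]  # the VCF has one extra sample, S4
    keep = ["S1", "S2", "S3"]
    pcs = {"S1": [0.1, 0.2], "S2": [0.0, -0.1], "S3": [0.3, 0.1]}
    print(select_sample_columns(vcf, keep, pcs))  # [0, 1, 2]
```

Under this ordering, any later consistency check would compare the number of selected columns, not the number of VCF samples, with the number of eigenvectors in use.
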
hyunminkang commented 5 years ago

This is a known bug. We will correct it in the next major update.