rwdavies / QUILT

GNU General Public License v3.0
45 stars 10 forks source link

Observing number of reference haplotypes K=0 #32

Closed karstyl closed 2 months ago

karstyl commented 2 months ago

Hello,

I had the program working a week or so ago, but I am running into issues now and I can't find the problem. The issue is "Observing number of reference haplotypes K=0" This causes an exit with overflow, but I am sure the issue is the lack of reference haplotypes.

Here is my input: /home/jjohnso/project-aces/apps/QUILT/QUILT/QUILT.R \ --outputdir=${basefolder}/${quiltoutputfolder}/${dirname} \ --chr=${chr} \ --tempdir=${basefolder}/${quiltoutputfolder}/temp.${dirname} \ --output_filename=quilt.${mapVS}.chr${chr}.f${fam}.vcf \ --bamlist=${basefolder}/${quiltinputfolder}/fam${fam}bams.txt \ --sampleNames_file=${basefolder}/${quiltinputfolder}/fam${fam}.names \ --reference_haplotype_file=${basefolder}/${quiltinputfolder}/chr${chr}.hap.gz \ --reference_legend_file=${basefolder}/${quiltinputfolder}/chr${chr}.legend.gz \ --genetic_map_file=${basefolder}/gpinfo/generic.geneticmap.txt \ --reference_sample_file=${basefolder}/${quiltinputfolder}/chr${chr}.samples \ --downsampleToCov=6 --nGen=${nGenval} \ --make_plots=TRUE --nCores=1 \ --save_prepared_reference=TRUE

input files: zcat quiltinputs/chr1.hap.gz | head | cut -c-10 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 1 0 1 0 0 1 0 0 0 0 1 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 1 0 0

zcat quiltinputs/chr1.legend.gz | head id position a0 a1 1:17092_T_C 17092 T C 1:17157_C_G 17157 C G 1:17201_C_T 17201 C T 1:17297_G_C 17297 G C 1:17482_G_C 17482 G C 1:17494_C_T 17494 C T 1:17755_C_T 17755 C T 1:17760_T_A 17760 T A 1:17795_G_A 17795 G A

cat quiltinputs/chr1.samples | head sample population group sex F03F111_aggr F03F111_aggr F03F111_aggr 2 F03F340_tame F03F340_tame F03F340_tame 2 F03F348_tame F03F348_tame F03F348_tame 2 F03F352_tame F03F352_tame F03F352_tame 2 F03F357_tame F03F357_tame F03F357_tame 2 F03F412_tame F03F412_tame F03F412_tame 2 F03S072_aggr F03S072_aggr F03S072_aggr 2 F03S091_tame F03S091_tame F03S091_tame 2 F03S136_aggr F03S136_aggr F03S136_aggr 2

Output: [2024-04-09 12:11:29] Running QUILT(outputdir = /home/jjohnso/project-aces/vv31/quiltout/quilt_output_famF102, chr = 1, regionStart = NA, regionEnd = NA, buffer = NA, bamlist = /home/jjohnso/project-aces/vv31/quiltinputs/fam02bams.txt, cramlist = , sampleNames_file = /home/jjohnso/project-aces/vv31/quiltinputs/fam02.names, reference = , nCores = 1, nGibbsSamples = 7, n_seek_its = 3, Ksubset = 400, Knew = 100, K_top_matches = 5, output_gt_phased_genotypes = TRUE, heuristic_match_thin = 0.1, output_filename = quilt.vv31.chr1.f02.vcf, RData_objects_to_save = NULL, output_RData_filename = NULL, prepared_reference_filename = , save_prepared_reference = TRUE, tempdir = /home/jjohnso/project-aces/vv31/quiltout/temp.quilt_output_famF102, bqFilter = 17, panel_size = NA, posfile = , genfile = , phasefile = , maxDifferenceBetweenReads = 1e+10, make_plots = TRUE, verbose = TRUE, shuffle_bin_radius = 5000, iSizeUpperLimit = 1e+06, record_read_label_usage = FALSE, record_interim_dosages = FALSE, use_bx_tag = TRUE, bxTagUpperLimit = 50000, addOptimalHapsToVCF = FALSE, estimate_bq_using_truth_read_labels = FALSE, override_default_params_for_small_ref_panel = TRUE, gamma_physically_closest_to = NA, seed = NA, hla_run = FALSE, downsampleToCov = 6, minGLValue = 1e-10, minimum_number_of_sample_reads = 2, nGen = 20, reference_haplotype_file = /home/jjohnso/project-aces/vv31/quiltinputs/chr1.hap.gz, reference_legend_file = /home/jjohnso/project-aces/vv31/quiltinputs/chr1.legend.gz, reference_sample_file = /home/jjohnso/project-aces/vv31/quiltinputs/chr1.samples, reference_populations = NA, reference_phred = 30, reference_exclude_samplelist_file = , region_exclude_file = , genetic_map_file = /home/jjohnso/project-aces/vv31/gpinfo/generic.geneticmap.txt, nMaxDH = NA, make_fake_vcf_with_sites_list = FALSE, output_sites_filename = NA, expRate = 1, maxRate = 100, minRate = 0.1, print_extra_timing_information = FALSE, block_gibbs_iterations = c(3, 6, 9), n_gibbs_burn_in_its = 20, plot_per_sample_likelihoods = FALSE, use_small_eHapsCurrent_tc = FALSE) [2024-04-09 12:11:31] Overriding default parameters for small reference panel [2024-04-09 12:11:31] Observing number of reference haplotypes K=0 [2024-04-09 12:11:31] Reset n_seek_its from 3 to 1 [2024-04-09 12:11:31] Set Ksubset to 0 (no longer necessary) [2024-04-09 12:11:31] Set Knew to 0 (no longer necessary) [2024-04-09 12:11:31] There are 960215 SNPs in this region [2024-04-09 12:11:31] Imputing sample: 1

karstyl commented 2 months ago

So, I did a re-run this morning with the same input and it is now working. If you notice in the above it does not include "Running QUILT_prepare_reference." I know there were some issues on my cluster recently, this one seems weird, but was likely from my end. (Frustrating after trying to debut for a while)

rwdavies commented 2 months ago

OK great. What likely happened is that QUILT first pre-processes the reference data and saves it, and then imputes samples. It will save a copy of the pre-processed reference data, so that next time, or with a next set of samples, it can skip that step. If there was an error with it before, it will just load the same data and you'll get the same error.

Sorry for any confusion