rwdavies / STITCH

STITCH - Sequencing To Imputation Through Constructing Haplotypes
http://www.nature.com/ng/journal/v48/n8/abs/ng.3594.html
GNU General Public License v3.0

Forced imputation of founders. Does it make sense? #92

Closed GoliczGenomeLab closed 5 months ago

GoliczGenomeLab commented 5 months ago

Hi again,

This issue is related to a previously reported one: https://github.com/rwdavies/STITCH/issues/89.

My families are each made up of 16 founders and 2 RILs, so I only need to impute the RILs from their BAM reads, using the SNPs from the founders. However, if I don't provide the paths to the founders' BAM files, I get this error:

Error in match_gen_and_phase_to_samples(sampleNames = sampleNames, gen = gen,  :
  The following gen file samples could not be matched to a sample specified from bamlist: bam1 bam2 bam3
Calls: STITCH -> match_gen_and_phase_to_samples
Execution halted

As a result, the founders have to be imputed as well, and they then appear in the VCF. I don't need these genotypes imputed and, surprisingly, the estimated haplotypes do not seem to match the actual founders.

I was wondering whether this could have a negative effect on the imputation of the RILs (I keep getting extreme recombination frequencies, as reported in issue #89), and whether I could switch this option off.

Thanks in advance.

Best wishes, Jose

rwdavies commented 5 months ago

STITCH will complain if you supply samples in the high-coverage genotype file ("genfile") but don't then try to impute those samples. So you can remove those samples from the genfile, and the program should work.
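As a rough illustration of removing samples from the genfile, here is a minimal Python sketch. It assumes the genfile is a tab-separated text file with one header row of sample names and one row per SNP (entries such as 0, 1, 2 or NA); that layout should be checked against the STITCH documentation before relying on it. The function name `drop_genfile_samples` is hypothetical and is not part of STITCH.

```python
def drop_genfile_samples(genfile_text, samples_to_drop, sep="\t"):
    """Return genfile text with the named sample columns removed.

    Assumes (not confirmed by the thread) a header row of sample names
    followed by one row of genotypes per SNP, all separated by `sep`.
    """
    drop = set(samples_to_drop)
    lines = genfile_text.splitlines()
    header = lines[0].split(sep)

    # Fail loudly if a requested sample is not present in the header.
    unknown = drop - set(header)
    if unknown:
        raise ValueError(f"samples not in genfile header: {sorted(unknown)}")

    # Column indices of the samples we want to keep, in original order.
    keep = [i for i, name in enumerate(header) if name not in drop]

    kept_lines = []
    for line in lines:
        fields = line.split(sep)
        kept_lines.append(sep.join(fields[i] for i in keep))
    return "\n".join(kept_lines) + "\n"


if __name__ == "__main__":
    # Toy genfile: three founders (bam1..bam3) and two RILs (hypothetical names).
    example = (
        "bam1\tbam2\tbam3\tRIL1\tRIL2\n"
        "0\t2\t1\t0\t2\n"
        "NA\t1\t2\tNA\t1\n"
    )
    print(drop_genfile_samples(example, ["bam1", "bam2", "bam3"]), end="")
```

The filtered text would be written to a new file and passed as the genfile in place of the original, so that every sample in the genfile also appears in the bamlist.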

If you have founders, and you're not just testing STITCH, it feels like the right thing to do is to include them. Note that you can also provide a haplotype reference file for STITCH, which would make imputation even more accurate.

Using the genfile to see imputation accuracy printed during the run won't affect imputation accuracy. Its only purpose is to show how imputation performance varies with the number of iterations and/or with the choice of heuristics.

jamonterotena commented 5 months ago

Hi @rwdavies,

You can close this issue, as I now understand what I needed to know. I explain why in #89.

All best, Jose