How to use stitch - GitHub issues

Hey, so in the example, Q_CFW-SW_100.0a_recal.reheadered.bam and Q_CFW-SW_100.0c_recal.reheadered.bam are different samples. The naming (100.0a vs 100.0c) refers to plate number, I think. There was a separate Rosetta-type lookup file that let us link the sample names to phenotypes.

In general, starting from bams, you first want a set of variants. You can either call them yourself, e.g. using an approach like the one in Jerome Nicod's 2016 Nature Genetics paper on outbred mice (published in the same issue of Nature Genetics as STITCH), or, for your population, look up an existing list of sites to impute.

The list of sites to impute gives you the pos.txt file. It's basically a subset of the first few columns of a VCF (columns 1, 2, 4 and 5, I think, i.e. chromosome, position, ref and alt), subsetted to distinct bi-allelic SNPs. The gen.txt file is optional. It comes from samples that have also been sequenced at high coverage, and it gives an indication of accuracy as the algorithm progresses.
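
As a rough illustration of the pos.txt step, here is a minimal Python sketch that pulls columns 1, 2, 4 and 5 out of a plain-text (uncompressed) VCF and keeps only bi-allelic SNPs at distinct positions. The function names and file paths are made up for the example, and the output layout (tab-separated, no header) is an assumption, so check it against the STITCH documentation before relying on it.

```python
def vcf_to_pos_rows(vcf_lines):
    """Yield (chrom, pos, ref, alt) for bi-allelic SNPs at distinct positions.

    Assumes tab-separated, uncompressed VCF text lines (hypothetical helper,
    not part of STITCH).
    """
    bases = {"A", "C", "G", "T"}
    seen = set()
    for line in vcf_lines:
        # Skip blank lines and VCF header lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        # VCF columns 1, 2, 4, 5 are CHROM, POS, REF, ALT.
        chrom, pos, ref, alt = fields[0], fields[1], fields[3], fields[4]
        # Single-base REF and ALT only: drops indels and multi-allelic
        # records such as ALT="G,T".
        if ref.upper() not in bases or alt.upper() not in bases:
            continue
        # Keep only the first record seen at each position.
        key = (chrom, pos)
        if key in seen:
            continue
        seen.add(key)
        yield chrom, pos, ref, alt


def write_pos_file(vcf_path, pos_path):
    """Write the filtered sites to pos_path, one tab-separated row per site."""
    with open(vcf_path) as vcf, open(pos_path, "w") as out:
        for row in vcf_to_pos_rows(vcf):
            out.write("\t".join(row) + "\n")
```

You would call it as something like write_pos_file("sites.vcf", "pos.txt"), where both file names are placeholders. A gzipped VCF would need to be opened with gzip.open(path, "rt") instead.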

Hope that's enough to get a good start, good luck.

Best, Robbie

rwdavies / STITCH

How to use stitch #52