Closed szhan closed 1 year ago
I thought that setting mu to 1e-04 would improve imputation results. That parameter should reflect NGS error rate, which should dominant mutation rate (1e-08).
Relevant bits of code related to defining the probability of mismatch and switch.
In Par.java
:
/**
* Returns the default allele mismatch parameter for the specified number
* of haplotypes.
* @param nHaps the number of reference and target haplotypes
* @return the default allele mismatch parameter for the specified number
* of haplotypes
*/
public float err(int nHaps) {
if (nHaps <= 0) {
throw new IllegalArgumentException(String.valueOf(nHaps));
}
return (err>=0) ? err : liStephensPMismatch(nHaps);
}
/**
* <p>Return an approximation to the allele mismatch probability suggested by
* Li and Stephens. The approximation uses a Riemann sum approximation
* of the natural log function.</p>
*
* <p>Refs:
* Li N, Stephens M. Genetics 2003 Dec;165(4):2213-33 and
* Marchini J, Howie B. Myers S, McVean G, Donnelly P. 2007;39(7):906-13.</p>
*
* @param nHaps the number of haplotypes
* @return the allele mismatch probability suggested by Li and Stepehens
*/
public static float liStephensPMismatch(int nHaps) {
double theta = 1/((Math.log(nHaps) + 0.5));
return (float) (theta/(2*(theta + nHaps)));
}
Also, in ImpData.java
:
private static float[] pRecomb(float ne, int nHaps, double[] pos) {
float[] pRecomb = new float[pos.length];
double c = -(0.04*ne/nHaps); // 0.04 = 4/(100 cM/M)
for (int j=1; j<pRecomb.length; ++j) {
pRecomb[j] = (float) -Math.expm1(c*(pos[j] - pos[j-1]));
}
return pRecomb;
}
Next step is to print out forward and backward probability matrices from BEAGLE. Below are some instructions to compile a modified copy of BEAGLE source code. First is to compile the source code (from .java to .class). Second is to package the compiled code into a jar file.
javac -d ./beagle_test/ ./beagle_5.4/*/*.java
jar cvfm test.jar META-INF/MANIFEST.MF -C ./beagle_test/ .
The manifest file should specify which file is the application entry point, like below:
Manifest-Version: 1.0
Created-By: 11.0.13 (JetBrains s.r.o.)
Main-Class: main/Main
To compile a modified copy of BEAGLE 4.1, do the following:
mkdir ./beagle_4.1_mod/
javac -d ./beagle_4.1_mod/ ./beagle_4.1/*/*.java ./beagle_4.1/net/sf/samtools/*.java ./beagle_4.1/net/sf/samtools/util/*.java
jar cvfm beagle_4.1_mod.jar META-INF/MANIFEST.MF -C ./beagle_4.1_mod/ .
Run the program by:
java -jar beagle_4.1_mod.jar
Note that BEAGLE 4.1 only performs imputation if there are sites in the query VCF that is absent in the reference VCF. Otherwise, it won't run LS HMM.
The BEAGLE 4.1 algorithm is implemented in https://github.com/tskit-dev/tskit/pull/2815
This continues the thread started at https://github.com/szhan/tsimpute/issues/138. I've decided to move it over to here because it is more about analysis than tooling.