nmdp-bioinformatics / ImmunogeneticDataTools

Immunogenetic Data Tools related to HLA, GLStrings, Linkage Disequilibrium

Delimited text output #33

Closed: biotronette closed this issue 8 years ago

biotronette commented 9 years ago

Proposed output of GL string batch processing:

Column names; description

Sample ID; self-explanatory
nA; number of alleles in HLA-A ambiguity string (sum over both haplotypes)
nB, nC, nDRB1, etc.
nC1 linkages; number of Class 1 linkages found
nC2
C1 min L(genotype); minimum difference between haplotype likelihoods across all populations
C2 min L(genotype)
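As a rough illustration of the nA, nB, ... columns, here is a minimal Python sketch that counts allele tokens per locus in a GL string. It assumes the standard GL String delimiters (`^`, `|`, `+`, `~`, `/`) and assumes that "number of alleles" means every allele occurrence is counted, including repeats. The function name `count_alleles_per_locus` is hypothetical and is not taken from the project.

```python
import re
from collections import Counter

# GL String delimiters: ^ separates loci, | genotypes, + gene copies,
# ~ phased alleles (haplotypes), / ambiguous alleles.
_DELIMITERS = re.compile(r"[\^|+~/]")


def count_alleles_per_locus(gl_string):
    """Count allele tokens per locus (hypothetical helper).

    Assumption: every allele occurrence counts, so an allele that
    appears twice in the string is counted twice.
    """
    counts = Counter()
    for allele in _DELIMITERS.split(gl_string):
        allele = allele.strip()
        if not allele:
            continue
        # The locus is the part before '*', e.g. 'HLA-A' in 'HLA-A*01:01'.
        locus = allele.split("*", 1)[0]
        counts[locus] += 1
    return dict(counts)


example = "HLA-A*01:01/HLA-A*01:02+HLA-A*24:02^HLA-B*08:01+HLA-B*44:02"
print(count_alleles_per_locus(example))
```

If the intended count is of distinct alleles rather than occurrences, the `Counter` could be replaced with a set per locus.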

Notes: the last two columns are intended to capture instances where there is true haplotypic ambiguity given population frequency data. For example, if the population with the most haplotypic ambiguity has a likelihood of 0.35 for one haplotype combination and 0.65 for the other, the difference is 0.3. Example forthcoming.
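One possible reading of the "minimum difference" columns, sketched in Python: for each population, take the gap between the two most likely haplotype combinations, then report the smallest gap over all populations. The function name, the input layout (a dict of population name to a list of likelihoods), and the population labels are assumptions made for illustration only.

```python
def min_likelihood_difference(likelihoods_by_population):
    """Smallest gap between the two highest likelihoods in any population.

    Assumption: each value is a list of likelihoods, one per candidate
    haplotype combination. Populations with fewer than two candidates
    are skipped because they carry no ambiguity.
    """
    gaps = []
    for likelihoods in likelihoods_by_population.values():
        if len(likelihoods) < 2:
            continue
        top, second = sorted(likelihoods, reverse=True)[:2]
        gaps.append(top - second)
    # None signals that no population had two or more candidates.
    return min(gaps) if gaps else None


# Hypothetical population labels and likelihoods.
example = {
    "POP_1": [0.35, 0.65],  # ambiguous: gap of about 0.3
    "POP_2": [0.90, 0.10],  # clear: gap of about 0.8
}
print(min_likelihood_difference(example))
```

A small value flags a sample whose phasing is genuinely uncertain in at least one population; a value near 1 means every population favors a single combination.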

mpresteg commented 9 years ago

@JuliaUdell - Started on this. An example for the last two columns will be useful.

mpresteg commented 7 years ago

The output no longer distinguishes between class I and class II. Instead, it will count the linkages and compute the minimum difference for each linkage disequilibrium (LD) grouping sought (bc, drb_dq, five_loc, six_loc, etc.).

kosoegawa commented 7 years ago

Please add a header row so that it is easy to understand what these numbers are.
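A header row along these lines might look like the following Python sketch, which builds column names from the loci and the LD groupings mentioned in this thread (bc, drb_dq, five_loc, six_loc) and writes tab-delimited text. The column names (`<ld>_linkages`, `<ld>_min_diff`) and both function names are hypothetical and are not the tool's actual output format.

```python
import csv
import io


def build_header(loci, ld_groups):
    """Build the column names: sample ID, per-locus counts, per-LD stats."""
    header = ["Sample ID"]
    header += [f"n{locus}" for locus in loci]
    for ld in ld_groups:
        # Hypothetical names for the linkage count and minimum difference.
        header += [f"{ld}_linkages", f"{ld}_min_diff"]
    return header


def write_rows(rows, loci, ld_groups, delimiter="\t"):
    """Return delimited text with a header line followed by the rows."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=build_header(loci, ld_groups),
        delimiter=delimiter,
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical sample row.
rows = [{"Sample ID": "S1", "nA": 3, "nB": 2, "bc_linkages": 1, "bc_min_diff": 0.3}]
print(write_rows(rows, loci=["A", "B"], ld_groups=["bc"]))
```

Deriving the header from the same lists that drive the computation keeps the column names and the values in step when LD groupings are added or removed.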