tfwillems / HipSTR

Genotype and phase short tandem repeats using Illumina whole-genome sequencing data
GNU General Public License v2.0

align with DSMZ #85

Open aubreybailey opened 2 years ago

aubreybailey commented 2 years ago

Hi Thomas, this isn't a bug report, but I'm trying to figure out how to use HipSTR's output with traditional STR-checking applications such as https://www.dsmz.de/fp/cgi-bin/str.html

I downloaded and processed three 293T files from the SRA and ran HipSTR following your example workflow, up to and including the datamash reformat step. Attached is a grep of the loci that DSMZ's form uses.

#CHROM  chr5
POS 123775551
ID  D5S818
REF CCTCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCT
ALT CCTCTATCTATCTATCTATCTATCTATCTATCTATCT,CATCTATCTATCTATCTATCTATCTATCTATCTATCTATCT
QUAL    .
FILTER  .
INFO    INFRAME_PGEOM=0.95;INFRAME_UP=0.05;INFRAME_DOWN=0.05;OUTFRAME_PGEOM=0.95;OUTFRAME_UP=0.01;OUTFRAME_DOWN=0.01;START=123775556;END=123775599;PERIOD=4;NSKIP=0;NFILT=0;BPDIFFS=-12,-8;DP=33;DSNP=0;DSTUTTER=2;DFLANKINDEL=2;AN=6;REFAC=0;AC=3,3
FORMAT  GT:GB:Q:PQ:DP:DSNP:DSTUTTER:DFLANKINDEL:PDP:PSNP:GLDIFF:AB:DAB:FS:ALLREADS:MALLREADS
BGI_293T_SRR14016673    1|2:-12|-8:1.00:0.50:8:0:0:1:6.50|1.50:0|0:3.12:-0.90:7:-0.54:-12|3;-8|2:-12|3;-8|1
BGI_293T_SRR14016674    2|1:-8|-12:1.00:0.50:9:0:1:0:3.50|5.50:0|0:8.39:-0.14:8:-0.33:-12|6;-8|1:-12|6;-8|1
BGI_293T_SRR14016675    2|1:-8|-12:1.00:0.50:16:0:1:1:8.50|7.50:0|0:19.17:-0.00:13:-0.25:-16|1;-12|5;-8|3:-16|1;-12|5;-8|3

293T_DSMZ_loci.xlsx
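
For reference, here is a rough sketch of one way the D5S818 record above might be turned into repeat-count style allele calls. It assumes that `START` and `END` are inclusive coordinates of the repeat region, that `GB` is the base-pair difference of each allele from the reference, and that dividing the resulting length by `PERIOD` gives a number comparable to what the DSMZ form expects. That last assumption may not hold for every locus, since forensic nomenclature does not always count repeats over the same region. The function names are hypothetical and not part of HipSTR.

```python
# Values copied from the transposed D5S818 record above.
INFO = ("INFRAME_PGEOM=0.95;INFRAME_UP=0.05;INFRAME_DOWN=0.05;"
        "OUTFRAME_PGEOM=0.95;OUTFRAME_UP=0.01;OUTFRAME_DOWN=0.01;"
        "START=123775556;END=123775599;PERIOD=4;NSKIP=0;NFILT=0;"
        "BPDIFFS=-12,-8;DP=33;DSNP=0;DSTUTTER=2;DFLANKINDEL=2;"
        "AN=6;REFAC=0;AC=3,3")
FORMAT = ("GT:GB:Q:PQ:DP:DSNP:DSTUTTER:DFLANKINDEL:PDP:PSNP:"
          "GLDIFF:AB:DAB:FS:ALLREADS:MALLREADS")
SAMPLE = ("1|2:-12|-8:1.00:0.50:8:0:0:1:6.50|1.50:0|0:3.12:-0.90:7:"
          "-0.54:-12|3;-8|2:-12|3;-8|1")


def parse_info(info):
    """Split a VCF INFO string into a {key: value} dict of strings."""
    fields = {}
    for item in info.split(";"):
        key, _, value = item.partition("=")
        fields[key] = value
    return fields


def repeat_counts(info, fmt, sample):
    """Convert a sample's GB field into repeat-count style allele labels.

    Assumption: allele length = reference repeat length + bp difference,
    and partial repeats are written as "whole.remainder" (e.g. 9.3).
    """
    fields = parse_info(info)
    period = int(fields["PERIOD"])
    # Assumed inclusive coordinates, so the reference tract is END - START + 1 bp.
    ref_len = int(fields["END"]) - int(fields["START"]) + 1
    gb = dict(zip(fmt.split(":"), sample.split(":")))["GB"]
    labels = []
    for diff in gb.split("|"):
        whole, rem = divmod(ref_len + int(diff), period)
        labels.append(f"{whole}.{rem}" if rem else str(whole))
    return labels


print(repeat_counts(INFO, FORMAT, SAMPLE))  # ['8', '9']
```

Under these assumptions the reference tract is 44 bp (11 repeats of 4 bp), so the `-12` and `-8` alleles come out as 8 and 9 repeats. Whether that numbering lines up with DSMZ's for the other loci would need to be checked locus by locus.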