How exons should be specified in the reference FASTA file?

Hi,

I tried to launch T1K on my data, but got segfault with the following trace

#0  0x000055555556cd03 in SeqSet::SetSeqExonInfo (exons=std::vector of length 26, capacity 32 = {...}, seqIdx=14, this=0x7fffffffbc10) at /data1/biosoft/T1K/SeqSet.hpp:669
#1  SeqSet::InputRefSeq (this=this@entry=0x7fffffffbc10, id=id@entry=0x5555647182c0 "chr15", read=read@entry=0x7fffee8f4010 'N' <repeats 200 times>..., weight=weight@entry=1,     initExonInfo=initExonInfo@entry=true,     comment=comment@entry=0x55558dff59d0 " AC:CM000677.2  gi:568336009  LN:101991189  rl:Chromosome  M5:f036bd11158407596ca6bf3581454706  AS:GRCh38")    at /data1/biosoft/T1K/SeqSet.hpp:979
#2  0x0000555555571806 in Genotyper::InitRefSet (this=this@entry=0x7fffffffb9d0, 
    filename=filename@entry=0x5555555950d0 "/data1/mnt/LMBALL/RefData_VMP01/references/38/GDC/GRCh38.d1.vd1.fa") at /data1/biosoft/T1K/Genotyper.hpp:700
#3  0x00005555555576d1 in main (argc=14, argv=0x7fffffffdfe8) at Genotyper.cpp:331

As I understand from the source code, T1K expects that a reference FASTA file will contain exons in the metadata lines. But metadata lines in my FASTA files look like that

>chr1  AC:CM000663.2  gi:568336023  LN:248956422  rl:Chromosome  M5:6aef897c3d6ff0c78aff06ac189178dd  AS:GRCh38

I think that what is causing segfault

What is an expected format for metadata lines in FASTA files for T1K?

Also I am very curious about this loop. Why does it start with 1 and not 0? And it seems that it also assumes that size is odd (otherwise there will be read out of bounds). Is that something expected?

Thank you in advance

mourisl / T1K

How exons should be specified in the reference FASTA file? #33