lh3 / yak

Yet another k-mer analyzer
MIT License
117 stars, 9 forks

Documentation for trio eval #3

Open zeeev opened 4 years ago

zeeev commented 4 years ago

Dear Heng,

Trying out the trio eval tool, it's a great addition.

Can you share the meaning of S/W/H records?

I'm guessing an S is a sequence?

S ctg.000223F 13 0 12 0 0 0
S ctg.000227F 27 3 23 3 3 0
S ctg.356 10 136 3 7 7 128
W 152511 1638448 0.093083
H 267877 1638666 0.163473

lh3 commented 4 years ago

The last number on the W-line gives the sWitch error rate. The last number on the H-line gives the Hamming error rate. The first two numbers on the S-line give the number of paternal (0) and maternal (1) sites for each contig. The last four numbers give the number of 00, 01, 10 and 11 transitions.
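To make the field layout described above concrete, here is a minimal Python sketch that parses S/W/H records. The names `ContigRecord` and `parse_trioeval` are hypothetical and not part of yak. The per-contig rates are an assumption about how the counts combine (switch rate as the share of 01 and 10 transitions, Hamming rate as the minority parent's share of sites); only the genome-wide rates on the W and H lines come from yak itself. In the example output, the third number on the W and H lines matches the first number divided by the second.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass
class ContigRecord:
    name: str
    paternal: int                          # sites labelled paternal (0)
    maternal: int                          # sites labelled maternal (1)
    transitions: Tuple[int, int, int, int]  # counts of 00, 01, 10, 11

    def switch_rate(self) -> float:
        # Assumption: a switch is a 01 or 10 transition between adjacent sites.
        total = sum(self.transitions)
        return (self.transitions[1] + self.transitions[2]) / total if total else 0.0

    def hamming_rate(self) -> float:
        # Assumption: the minority parent's sites count as errors.
        total = self.paternal + self.maternal
        return min(self.paternal, self.maternal) / total if total else 0.0


def parse_trioeval(lines: Iterable[str]):
    """Split trio eval output into per-contig S records and W/H summaries."""
    contigs: List[ContigRecord] = []
    summary: Dict[str, Tuple[int, int, float]] = {}
    for line in lines:
        f = line.split()
        if len(f) == 8 and f[0] == "S":
            counts = [int(x) for x in f[2:]]
            contigs.append(ContigRecord(f[1], counts[0], counts[1], tuple(counts[2:])))
        elif len(f) == 4 and f[0] in ("W", "H"):
            # Two counts followed by the rate reported by yak.
            summary[f[0]] = (int(f[1]), int(f[2]), float(f[3]))
    return contigs, summary


SAMPLE = """\
S ctg.000223F 13 0 12 0 0 0
S ctg.000227F 27 3 23 3 3 0
S ctg.356 10 136 3 7 7 128
W 152511 1638448 0.093083
H 267877 1638666 0.163473
"""

if __name__ == "__main__":
    contigs, summary = parse_trioeval(SAMPLE.splitlines())
    for c in contigs:
        print(c.name, round(c.switch_rate(), 4), round(c.hamming_rate(), 4))
    print("switch error rate:", summary["W"][2])
    print("Hamming error rate:", summary["H"][2])
```

Keeping the reported rates separate from the derived per-contig rates makes it easy to spot contigs such as ctg.356 that mix both parental labels.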

gsarah commented 3 years ago

Hello, it seems to me that the answer to the issue above is incomplete, and I have the same question. I used yak triobin to obtain the reads used by hifiasm for each haplotype. I get this table as a result:

m64071_201031_183434/10/ccs m   0   6156    0   6495    0   10  15137   58
m64071_201031_183434/13/ccs m   0   1445    0   1561    9   28  14360   349
m64071_201031_183434/14/ccs p   470 0   533 24  1   139 18263   259
m64071_201031_183434/29/ccs p   1716    0   2508    3   243 87  18602   2554
m64071_201031_183434/38/ccs p   186 0   349 25  94  24  15667   599
m64071_201031_183434/39/ccs m   0   8013    0   8109    0   5   15197   188
m64071_201031_183434/40/ccs p   2612    0   3779    0   0   14  17954   153
m64071_201031_183434/42/ccs m   0   3441    10  4216    141 147 15287   1492

I guess I can obtain my paternal reads with seqtk by using the read IDs whose second column is tagged p, and likewise the maternal reads with the tag m. What do a and 0 stand for in the second column? Can you describe the other fields?

lh3 commented 3 years ago

a=ambiguous, meaning there is strong support from both parental haplotypes. 0=none, meaning there are no parent-specific k-mers.
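As a rough sketch of the read-splitting workflow discussed in this thread, the Python below groups read IDs by the label in the second column (p, m, a or 0) and writes one name list per label. The function names and the output file naming are hypothetical. The code relies only on the first two columns; the remaining fields are not explained in this thread, so it leaves them uninterpreted.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def group_reads_by_label(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Map each label in column 2 (p, m, a, 0) to the read IDs in column 1."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        groups[fields[1]].append(fields[0])
    return dict(groups)


def write_name_lists(groups: Dict[str, List[str]], prefix: str) -> Dict[str, str]:
    """Write one read-name list per label; returns label -> file path."""
    paths = {}
    for label, names in groups.items():
        path = f"{prefix}.{label}.lst"  # hypothetical naming scheme
        with open(path, "w") as fh:
            fh.write("\n".join(names) + "\n")
        paths[label] = path
    return paths


TRIOBIN_SAMPLE = """\
m64071_201031_183434/10/ccs m 0 6156 0 6495 0 10 15137 58
m64071_201031_183434/14/ccs p 470 0 533 24 1 139 18263 259
m64071_201031_183434/39/ccs m 0 8013 0 8109 0 5 15197 188
"""

if __name__ == "__main__":
    groups = group_reads_by_label(TRIOBIN_SAMPLE.splitlines())
    for label, names in sorted(groups.items()):
        print(label, len(names))
```

Each resulting list can then be passed to `seqtk subseq`, for example `seqtk subseq reads.fq.gz reads.p.lst > paternal.fq`, where the file names are placeholders.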