Open ttriche opened 7 years ago
Hey I think I have an idea, maybe this is just off-by-one or off-by-two?
Here is what epiread.txt (above) looks like:
chr1 MG01HX08:278:H3VCCALXX:8:1104:27052:45892 1 - 10468
CCCCTC 10341 CCATCACC
Unless something funny is happening, in wztsv.h, https://github.com/zwdzwd/utils/blob/3281eb6f70e431480a036d33ba587c70f6fc26c7/wztsv.h#L86, tsv_read should fill up the tsv struct "in" as follows:
(gdb) explore in
'in' is a pointer to a value of type 'tsv_t'
Continue exploring it as a pointer to a single value [y/n]: y
The value of '*in' is of type 'tsv_t' which is a typedef of type 'struct tsv_t'
The value of '*in' is a struct/class of type 'struct tsv_t' with the following fields:
fields = <Enter 0 to explore this field of type 'char **'>
n = <Enter 1 to explore this field of type 'size_t'>
m = <Enter 2 to explore this field of type 'size_t'>
fp = <Enter 3 to explore this field of type 'gzFile'>
finished = 0 .. (Value of type 'int')
line_max = 10000 .. (Value of type 'int')
line = <Enter 6 to explore this field of type 'char *'>
Enter the field number of choice: q
(gdb) print in
$1 = (tsv_t *) 0x6e60a0
(gdb) print in->fields[0]
$2 = 0x6f0550 "chr1"
(gdb) print in->fields[1]
$3 = 0x6f0570 "MG01HX08:278:H3VCCALXX:8:1104:27052:45892"
(gdb) print in->fields[2]
$4 = 0x6f05b0 "1"
(gdb) print in->fields[3]
$5 = 0x6f0530 "-"
(gdb) print in->fields[4]
$6 = 0x6f0600 "10468"
So it looks like this is maybe an off-by-one error?
If I replace the code with
assert(strlen(in->fields[2]) == 1);
assert(strlen(in->fields[3]) == 1);
it looks like we are OK, although then it's going to corrupt the _snp / _cg code following. Let's see...
nb. It's not an issue of single-end vs paired-end epiread format:
gdb -q --args ./biscuit asm epiread.paired.txt
Reading symbols from ./biscuit...done.
(gdb) b main_asm
Breakpoint 1 at 0x453853: file src/asm_pairwise.c, line 91.
(gdb) r
Starting program: /home/tim/Dropbox/biscuit/biscuit asm epiread.paired.txt
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
...
(gdb) print in->fields[3]
$1 = 0x6f0530 "CCCT"
(gdb) print in->fields[4]
$2 = 0x6f05e0 "."
(gdb) n
biscuit: src/asm_pairwise.c:124: main_asm: Assertion `strlen(in->fields[3]) == 1' failed.
followup: changing
assert(strlen(in->fields[3]) == 1);
assert(strlen(in->fields[4]) == 1);
to
assert(strlen(in->fields[2]) == 1);
assert(strlen(in->fields[3]) == 1);
allows things to proceed and eventually the program exits normally, although I haven't come across an ASM-possessing region in my osteosarcoma to test it out on. So maybe it is an off-by-one error?
I need a positive control before I'm willing to send in a patch :-)
Compiled with debug symbols and run under gdb...
I'm having a hard time figuring out why it is asserting that a coordinate should be length 1...
epiread.txt