vibansal / crisp

Code for multi-sample variant calling from sequence data of pooled or unpooled DNA samples
MIT License

FLANKSEQ is padded with non-printable characters when variant is near the edge #10

Closed: IgnasiLucas closed this issue 4 years ago

IgnasiLucas commented 5 years ago

It seems that when a variant is near the beginning (or possibly also the end) of a chromosome or contig, non-printable characters are included in the flanking sequence reported in the FLANKSEQ field of the INFO column. This makes downstream processing of the VCF file difficult. For example, grep complains that the file is binary and stops processing it.
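To find affected records before a fix is available, one option is to scan the VCF as raw bytes and flag any FLANKSEQ value containing bytes outside printable ASCII. This is a minimal sketch, not part of crisp. The function name `find_bad_flankseq` is hypothetical, and it assumes the standard VCF layout, in which INFO is the eighth tab-separated column with `;`-separated `key=value` entries.

```python
import re

# Any byte outside printable ASCII (0x20-0x7E). Tab is allowed because VCF
# columns are tab-separated; within a single INFO value it will not appear.
NONPRINTABLE = re.compile(rb"[^\x20-\x7e\t]")


def find_bad_flankseq(lines):
    """Yield (line_number, value) for records whose FLANKSEQ has non-printable bytes.

    `lines` is an iterable of raw bytes lines, e.g. a file opened in "rb" mode,
    so stray control bytes are not rejected by a text decoder.
    """
    for lineno, raw in enumerate(lines, start=1):
        line = raw.rstrip(b"\r\n")
        if line.startswith(b"#"):
            continue  # skip meta-information and header lines
        fields = line.split(b"\t")
        if len(fields) < 8:
            continue  # not a complete VCF data line
        for entry in fields[7].split(b";"):  # INFO column
            key, _, value = entry.partition(b"=")
            if key == b"FLANKSEQ" and NONPRINTABLE.search(value):
                yield lineno, value
```

Usage, with a hypothetical file name:

```python
with open("calls.vcf", "rb") as fh:
    for lineno, value in find_bad_flankseq(fh):
        print(lineno, value)
```

Reading in binary mode is deliberate. It mirrors the way grep's binary-file detection is triggered by such bytes, and it avoids decoding errors. If a corrupted value happens to contain `;` or a newline byte, the split will be imperfect, so treat the output as a diagnostic aid rather than a validator.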

vibansal commented 5 years ago

I have pushed a fix for this issue to the branch 'bugfix'. Can you check if it solves this problem? Thanks for reporting.

IgnasiLucas commented 5 years ago

That was fast. Yes, the bugfix branch fixes the issue. My data had no variants near the end of a contig, but all variants close to the beginning of a contig now get the value FLANKSEQ=. in the INFO field. By the way, compiling crisp on Ubuntu was not straightforward: I had to add the -no-pie flag to both Makefile and samtools/Makefile. It may not be worth opening an issue for that. Best,
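The behaviour described above (FLANKSEQ=. near a contig start) is consistent with a bounds check on the flanking window. The sketch below illustrates that idea. It is not the code on the 'bugfix' branch, which this thread does not show, and the helper `flank_seq`, its 0-based `pos`, and the symmetric `width` are all assumptions. In C, indexing before the start of a sequence buffer reads unrelated memory, which is one plausible way stray bytes could end up in the output. Python would instead silently wrap negative indices, so the check is explicit here as well.

```python
def flank_seq(contig: str, pos: int, width: int) -> str:
    """Return the window contig[pos - width : pos + width + 1], or "." if it leaves the contig.

    Hypothetical helper: `pos` is a 0-based variant position and `width` is the
    number of bases wanted on each side.
    """
    start = pos - width
    end = pos + width + 1
    if start < 0 or end > len(contig):
        # The window runs off a contig edge: report a missing value instead of
        # reading outside the sequence.
        return "."
    return contig[start:end]
```

Returning "." keeps the field well-formed but drops all flanking context for edge variants. An alternative design would clamp `start` to 0 and `end` to `len(contig)` and emit a shorter flank, at the cost of a variable-length field that downstream tools must handle.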