knausb / vcfR

Tools to work with variant call format files

blank row crashes R - read.vcfR() #141

Open · TomJamesW opened this issue 5 years ago

TomJamesW commented 5 years ago

Hello Brian,

I'm using SLiM 3 to generate VCF files, which I am analysing in R. Occasionally, however, read.vcfR() causes my R session to 'encounter a fatal error' and abort. I followed your advice for tracking down the problem, and it seems to crash because the VCF file contains two completely blank lines. I appreciate this may be more of an issue with SLiM, but I was hoping vcfR might be able to accommodate this, or that you might have other suggestions.

I have attached two files - blankrow346.vcf contains blank lines before POS=346 and causes R to abort when read.vcfR() is run, and clean.vcf is of a similar size and style but works fine. The files are being automatically generated so unfortunately I can't manually edit them. I'm using the most recent GitHub version of vcfR.

Blanklineex.zip

Many thanks, Tom

TomJamesW commented 5 years ago

P.S. Even if this just produced a warning or error instead of crashing R, that would be helpful.

knausb commented 5 years ago

Hi @TomJamesW, thanks for bringing this to my attention! And double thanks for the example files; that really helps. I think the best answer here is that you should let the SLiM people know about this. Section 1 of the VCF specification v4.3 states that zero-length records are not allowed, although your file does report itself as v4.2.

That said, I agree that it would be nice to handle this more elegantly. Your file "clean.vcf" reads in fine, but I see lines 16, 135, 243, and possibly others as blank. In your file "blankrow346.vcf" I see two empty lines before the variant at position 346. If I remove one of these, I can read it in. If I add three empty lines, it crashes. So the issue appears to involve more than one consecutive empty line, as you report. We shall add this to the to-do list.
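For checking other generated files the same way, a minimal sketch follows, assuming a plain-text (not gzipped) VCF and a POSIX shell with awk. The function names `blank_lines` and `consecutive_blank_runs` are hypothetical helpers, not part of vcfR or SLiM. Lines containing only spaces, tabs, or a carriage return are treated as blank too, which is an assumption on my part. Because only runs of two or more empty lines were seen to crash, the second function reports just those runs.

```shell
# Print the line number of every blank line (empty, or containing only
# spaces, tabs, or a carriage return) in a plain-text VCF file.
# Usage: blank_lines file.vcf
blank_lines() {
  awk '/^[ \t\r]*$/ { print NR }' "$1"
}

# Report each run of two or more consecutive blank lines (the pattern
# that crashed read.vcfR() in this issue) and the line where it ends.
# Usage: consecutive_blank_runs file.vcf
consecutive_blank_runs() {
  awk '
    /^[ \t\r]*$/ { run++; next }      # extend the current run of blank lines
    run > 1 { printf "%d empty lines ending at line %d\n", run, NR - 1 }
    { run = 0 }                        # any non-blank line ends the run
    END { if (run > 1) printf "%d empty lines ending at line %d\n", run, NR }
  ' "$1"
}
```

For example, `consecutive_blank_runs blankrow346.vcf` should point to the two empty lines preceding the variant at position 346, while `blank_lines` also lists isolated blank lines such as those in "clean.vcf".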

If you're looking for a quick fix, you could grep out the empty lines.
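A minimal sketch of that quick fix, assuming a plain-text (not gzipped) VCF; `strip_blank_lines` is a hypothetical helper name. Since the files are generated automatically, a step like this could sit in the pipeline between SLiM's output and the call to read.vcfR().

```shell
# Copy a plain-text VCF file, dropping every empty or whitespace-only line.
# Usage: strip_blank_lines input.vcf output.vcf
strip_blank_lines() {
  # -v inverts the match, so only non-blank lines are written to "$2".
  grep -v '^[[:space:]]*$' "$1" > "$2"
}
```

For instance, `strip_blank_lines blankrow346.vcf blankrow346_noblank.vcf`, followed by `read.vcfR("blankrow346_noblank.vcf")` in R. Dropping these lines should not lose any data, because zero-length records are not valid VCF records in the first place.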

TomJamesW commented 5 years ago

Thanks for your speedy response! I'll let the SLiM people know too. I have a feeling they have been tinkering with the way they simplify and output VCF files, so they may not have realised it has had this effect.

I hadn't noticed the blank lines in "clean.vcf" either; I'll pass this on as well and use grep in the meantime.

Thanks for your help.

knausb commented 5 years ago

Oops, I'm going to leave this issue open until I get a chance to address it.