jamescasbon / PyVCF

A Variant Call Format reader for Python.
http://pyvcf.readthedocs.org/en/latest/index.html

VCF Filtration doesn't work with uncompressed, or UTF-8 #317

Open OgnjenMilicevic opened 5 years ago

OgnjenMilicevic commented 5 years ago

Hi!

I tried applying your filtering script to an uncompressed VCF, and it raises an error: TypeError: startswith first arg must be bytes or a tuple of bytes, not str

It works when I gzip the file, but then it fails on a UTF-8 character.

I see some fixes, but vcf_filter has not been changed for 6 years. Any chance we can get an update so it handles these two issues?

OgnjenMilicevic commented 5 years ago

The core of the first problem seems to be that the input is opened in binary mode by argparse, long before it is passed to the reader:

parser.add_argument('input', metavar='input', type=argparse.FileType('rb'), nargs='?', default=None,
        help='File to process (use - for STDIN)')
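For what it's worth, here is a minimal sketch of one possible way around this, using only the standard library. It is not the vcf_filter code and not a tested patch: open_vcf_text, bytes_startswith_fails and build_parser are hypothetical names made up for illustration. The idea is to take the input as a plain path instead of a binary file object, then open it in text mode with an explicit encoding, whether or not it is gzipped. UTF-8 is an assumption here, and STDIN handling ('-') is left out.

```python
import argparse
import gzip


def open_vcf_text(path, encoding="utf-8"):
    """Hypothetical helper (not part of PyVCF): open a plain or gzipped VCF as text."""
    # Peek at the first two bytes; gzip streams start with the magic number 1f 8b.
    with open(path, "rb") as probe:
        is_gzip = probe.read(2) == b"\x1f\x8b"
    if is_gzip:
        # "rt" makes gzip decode the decompressed bytes into str lines.
        return gzip.open(path, "rt", encoding=encoding)
    return open(path, "r", encoding=encoding)


def bytes_startswith_fails():
    """Reproduce the reported TypeError: a bytes line tested against a str prefix."""
    try:
        b"##fileformat=VCFv4.1\n".startswith("##")
    except TypeError:
        return True
    return False


def build_parser():
    """Hypothetical parser: same positional argument, but kept as a path string."""
    parser = argparse.ArgumentParser(description="Sketch of a text-mode input argument")
    # No argparse.FileType('rb') here, so nothing is opened in binary mode up front;
    # the caller opens the path later with open_vcf_text().
    parser.add_argument("input", metavar="input", nargs="?", default=None,
                        help="File to process")
    return parser
```

With this, something like open_vcf_text(build_parser().parse_args().input) would yield str lines for both sample.vcf and sample.vcf.gz, so a check such as line.startswith('##') no longer mixes bytes and str. Whether the reader then needs a matching encoding setting for the UTF-8 case is something I have not verified.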