jamescasbon / PyVCF

A Variant Call Format reader for Python.
http://pyvcf.readthedocs.org/en/latest/index.html

Missing lines in vcf writer #125

Open omerfarukgerdan opened 10 years ago

omerfarukgerdan commented 10 years ago

import vcf

vcf_reader = vcf.Reader(open(path, "r"), strict_whitespace=True)
vcf_writer = vcf.Writer(open("new.vcf", "w"), vcf_reader, lineterminator="\n")

for record in vcf_reader:
    vcf_writer.write_record(record)

When I run this code, the last 11 lines are missing from the output file. I understand that some of the meta lines may get cleaned up, but I have no idea what is causing this. Is there a fix, or can anyone tell me what I am doing wrong?
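One possible explanation, which nothing in this thread confirms, is that the output file is never flushed or closed, so the last buffered chunk may not be on disk when the new file is inspected. The sketch below illustrates that effect with plain file objects only; the record contents and the temporary path are made up for illustration.

```python
import os
import tempfile

# Stand-in for the last few VCF data lines (hypothetical content).
lines = ["chr1\t{}\t.\tA\tG\n".format(pos) for pos in range(1, 12)]

path = os.path.join(tempfile.mkdtemp(), "new.vcf")

out = open(path, "w")
for line in lines:
    out.write(line)

# The handle is still open, so this short tail is typically still sitting
# in the write buffer and has not reached the file on disk.
size_before_close = os.path.getsize(path)

out.close()  # closing flushes whatever is still buffered
size_after_close = os.path.getsize(path)

print(size_before_close, size_after_close)
```

If this is what is happening, calling `vcf_writer.close()` (or `vcf_writer.flush()`) after the loop, or opening the output file in a `with` block, should make the trailing records appear.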

martijnvermaat commented 10 years ago

Hi @omergerdan, thanks for your report. Could you provide some additional information, i.e., the file you used for testing and which lines you are missing?
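To answer the "which lines are missing" question, one option is to compare the data lines of the two files by their first two columns (CHROM and POS). This is a minimal sketch using only the standard library; `record_keys` and `missing_records` are hypothetical helper names, and it assumes tab-separated VCF files whose header lines start with `#`.

```python
def record_keys(path):
    """Return (CHROM, POS) for every data line, skipping '#' header lines."""
    keys = []
    with open(path) as handle:
        for line in handle:
            if line.startswith("#") or not line.strip():
                continue
            fields = line.rstrip("\n").split("\t")
            keys.append((fields[0], fields[1]))
    return keys


def missing_records(original_path, rewritten_path):
    """List keys present in the original file but absent from the rewritten one."""
    written = set(record_keys(rewritten_path))
    return [key for key in record_keys(original_path) if key not in written]
```

Calling `missing_records(path, "new.vcf")` would return the positions that did not make it into the new file. Records sharing the same CHROM and POS are treated as one, so this is only a rough check.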

omerfarukgerdan commented 10 years ago

I used a VCF file with 200k+ lines. The meta lines and most of the records are written correctly; however, the last 11 lines are missing from the new file.

I wanted to check whether I could rewrite my VCF correctly with PyVCF, but couldn't.

There is nothing special in the VCF file at the point where it fails, just regular lines with chrom ID, pos, db ID, etc.

martijnvermaat commented 10 years ago

I'm afraid we really need a concrete case of this to analyse the problem further.

Actually, our unit tests include some simple cases where it is asserted that the writer outputs exactly the records it was given. So there is probably something specific in either your setup or your input file (but of course it can still be a bug in PyVCF).

omerfarukgerdan commented 10 years ago

There is nothing really special at the position where it fails. I will run some more tests and post the results here, hopefully next week, if we can't come up with a solution before then.

martijnvermaat commented 10 years ago

Is it consistent? It would be nice if you could come up with an example of a small file that gives the problem.

omerfarukgerdan commented 10 years ago

I ran it twice and got the same result both times, so I would say it is consistent. However, I was busy at the time, so to be sure I should rerun it, post samples, and describe the problem more clearly.

jamescasbon commented 10 years ago

VCF writing kind of snuck into this project without proper development - in a way it was too easy. @omergerdan please can we have a test case, as this would greatly help us do this properly.

Thanks.