vcflib / vcflib

C++ library and cmdline tools for parsing and manipulating VCF files with python and zig bindings
https://github.com/vcflib/vcflib#vcflib
MIT License

Cleanup PL, GL after filter or primitive #147

Open zeeev opened 8 years ago

zeeev commented 8 years ago

@chapmanb @NeillGibson @ekg

I've opened this as an issue, following up on: https://github.com/chapmanb/bcbio-nextgen/issues/1334#issuecomment-209889848

I think the cleanup will need to be implemented in each tool, since there is no way to know which PLs to remove without the original VCF entry. Is that the case?

--Zev
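To illustrate why the original entry matters: the VCF specification orders diploid genotype-likelihood fields (Number=G, such as PL and GL) so that genotype j/k with j <= k sits at index k(k+1)/2 + j. A tool that still holds the original record can therefore pick out the values belonging to the alleles it keeps. The sketch below is not vcflib code; `pl_index` and `subset_pl` are hypothetical helper names, and it assumes diploid samples only.

```python
def pl_index(j, k):
    """Index of diploid genotype j/k in a Number=G field (VCF ordering)."""
    j, k = sorted((j, k))
    return k * (k + 1) // 2 + j


def subset_pl(pl, kept_alleles):
    """Keep only the PL values for genotypes built from kept_alleles.

    kept_alleles holds allele indices from the ORIGINAL record (0 = REF),
    in ascending order. Values are copied as-is and not renormalized.
    """
    out = []
    for pos, k in enumerate(kept_alleles):
        for j in kept_alleles[: pos + 1]:
            out.append(pl[pl_index(j, k)])
    return out


# Triallelic record; PL order is 0/0, 0/1, 1/1, 0/2, 1/2, 2/2.
original_pl = [10, 20, 30, 40, 50, 60]
print(subset_pl(original_pl, [0, 2]))  # [10, 40, 60]
```

Once the record has been split and the original allele indices are gone, this mapping cannot be recovered from the output alone, which is the limitation raised above.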

chapmanb commented 8 years ago

Zev and Erik, thanks for looking at this. It sounded from the other comments like Erik already had code for this but that it isn't currently being used. Is that right? From my perspective it could happen either in the tools themselves at the time of splitting, or as a post-processing step that checks whether the PLs (and other attributes) match the number of alleles and removes the attributes if they don't. I'm not sure which is easier or closer to already being implemented, but I'm happy to go with that. Thanks again for thinking about this.
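A minimal sketch of the post-processing option, for illustration only: it is not an existing vcflib tool, and the function names are hypothetical. It assumes tab-separated VCF data lines and relies on the VCF rule that a Number=G field has C(n_alleles + ploidy - 1, ploidy) values, with ploidy taken from GT (diploid is assumed when GT is absent). If any sample has a PL or GL of the wrong length, that key is dropped from FORMAT and from every sample.

```python
from math import comb


def expected_genotype_count(n_alleles, ploidy):
    """Number of values a Number=G field should carry."""
    return comb(n_alleles + ploidy - 1, ploidy)


def drop_inconsistent_likelihoods(line, keys=("PL", "GL")):
    """Return the data line with PL/GL removed if their length is wrong."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) < 10:
        return "\t".join(cols)  # no FORMAT/sample columns to check
    n_alt = 0 if cols[4] == "." else len(cols[4].split(","))
    n_alleles = 1 + n_alt
    fmt = cols[8].split(":")
    samples = [s.split(":") for s in cols[9:]]

    bad = set()
    for values in samples:
        fields = dict(zip(fmt, values))
        gt = fields.get("GT", "./.")  # assumption: diploid if GT is absent
        ploidy = len(gt.replace("|", "/").split("/"))
        expected = expected_genotype_count(n_alleles, ploidy)
        for key in keys:
            value = fields.get(key)
            if value in (None, "."):
                continue  # missing values are left alone
            if len(value.split(",")) != expected:
                bad.add(key)

    if not bad:
        return "\t".join(cols)
    keep = [i for i, key in enumerate(fmt) if key not in bad]
    cols[8] = ":".join(fmt[i] for i in keep) or "."
    cols[9:] = [
        ":".join(v[i] for i in keep if i < len(v)) or "." for v in samples
    ]
    return "\t".join(cols)


# Biallelic record that still carries six PLs from a triallelic original.
record = "chr1\t100\t.\tA\tT\t50\tPASS\t.\tGT:DP:PL\t0/1:10:20,0,30,40,50,60"
print(drop_inconsistent_likelihoods(record))
```

Dropping the field loses information that splitting at the source could have preserved, so this is a fallback for output that is already inconsistent rather than a substitute for a fix in the tools.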

NeillGibson commented 8 years ago

@zeeev Thank you for looking at this. I agree with Brad: whichever option is easier would be fine for me, either a new option in vcfallelicprimitives that drops the attributes from inconsistent records, or a post-processing tool that fixes the inconsistent records. Thanks again.

NeillGibson commented 8 years ago

Hi @zeeev. Is there any progress on this issue? We keep running into it and find it difficult to filter out all the variant cases that have this inconsistency. Thanks again for looking into it.

JorisBenschop commented 8 years ago

I understand that this issue is acknowledged as 'high priority', but there seems to have been very little activity since April. Could you please either reclassify the priority or (preferably) give some indication of when this will be addressed? The current implementation produces corrupted VCF output, which I think we should take very seriously.

zeeev commented 8 years ago

@NeillGibson @JorisBenschop You're correct. I haven't had time to work on it. @ekg can PRIMITIVE strip out the unused GT fields, and is that something you can do? I've looked through the code and had trouble following it. There isn't enough info for VCFFILTER to fix the problem, otherwise I'd do it.

averagehat commented 4 years ago

Is this fixed? Does the method proposed in https://github.com/vcflib/vcflib/issues/156#issuecomment-309311367 work correctly?