vcflib / bio-vcf

Smart VCF parser DSL
MIT License

VCF soft filters #30

Closed (chapmanb closed this issue 8 years ago)

chapmanb commented 9 years ago

Pjotr; I'm enjoying using bio-vcf and the syntax for filtering based on samples is incredibly useful for more complex filters. Is it possible to do soft-filtering with bio-vcf by adding tags to the FILTER column instead of excluding variants? I'd like to plug bio-vcf in for some somatic filters where I need to select based on tumor FORMAT values, but want to soft filter the variants rather than remove them. Thanks again for all your work on this.

pjotrp commented 9 years ago

bio-vcf has the --rewrite switch with which you can inject data. I used this to inject sample names:

for x in *.Somatic.vcf ; do ~/izip/git/opensource/ruby/bioruby-vcf/bin/bio-vcf --rewrite "rec.info['sample']='$x'[7..13]" < "$x" > "rewrite/$x" ; done

In the rewrite you can also add a conditional. If you send me an example I'll try and make it work.

chapmanb commented 9 years ago

Pjotr; Thanks much. I'm looking for the equivalent of applying a soft filter to a VCF file like in bcftools:

bcftools filter -m + -s LowDepth -e 'INFO/DP < 10'

http://samtools.github.io/bcftools/bcftools.html#filter

This adds a ##FILTER line to the header and updates the FILTER column when variants fail, instead of removing them. This is the standard way we do filtering to avoid throwing away data since folks can re-filter later if needed. Thanks again.
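
To make the soft-filter behaviour described above concrete, here is a minimal sketch in plain Ruby that works on raw VCF text lines. It is not how bcftools or bio-vcf implement it: `soft_filter` is a hypothetical helper, and the choice to replace `.` or `PASS` with the new tag (rather than append to it) is an assumption based on the VCF convention that PASS means no filter failed.

```ruby
# Hypothetical helper, not part of bio-vcf or bcftools.
# Takes VCF lines (without trailing newlines), declares a new filter in the
# header, and tags records for which the block returns true.
def soft_filter(lines, name, description)
  lines.flat_map do |line|
    if line.start_with?('#CHROM')
      # Declare the filter just before the column header line.
      [%(##FILTER=<ID=#{name},Description="#{description}">), line]
    elsif line.start_with?('#')
      [line]
    else
      fields = line.split("\t")
      # INFO is column 8: semicolon-separated KEY=VALUE pairs or bare flags.
      info = fields[7].split(';').map do |entry|
        key, value = entry.split('=', 2)
        [key, value]
      end.to_h
      if yield(info)
        # Assumption: drop '.' and 'PASS' before adding the failing tag.
        failed = fields[6].split(';') - ['.', 'PASS']
        fields[6] = (failed | [name]).join(';')
      end
      [fields.join("\t")]
    end
  end
end

vcf = [
  '##fileformat=VCFv4.2',
  "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
  "1\t100\t.\tA\tT\t50\tPASS\tDP=5",
  "1\t200\t.\tG\tC\t50\tPASS\tDP=40;DB"
]
filtered = soft_filter(vcf, 'LowDepth', 'INFO/DP < 10') { |info| info['DP'].to_i < 10 }
puts filtered
```

Both records are kept in the output; only the first one has its FILTER column changed, so a later step can still re-filter with different thresholds.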

pjotrp commented 9 years ago

Hi Brad,

Soft-filter support has been added on master: use --add-filter together with the --filter switch. Please test the latest. https://github.com/pjotrp/bioruby-vcf/commit/37511c21c710871fd2f5f717b6b8dd47b627b827

The example above would be:

bio-vcf --add-filter LowDepth --filter 'r.info.dp < 10'

I'll do the sample filters after.

Pj.

chapmanb commented 8 years ago

Pjotr; Thanks much for this. I spotted one small issue: the FILTER field is not overwritten when it was previously set to PASS. Otherwise it looks great. Support for sample field selections would be great and would let us use this for complex tumor/normal filtering, where bcftools currently isn't expressive enough. Thanks again.
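
As an illustration of what a sample-based soft filter needs to do, the sketch below selects on one sample's FORMAT values in plain Ruby and replaces an existing PASS instead of leaving it in place. It does not use bio-vcf's DSL; `sample_values` and `soft_filter_sample` are hypothetical helpers, and the column layout (normal sample first, tumor second) is an assumption made for the example.

```ruby
# Hypothetical helper: pairs FORMAT keys with one sample's values,
# e.g. "GT:DP" and "0/1:8" become {"GT"=>"0/1", "DP"=>"8"}.
def sample_values(format, sample)
  format.split(':').zip(sample.split(':')).to_h
end

# Hypothetical helper: tags a VCF record with +name+ when the block returns
# true for the chosen sample (0-based index among the sample columns).
def soft_filter_sample(line, name, sample_index)
  fields = line.chomp.split("\t")
  # FORMAT is column 9; sample columns start at column 10.
  values = sample_values(fields[8], fields[9 + sample_index])
  if yield(values)
    # Remove '.' and 'PASS' so a failing record never reads "PASS;<name>".
    failed = fields[6].split(';') - ['.', 'PASS']
    fields[6] = (failed | [name]).join(';')
  end
  fields.join("\t")
end

# Assumed layout: first sample column is the normal, second is the tumor.
record = ['1', '100', '.', 'A', 'T', '50', 'PASS', 'DP=40',
          'GT:DP:AF', '0/0:30:0.0', '0/1:8:0.25'].join("\t")
tagged = soft_filter_sample(record, 'LowTumorDepth', 1) { |s| s['DP'].to_i < 10 }
puts tagged
```

The record stays in the output with its FILTER column changed from PASS to the new tag, so downstream tools can still see and re-evaluate it.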

pjotrp commented 8 years ago

The sample filters also support soft filters now.