Closed Overcraft90 closed 1 year ago
This is too vague unfortunately, what makes you think the program is not doing what is supposed to do?
The command
bcftools isec -C A.vcf.gz B.vcf.gz
will print sites in A not present in B, that's what it does and nothing else.
The compressed VCF should work in all recent versions of bcftools, maybe you are using an old version?
$ bcftools isec -Oz -C A.vcf.gz B.vcf.gz -w 1 -o rmme.vcf.gz
$ htsfile rmme.vcf.gz
rmme.vcf.gz: VCF version 4.1 BGZF-compressed variant calling data
$ bcftools --version
bcftools 1.17-5-g36f1b0c1
Hi @pd3 thanks for the prompt reply,
Sorry for the vague description I'm just trying to wrap my head around what's going on. For instance, without the -w flag I get a BED file in a list format, so one question is what the -w does specifically.
Also, I see what you mean when you said that all recent versions of the tool (I'm currently running bcftools 1.16
) compress the output file; however, I was getting confused as I've always been running zcat <filename> | less -S
for visualising compressed files. But it seems, and I just verified myself today, that a simple less
would do. Thanks again.
The -w
option may seem a bit confusing in this context as with -C
there is no ambiguity in which of the files should be printed: the one listed on the command line as first, i.e. A.vcf.gz
in my example. However, if your command had, say, two files and the -n =2
option, then the -w
option can be used to choose which files should be created in the -p
directory: the first (-w 1
), the second (-w 2
), or both (-w 1,2
).
I don't understand the comment about zcat
vs less
, it seems to suggest that you are indeed getting an uncompressed file despite -Oz
. I hope this is not an issue, but if yes, please provide a small test case to reproduce the problem.
Hi again @pd3,
Thanks for the clarification about -w 1
. Concerning the (un)compressed output file, below the command I used
bcftools isec -Oz -C /media/scratch/9.sub-sampling_pangenome_graphs/happy_results_hap-samp/eval_snpsFP.vcf.gz /media/scratch/9.sub-sampling_pangenome_graphs/happy_results_filtered/eval_snpsFP.vcf.gz -w 1 -o igv_snpsFP.vcf.gz
Also, I attach the VCF files I'm trying to get the complement of as well as the output; let me know whether this help, thanks again. eval_snpsFP_filtered-graph.vcf.gz eval_snpsFP_hapsamp-graph.vcf.gz igv_snpsFP.vcf.gz
P.S. These are the complement files only for the FPs analysis for the hap.py
evaluation, for the FNs I followed exactly the same procedure; also, I didn't include indexes which are quite fast to generate
I am not finding a problem
$ bcftools isec -Oz -C eval_snpsFP_filtered-graph.vcf.gz eval_snpsFP_hapsamp-graph.vcf.gz -w 1 -o rmme.vcf.gz
$ htsfile rmme.vcf.gz
rmme.vcf.gz: VCF version 4.1 BGZF-compressed variant calling data
In this specific scenario this works even if one leaves -Oz
out as -o
recognizes the desired type from the file name suffix
$ bcftools isec -C eval_snpsFP_filtered-graph.vcf.gz eval_snpsFP_hapsamp-graph.vcf.gz -w 1 -o rmme.vcf.gz
$ htsfile rmme.vcf.gz
rmme.vcf.gz: VCF version 4.1 BGZF-compressed variant calling data
Hi there I have two VCF files for HG002 benchmarked with
hap.py
How can I filter out/subtract from this second VCF file the false positives (FPs) not concerned with the sub-sampling process so that I'm sure in the resulting VCF I look only at errors introduced by sub-sampling? Currently, I'm running this command
But it seems to not be working correctly..., for one I cannot get the output VCF in a compressed format; secondly, I'm not sure the command is correct and is doing what is supposed to. Any help is appreciated, thanks in advance!