Closed lovelycatZ closed 6 months ago
Hi! You can follow this issue (#15). Bcftools need to be used.
Hello Zheng,
Thank you for your soon reply! It works well for after using BCFTOOLS to merge outcomes of multiple SV callers. But the final result file named 3.final.vcf.gz had no records, and there was only a header section. While I used the SURVIVOR to merge and I got some records. Why?
Can you provid vcf files?
vcfs.zip There are 4 vcf files from 4 CNV callers.
Your VCF file is not in common format.
GT
should be 0/0
, 0/1
, 1/1
, 0/2
, et al. However, GT
in 21B01474858_cn.MOPS.vcf.gz
was one number, which cannot be recognized by PanPop or SURVIVOR.
GT
even not exists in file 21B01474858_ExomeDepth.vcf.gz
, 21B01474858_GATK_gCNV.vcf.gz
, and 21B01474858_Manta.vcf.gz
.
You may manully add GT
to those VCF files and rerun PanPop or SURVIVOR.
Yes, you are right. Actually, the output of some callers is not in VCF format (such as TSV format), so manual conversion is taken. But what if I can't infer whether a CNV is homozygous or heterozygous? Does the GT tag have to exist in the VCF files to run the PART standalone script?
GT
must exists to use PART script. But I dont know how to deal with CNVs, unless you can transformat it into normal vcf file. Perhaps the CNV=0 means a deletion of GT=1/1, CNV=1 means a deletion of GT=0/1, CNV=3 means a insertion of GT=0/1?
OK, I'll try to provide GT
tag, thanks for your convenient time!
This time there are a lot of records in the final VCF file, but it seems like all of them are not merged. Can you give me some advice according to the data? data.zip
There were no sequence info in VCF files. REF
and ALT
(colunm 4 and 5) need to be completed. Many be you can refer to https://github.com/starskyzheng/panpop/blob/main/scripts/long_caller_parser.pl
.
Do you mean I have to have the details for column 4 and 5? But if a SV is DEL or DUP, it make no sense to present sequence in REF
or
ALT
. And what does this script do?
For DUP or DEL, the information of REF
and ALT
can be determed by other columns. But in common VCF format, REF and ALT must be specific.
Some of sv-callers also generate VCFs without REF and ALT info, so we write this script as transformation. But you may do some modification to apply to your dataset.
DUP is a specific type of insertion.
Thank you for your time! I noticed that this script was customized for the internal callers and it's a little hard for me to fully understand script/long_caller_parser.pl
as I am not familiar with PERL.
This error that all the records were not merged was due to no sequences were provided for REF
and ALT
, right? What if I add the sequences? Is this step should be performed before or after the bcftools merge
process? Would you please provide a VCF file that was processed by script/long_caller_parser.pl
so I could make the correct format?
yes, this process must be done before bcftools merge
.
VCF file could like this: https://samtools.github.io/hts-specs/VCFv4.2.pdf
.
In some cases, a DEL
or DUP
has a length of more than 1000bp, should we also add detailed sequences for REF
and ALT
?
yes.
Hello Zheng,
I'm sorry to have bothered you so long.
After I added the detailed sequences for REF
, I got the results that seemed to be a little different than before. But there were still the same warnings as before in the process:
Done reading ref fasta read 195 contig / total 3099922541 bp
Warn! aln software(muscle) exit status(2)! redo! No. 1
Warn! aln software(muscle) exit status(2)! redo! No. 1
Warn! no alt[0] from aln software(HAlignC)! redo! No. 1
Warn! aln software(famsaP) exit status(1)! redo! No. 1
Warn! aln software(mafft) exit status(1)! redo! No. 1
Warn! aln software(muscle) exit status(2)! redo! No. 1
0 not in alts: $VAR1 = [];
at /data/xiangxd/software/panpop/scripts/../lib/realign_alts.pm line 333.
realign_alts::alt_alts_to_muts(undef, 3, ARRAY(0x2b76d14944a0)) called at /data/xiangxd/software/panpop/scripts/../lib/realign_alts.pm line 311
realign_alts::process_alts(ARRAY(0x2b77a5d9f070), 3, 12452, undef, ARRAY(0x2b76d14944a0)) called at /data/xiangxd/software/panpop/bin/../scripts/realign.pl line 495
main::process_line_new(ARRAY(0x2b77a5d9f070), HASH(0x2b77a59f9890), "chr16", 152760) called at /data/xiangxd/software/panpop/bin/../scripts/realign.pl line 446
main::process_lines_new(ARRAY(0x2b77a531e2e8), "chr16", 152760, 165211) called at /data/xiangxd/software/panpop/bin/../scripts/realign.pl line 254
main::mce_run(MCE=HASH(0x2b76d1074b70)) called at /data/xiangxd/perl5/lib/perl5/MCE/Core/Worker.pm line 489
MCE::_worker_do() called at /data/xiangxd/perl5/lib/perl5/MCE/Core/Worker.pm line 593
MCE::_worker_loop() called at /data/xiangxd/perl5/lib/perl5/MCE/Core/Worker.pm line 714
MCE::_worker_main() called at /data/xiangxd/perl5/lib/perl5/MCE.pm line 2057
MCE::_dispatch() called at /data/xiangxd/perl5/lib/perl5/MCE.pm line 2101
MCE::_dispatch_child() called at /data/xiangxd/perl5/lib/perl5/MCE.pm line 692
MCE::spawn(MCE=HASH(0x2b76d1074b70)) called at /data/xiangxd/perl5/lib/perl5/MCE.pm line 991
MCE::run() called at /data/xiangxd/perl5/lib/perl5/MCE/Flow.pm line 429
MCE::Flow::run(CODE(0x2b76d147bb78), CODE(0x2b76d1467330)) called at /data/xiangxd/software/panpop/bin/../scripts/realign.pl line 193
Warn! aln software(muscle) exit status(2)! redo! No. 1
Warn! aln software(muscle) exit status(2)! redo! No. 1
Warn! aln software(muscle) exit status(2)! redo! No. 1
All Done
This time, I provided 3 VCF files which were from 3 different callers. But when I added more VCF files to the PART
standalone script, this warning came out and the running program just got stuck:
0 not in alts: $VAR1 = [];
at /data/xiangxd/software/panpop/scripts/../lib/realign_alts.pm line 333.
realign_alts::alt_alts_to_muts(undef, 3, ARRAY(0x2b76d14944a0)) called at /data/xiangxd/software/panpop/scripts/../lib/realign_alts.pm line 311
realign_alts::process_alts(ARRAY(0x2b77a5d9f070), 3, 12452, undef, ARRAY(0x2b76d14944a0)) called at /data/xiangxd/software/panpop/bin/../scripts/realign.pl line 495
main::process_line_new(ARRAY(0x2b77a5d9f070), HASH(0x2b77a59f9890), "chr16", 152760) called at /data/xiangxd/software/panpop/bin/../scripts/realign.pl line 446
main::process_lines_new(ARRAY(0x2b77a531e2e8), "chr16", 152760, 165211) called at /data/xiangxd/software/panpop/bin/../scripts/realign.pl line 254
main::mce_run(MCE=HASH(0x2b76d1074b70)) called at /data/xiangxd/perl5/lib/perl5/MCE/Core/Worker.pm line 489
MCE::_worker_do() called at /data/xiangxd/perl5/lib/perl5/MCE/Core/Worker.pm line 593
MCE::_worker_loop() called at /data/xiangxd/perl5/lib/perl5/MCE/Core/Worker.pm line 714
MCE::_worker_main() called at /data/xiangxd/perl5/lib/perl5/MCE.pm line 2057
MCE::_dispatch() called at /data/xiangxd/perl5/lib/perl5/MCE.pm line 2101
MCE::_dispatch_child() called at /data/xiangxd/perl5/lib/perl5/MCE.pm line 692
MCE::spawn(MCE=HASH(0x2b76d1074b70)) called at /data/xiangxd/perl5/lib/perl5/MCE.pm line 991
MCE::run() called at /data/xiangxd/perl5/lib/perl5/MCE/Flow.pm line 429
MCE::Flow::run(CODE(0x2b76d147bb78), CODE(0x2b76d1467330)) called at /data/xiangxd/software/panpop/bin/../scripts/realign.pl line 193
In addition to that, there were too many ./. GT
tag in 3.final.vcf
, is that correct? Here's attached a subsample and reduced version.
Many thanks! 3.final.zip
Your input VCF is in wrong format.
For deletion, REF
is the SV sequence, ALT
is *
or the first base of REF
For DUP or insertion, REF
is one base of reference genome of this position of mutation. ALT
is the SV sequence.
Great thanks for your help!
I've successfully run both two steps of the PART
standalone script and got the result without any errors and warnings. I noticed that the INFO tag ZORIPOS=XXX;ZORIEND=XXX;ZLEN=XXX
was within some records but not all. And some GT
tag was "./." How can we interpret the result?
part.zip
As for SVs you can use bin/Fill_DP.pl
.
However, your VCF is for CNVs, I don't know how to deal with it.
I have built the conda
enviroment according to the guide, but it get wrong for perl bin/Fill_DP.pl
:
Can't locate zzIO.pm in @INC (you may need to install the zzIO module) (@INC contains: /data/xiangxd/perl5/lib/perl5/x86_64-linux-thread-multi /data/xiangxd/perl5/lib/perl5 /data/xiangxd/software/miniconda3/envs/panpop/lib/site_perl/5.26.2/x86_64-linux-thread-multi /data/xiangxd/software/miniconda3/envs/panpop/lib/site_perl/5.26.2 /data/xiangxd/software/miniconda3/envs/panpop/lib/5.26.2/x86_64-linux-thread-multi /data/xiangxd/software/miniconda3/envs/panpop/lib/5.26.2 .) at bin/Fill_DP.pl line 26. BEGIN failed--compilation aborted at bin/Fill_DP.pl line 26.
I can't even find the zzIO
module on the PREL website.
Fixed. Please try again. (bffe3f0) However, I don't think this script can handle your VCF properly.
In fact, I just want to know which records in the VCF file were merged by PART
and which are not.
Sorry, PanPop do not support this function.
If no more activity, this issue will be closed.
Hello Zheng,
Thank you for providing a very nice tool. Currently, I am curious about how to merge outcomes of multiple SV callers. How to provide some vcf files to the standalone PART script? I noticed that the --in_vcf option can support only one file.
Thank you for your soon reply!