starskyzheng / panpop

Application of pan-genome for population
MIT License

A question about PART_run.pl #36

Closed rain-zjg closed 5 months ago

rain-zjg commented 7 months ago

Hello, Zheng! I called SVs from NGS data using three different tools (Manta, Delly, and Smoove). I combined the results from the different callers for each sample using "bcftools concat", and then combined the per-sample results into a single VCF file using "bcftools merge". I then tried to refine the final SV dataset using "PART_run.pl" in panpop. However, the script did not run to completion and reported many warnings, such as:

Warn! aln software(muscle) exit status(2)! redo! No. 1
Warn! aln software(muscle) exit status(2)! redo! No. 1
Warn! aln software(muscle) exit status(2)! redo! No. 1
Warn! no alt[0] from aln software(HAlignC)! redo! No. 1
Warn! aln software(muscle) exit status(2)! redo! No. 1
Warn! aln software(muscle) exit status(2)! redo! No. 1
Warn! aln software(muscle) exit status(2)! redo! No. 1
Warn! aln software(muscle) exit status(2)! redo! No. 1
Warn! aln software(muscle) exit status(2)! redo! No. 1
Warn! no alt[0] from aln software(HAlignC)! redo! No. 1
Warn! aln software(muscle) exit status(2)! redo! No. 1
Warn! no alt[0] from aln software(famsaP)! redo! No. 1
Warn! aln software(muscle) exit status(2)! redo! No. 1
Warn! aln software(muscle) exit status(2)! redo! No. 1
Warn! aln software(mafft) exit status(1)! redo! No. 1
0 not in alts: $VAR1 = [];
 at /home/tetrastigma/Dipteronia/4_combine/panpop-main/scripts/../lib/realign_alts.pm line 333.
        realign_alts::alt_alts_to_muts(undef, 54, ARRAY(0x55d69526db20)) called at /home/tetrastigma/Dipteronia/4_combine/panpop-main/scripts/../lib/realign_alts.pm line 311
        realign_alts::process_alts(ARRAY(0x55d696e25f20), 54, 138, undef, ARRAY(0x55d69526db20)) called at /home/tetrastigma/Dipteronia/4_combine/panpop-main/bin/../scripts/realign.pl line 497
        main::process_line_new(ARRAY(0x55d696e25f20), HASH(0x55d69671a9c0), "HIC_ASM_0", 7173697) called at /home/tetrastigma/Dipteronia/4_combine/panpop-main/bin/../scripts/realign.pl line 446
        main::process_lines_new(ARRAY(0x55d696a534e8), "HIC_ASM_0", 7173697, 7173822) called at /home/tetrastigma/Dipteronia/4_combine/panpop-main/bin/../scripts/realign.pl line 254
        main::mce_run(MCE=HASH(0x55d694e26588)) called at /home/tetrastigma/miniconda3/envs/panpop/lib/site_perl/5.26.2/MCE/Core/Worker.pm line 489
        MCE::_worker_do() called at /home/tetrastigma/miniconda3/envs/panpop/lib/site_perl/5.26.2/MCE/Core/Worker.pm line 593
        MCE::_worker_loop() called at /home/tetrastigma/miniconda3/envs/panpop/lib/site_perl/5.26.2/MCE/Core/Worker.pm line 714
        MCE::_worker_main() called at /home/tetrastigma/miniconda3/envs/panpop/lib/site_perl/5.26.2/MCE.pm line 2057
        MCE::_dispatch() called at /home/tetrastigma/miniconda3/envs/panpop/lib/site_perl/5.26.2/MCE.pm line 2101
        MCE::_dispatch_child() called at /home/tetrastigma/miniconda3/envs/panpop/lib/site_perl/5.26.2/MCE.pm line 692
        MCE::spawn(MCE=HASH(0x55d694e26588)) called at /home/tetrastigma/miniconda3/envs/panpop/lib/site_perl/5.26.2/MCE.pm line 991
        MCE::run() called at /home/tetrastigma/miniconda3/envs/panpop/lib/site_perl/5.26.2/MCE/Flow.pm line 429
        MCE::Flow::run(CODE(0x55d69523b3a0), CODE(0x55d69523ac50)) called at /home/tetrastigma/Dipteronia/4_combine/panpop-main/bin/../scripts/realign.pl line 193

The script then remained in this state for several days. What causes these warnings? Could you help me? Here is a subset of the original VCF file: test.vcf.gz

starskyzheng commented 7 months ago

The REF and ALT columns in the VCF file must contain sequence bases.
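As a quick way to check an input file against this requirement, here is a minimal Python sketch that flags records whose REF or ALT is not a plain base string. It is not part of panpop. The function names and the example records are hypothetical, and the exact set of characters PART_run.pl accepts is an assumption (A, C, G, T, N in either case). Short-read SV callers often write symbolic alleles such as `<DEL>` or breakend notation in ALT, and those are what this check is meant to catch.

```python
import re

# Assumption: "sequence bases" means A/C/G/T/N only (either case).
_BASES = re.compile(r"[ACGTNacgtn]+")


def is_sequence_allele(allele):
    """True if the allele is a non-empty string of plain bases."""
    return _BASES.fullmatch(allele) is not None


def non_sequence_records(vcf_lines):
    """Return (chrom, pos, ref, alt) for records with a non-base REF or ALT.

    Symbolic alleles (<DEL>), breakends (G]chr2:500]), '*' and '.' are
    all reported, since none of them is a base sequence.
    """
    bad = []
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            continue  # not a complete VCF record
        chrom, pos, _id, ref, alt = fields[:5]
        alleles = [ref] + alt.split(",")
        if not all(is_sequence_allele(a) for a in alleles):
            bad.append((chrom, int(pos), ref, alt))
    return bad


# Hypothetical records for illustration only (not taken from test.vcf.gz).
example = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "chr1\t100\t.\tACGT\tA\t.\tPASS\t.\n",
    "chr1\t200\t.\tN\t<DEL>\t.\tPASS\t.\n",
    "chr1\t300\t.\tG\tG]chr2:500]\t.\tPASS\t.\n",
]
for record in non_sequence_records(example):
    print(record)
```

To scan a real file, pass an open text handle instead of the list, for example `gzip.open(path, "rt")` for a `.vcf.gz` file.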

rain-zjg commented 7 months ago

Thanks for your reply. I called SVs for each sample using Delly, Smoove, and Manta based on NGS data, then used bcftools to merge the results. Since the different callers describe SVs differently, I wonder how I can obtain a combined VCF file in the right format, with sequences in the REF and ALT columns. Could you give me some suggestions?

starskyzheng commented 7 months ago

Maybe this issue is the same as yours: https://github.com/starskyzheng/panpop/issues/17#issuecomment-1946226251

rain-zjg commented 7 months ago

Hello, Zheng! I have read your comments in issue 17, but I am still confused. As you mentioned in https://github.com/starskyzheng/panpop/issues/15#issuecomment-1932548662, the input of the "PART stand alone mode" must be one VCF file with multiple samples, and you suggested using bcftools to merge the results of different callers across multiple samples. In issue 17, you said "But you can manually merge each sample separately using bin/PART_run.pl, and then merge SV at population-scale (also using bin/PART_run.pl)". Since the results of the different callers are in multiple VCF files, the input appears to be more than one file, which seems to conflict with the single-file requirement.

So, what exactly does PART_run.pl do? Is it meant to refine SVs, or to merge SVs like other tools (e.g. SURVIVOR, Jasmine)? If it can serve as a merging tool, could you provide the command lines for merging with PART_run.pl (suppose there are three samples A, B, and C, and three callers were used: Delly, Smoove, and Manta)? Also, does the final VCF file record genotype information (i.e. the GT field)? In other words, does PART_run.pl perform genotyping like Paragraph or SVTyper, or does it simply copy the genotype information from the input VCF files?

starskyzheng commented 7 months ago

You can use bcftools merge -m none to merge multiple VCF files together. PART_run.pl can then realign each SV inside that VCF file. You may also want to look at this URL: https://doi.org/10.24433/CO.1577027.v1
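For illustration, the merge step could be scripted roughly as follows. This is a sketch under assumptions, not a command taken from the panpop documentation. The file names and the helper `bcftools_merge_command` are hypothetical, and compressed output (`-O z`) is just one choice. `bcftools merge` expects bgzip-compressed, indexed inputs. If the same sample name appears in more than one input (for example, one sample called by three callers), bcftools refuses to merge unless duplicate names are handled, for instance by renaming samples beforehand or with `--force-samples`. Which of these suits PART_run.pl is not stated in this thread.

```python
import shlex


def bcftools_merge_command(inputs, output):
    """Build a `bcftools merge -m none` command as an argument list.

    -m none keeps records separate instead of joining them into
    multiallelic sites; -O z writes bgzip-compressed VCF to `output`.
    """
    if len(inputs) < 2:
        raise ValueError("bcftools merge needs at least two input files")
    return ["bcftools", "merge", "-m", "none", "-O", "z", "-o", output, *inputs]


# Hypothetical per-sample files; each must be bgzipped and indexed.
cmd = bcftools_merge_command(
    ["A.vcf.gz", "B.vcf.gz", "C.vcf.gz"], "merged.vcf.gz"
)
print(shlex.join(cmd))
# To actually run it: subprocess.run(cmd, check=True)
```

The resulting single multi-sample VCF would then be the input to PART_run.pl, provided its REF and ALT columns hold sequence bases.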

Snow0208 commented 7 months ago

Hello, Rain! I have the same problem as you. Have you solved it yet?

github-actions[bot] commented 6 months ago

This issue is stale because it has been open for 30 days with no activity.

github-actions[bot] commented 5 months ago

This issue was closed because it has been inactive for 14 days since being marked as stale.