Closed: polchan closed this issue 7 months ago
Please try this branch: https://github.com/starskyzheng/panpop/tree/starskyzheng-patch-2
I have not tested this branch yet. Your feedback is welcome!
Thanks! I will give it a try later.
Sorry! I tried the new branch. It completed the first step of running PART_run.pl, but I still encountered some issues during the second step:
Now reading ref fasta : Done reading ref fasta read 0 contig / total 0 bp
chr: CM066474.1 not exists in fasta: MdOrin.fa at /ngsproject/bocheng/tools/panpop/bin/../scripts/realign.pl line 564.
[identical line repeated many more times]
(base) [bocheng@Atlas OUTDIR_RUN2]$ cat 2.thin1.unsorted.vcf.gz.log
OUTDIR_RUN2/1.realign0.sorted.vcf.gz not exists! at /ngsproject/bocheng/tools/panpop/scripts/../lib/zzIO.pm line 57.
zzIO::open_in_fh("OUTDIR_RUN2/1.realign0.sorted.vcf.gz") called at /ngsproject/bocheng/tools/panpop/bin/../scripts/merge_similar_allele.pl line 102
(base) [bocheng@Atlas OUTDIR_RUN2]$ cat 2.thin2.unsorted.vcf.gz.log
OUTDIR_RUN2/2.thin1.sorted.vcf.gz not exists! at /ngsproject/bocheng/tools/panpop/scripts/../lib/zzIO.pm line 57.
zzIO::open_in_fh("OUTDIR_RUN2/2.thin1.sorted.vcf.gz") called at /ngsproject/bocheng/tools/panpop/bin/../scripts/sv2pav.pl line 102
(base) [bocheng@Atlas OUTDIR_RUN2]$ cd ..
(base) [bocheng@Atlas pangenome]$ ll
total 136593256
-rwxr-xr-x. 1 bocheng bocheng 0 Apr 13 22:03 MdOrin.fa
-rwxr-xr-x. 1 bocheng bocheng 139870967765 Apr 11 10:10 merge_bcftools.vcf.gz
drwxrwxr-x. 2 bocheng bocheng 4096 Apr 14 00:23 OUTDIR_RUN1
drwxrwxr-x. 2 bocheng bocheng 194 Apr 14 22:51 OUTDIR_RUN2
-rw-------. 1 bocheng bocheng 511069 Apr 14 00:23 step1.log
-rw-rw-r--. 1 bocheng bocheng 5697 Apr 14 22:51 step2.log
drwxrwxr-x. 2 bocheng bocheng 10 Apr 14 22:51 tmp
(base) [bocheng@Atlas pangenome]$ cat step2.log
Sun Apr 14 08:49:55 CST 2024
perl /ngsproject/bocheng/tools/panpop/bin/../scripts/realign.pl --chr_tolerance --in_vcf OUTDIR_RUN1/3.final.vcf.gz --out_vcf OUTDIR_RUN2/1.realign0.unsorted.vcf.gz --ref_fasta_file MdOrin.fa --threads 25 --ext_bp_max 500 --ext_bp_min 50 --skip_mut_at_same_pos 2 --level 1 --tmpdir tmp 2>&1 | tee OUTDIR_RUN2/1.realign0.unsorted.vcf.gz.log
Now reading ref fasta : Done reading ref fasta read 0 contig / total 0 bp
chr: CM066474.1 not exists in fasta: MdOrin.fa at /ngsproject/bocheng/tools/panpop/bin/../scripts/realign.pl line 564.
[identical line repeated many more times]
Sun Apr 14 22:51:53 CST 2024
bcftools sort OUTDIR_RUN2/1.realign0.unsorted.vcf.gz -o OUTDIR_RUN2/1.realign0.sorted.vcf.gz --temp-dir tmp/tmp_bcftools.
Writing to tmp/tmp_bcftools.a4ZuT7
[E::bcf_hdr_read] Input is not detected as bcf or vcf format
Could not read VCF/BCF headers from OUTDIR_RUN2/1.realign0.unsorted.vcf.gz
Cleaning
Sun Apr 14 22:51:53 CST 2024
perl /ngsproject/bocheng/tools/panpop/bin/../scripts/merge_similar_allele.pl --type 3 --invcf OUTDIR_RUN2/1.realign0.sorted.vcf.gz --outvcf OUTDIR_RUN2/2.thin1.unsorted.vcf.gz --sv2pav_merge_identity_threshold 0.6 --sv2pav_merge_diff_threshold 40 --threads 25 --tmpdir tmp 2>&1 | tee OUTDIR_RUN2/2.thin1.unsorted.vcf.gz.log
OUTDIR_RUN2/1.realign0.sorted.vcf.gz not exists! at /ngsproject/bocheng/tools/panpop/scripts/../lib/zzIO.pm line 57.
zzIO::open_in_fh("OUTDIR_RUN2/1.realign0.sorted.vcf.gz") called at /ngsproject/bocheng/tools/panpop/bin/../scripts/merge_similar_allele.pl line 102
Sun Apr 14 22:51:54 CST 2024
bcftools sort OUTDIR_RUN2/2.thin1.unsorted.vcf.gz -o OUTDIR_RUN2/2.thin1.sorted.vcf.gz --temp-dir tmp/tmp_bcftools.
Writing to tmp/tmp_bcftools.L6KYyr
[E::hts_open_format] Failed to open file "OUTDIR_RUN2/2.thin1.unsorted.vcf.gz" : No such file or directory
Could not read OUTDIR_RUN2/2.thin1.unsorted.vcf.gz
Cleaning
Sun Apr 14 22:51:54 CST 2024
perl /ngsproject/bocheng/tools/panpop/bin/../scripts/sv2pav.pl --invcf OUTDIR_RUN2/2.thin1.sorted.vcf.gz --outvcf OUTDIR_RUN2/2.thin2.unsorted.vcf.gz --enable_norm_alle 1 --max_len_tomerge 20 --sv_min_dp 40 --threads 25 2>&1 | tee OUTDIR_RUN2/2.thin2.unsorted.vcf.gz.log
OUTDIR_RUN2/2.thin1.sorted.vcf.gz not exists! at /ngsproject/bocheng/tools/panpop/scripts/../lib/zzIO.pm line 57.
zzIO::open_in_fh("OUTDIR_RUN2/2.thin1.sorted.vcf.gz") called at /ngsproject/bocheng/tools/panpop/bin/../scripts/sv2pav.pl line 102
Sun Apr 14 22:51:54 CST 2024
bcftools sort OUTDIR_RUN2/2.thin2.unsorted.vcf.gz -o OUTDIR_RUN2/2.thin2.sorted.vcf.gz --temp-dir tmp/tmp_bcftools.
Writing to tmp/tmp_bcftools.1gUY4J
[E::hts_open_format] Failed to open file "OUTDIR_RUN2/2.thin2.unsorted.vcf.gz" : No such file or directory
Could not read OUTDIR_RUN2/2.thin2.unsorted.vcf.gz
Cleaning
Sun Apr 14 22:51:54 CST 2024
ln -s 2.thin2.sorted.vcf.gz OUTDIR_RUN2/3.final.vcf.gz
I hope my feedback is helpful to you, and I look forward to seeing you release a new workflow.
What are the values in the CHROM column of the VCF file OUTDIR_RUN1/3.final.vcf.gz?
Hello!
These are the first few lines of OUTDIR_RUN1/3.final.vcf.gz:
header.vcf.txt
The VCF file looks OK.
Could you please also provide the reference FASTA file, MdOrin.fa?
Sorry! I think I've found the issue: MdOrin.fa had become 0 bytes. Strangely, I didn't encounter this problem during the first step of running PART_run.pl.
I'll rerun it and inform you of the outcome afterward.
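For anyone hitting the same "not exists in fasta" message, a quick pre-flight check along these lines may help catch an empty or mismatched reference before a long run. This is only a minimal sketch, not part of panpop: the helper names (`open_text`, `fasta_contigs`, `vcf_contigs`, `check_reference`) are hypothetical, and it assumes contig names are the first word of each FASTA header line and the first column of each VCF record.

```python
import gzip
import os


def open_text(path):
    """Open a plain or gzip-compressed text file for reading."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "rt")


def fasta_contigs(fasta_path):
    """Return the set of contig names (first word of each '>' header line)."""
    names = set()
    with open_text(fasta_path) as fh:
        for line in fh:
            if line.startswith(">"):
                header = line[1:].split()
                if header:
                    names.add(header[0])
    return names


def vcf_contigs(vcf_path):
    """Return the set of values seen in the first (CHROM) column of a VCF."""
    names = set()
    with open_text(vcf_path) as fh:
        for line in fh:
            if line.startswith("#") or not line.strip():
                continue  # skip header and blank lines
            names.add(line.rstrip("\n").split("\t", 1)[0])
    return names


def check_reference(fasta_path, vcf_path):
    """Return a list of human-readable problems; an empty list means OK."""
    if os.path.getsize(fasta_path) == 0:
        return [f"{fasta_path} is empty (0 bytes)"]
    missing = sorted(vcf_contigs(vcf_path) - fasta_contigs(fasta_path))
    return [
        f"contig {name} is in the VCF but not in {fasta_path}"
        for name in missing
    ]
```

Calling `check_reference("MdOrin.fa", "OUTDIR_RUN1/3.final.vcf.gz")` and printing the returned list would have flagged the 0-byte file immediately. Note that scanning a very large VCF this way reads every record, so it can take a while.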
Good news! It seems that I have successfully run all the processes of PART_run.pl.
However, I still have a question. The merge_bcftools.vcf.gz file I initially used as input was 130 GB, but the final output file 3.final.vcf.gz is only 1.9 GB. Is this size reduction reasonable? In what form does the remaining 120+ GB of data exist?
I don't know; you will need to check this yourself. If you find anything, please let me know.
Alright! I have reviewed the previous vcf files and the new vcf files after merging. The former vcf files had over 20,000,000 variations, while the merged vcf file has more than 9,400,000 variations. I believe the main reason for the reduction is not the decrease in the number of variations, but rather that the descriptions of variations within individual samples were not retained during the merging process. Of course, these descriptions do not affect subsequent analyses.
This is a sample of the previous VCF: previous.vcf.txt
This is a sample of the new VCF: new.vcf.txt
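A comparison like the one described above could be scripted roughly as follows. This is a rough sketch under stated assumptions rather than anything provided by panpop: `summarize_vcf` is a hypothetical helper, and it relies only on the standard VCF layout in which the ninth tab-separated column (FORMAT) lists the per-sample fields as colon-separated keys.

```python
import gzip


def summarize_vcf(path, max_records=None):
    """Count data records and collect the FORMAT keys used in a VCF.

    If max_records is given, stop after that many records, which is
    useful for sampling the start of a very large file.
    """
    opener = gzip.open if path.endswith(".gz") else open
    n_records = 0
    format_keys = set()
    with opener(path, "rt") as fh:
        for line in fh:
            if line.startswith("#") or not line.strip():
                continue  # skip header and blank lines
            n_records += 1
            cols = line.rstrip("\n").split("\t")
            if len(cols) > 8:  # column 9 is FORMAT when genotypes are present
                format_keys.update(cols[8].split(":"))
            if max_records is not None and n_records >= max_records:
                break
    return n_records, format_keys
```

Running it on the input and output files (for example `summarize_vcf("merge_bcftools.vcf.gz", max_records=100000)` and the same for `3.final.vcf.gz`) and comparing the two key sets would show which per-sample fields were dropped. Fewer fields per sample across many samples can plausibly account for a large difference in file size even when the record counts are of the same order.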
Thank you very much for the excellent software you have developed and the help you have provided recently!
Hello, Zheng!
I encountered an issue while using PART_run.pl. I used my own VCF file, generated by aligning NGS data against the pan-genome with VG, which I think amounts to a workflow similar to your NGSpipeline. When I run this command:
perl ~/tools/panpop/bin/PART_run.pl --in_vcf ./merge_bcftools.vcf.gz --outdir OUTDIR_RUN1 --ref_fasta_file ~/data/genome_data/malus/MdOrin.fa --threads 25 --tmpdir ./tmp
it prompts me with the following error:
I hope you can help me solve this issue, if possible!
Thanks!
Best wishes!
Bo-cheng