starskyzheng / panpop

Application of pan-genome for population
MIT License

Error: cannot write to /tmp/bcftools.OL73yV/00024.bcf #37

Closed by polchan 4 months ago

polchan commented 5 months ago

Hello, Zheng!

I ran into a problem while using PART_run.pl. My input is a custom VCF file that I produced by aligning NGS data against the pan-genome with VG, which I believe gives a workflow similar to your NGSpipeline. When I run perl ~/tools/panpop/bin/PART_run.pl --in_vcf ./merge_bcftools.vcf.gz --outdir OUTDIR_RUN1 --ref_fasta_file ~/data/genome_data/malus/MdOrin.fa --threads 25 --tmpdir ./tmp, I get the following error:

bcftools sort OUTDIR_RUN1/1.realign0.unsorted.vcf.gz -o OUTDIR_RUN1/1.realign0.sorted.vcf.gz
Writing to /tmp/bcftools.OL73yV
[buf_flush] Error: cannot write to /tmp/bcftools.OL73yV/00024.bcf
Cleaning
Mon Apr 8 23:27:51 CST 2024
perl /ngsproject/bcguo/tools/panpop-main/bin/../scripts/merge_similar_allele.pl --type 3 --invcf OUTDIR_RUN1/1.realign0.sorted.vcf.gz --outvcf OUTDIR_RUN1/2.thin1.unsorted.vcf.gz --sv2pav_merge_identity_threshold 0.6 --sv2pav_merge_diff_threshold 40 --threads 16 --tmpdir ./tmp 2>&1 | tee OUTDIR_RUN1/2.thin1.unsorted.vcf.gz.log
OUTDIR_RUN1/1.realign0.sorted.vcf.gz not exists! at /ngsproject/bcguo/tools/panpop-main/scripts/../lib/zzIO.pm line 57.
zzIO::open_in_fh("OUTDIR_RUN1/1.realign0.sorted.vcf.gz") called at /ngsproject/bcguo/tools/panpop-main/bin/../scripts/merge_similar_allele.pl line 102
Mon Apr 8 23:27:51 CST 2024
bcftools sort OUTDIR_RUN1/2.thin1.unsorted.vcf.gz -o OUTDIR_RUN1/2.thin1.sorted.vcf.gz
Writing to /tmp/bcftools.ebuema
[E::hts_open_format] Failed to open file "OUTDIR_RUN1/2.thin1.unsorted.vcf.gz" : No such file or directory
Could not read OUTDIR_RUN1/2.thin1.unsorted.vcf.gz
Cleaning
Mon Apr 8 23:27:51 CST 2024
perl /ngsproject/bcguo/tools/panpop-main/bin/../scripts/sv2pav.pl --invcf OUTDIR_RUN1/2.thin1.sorted.vcf.gz --outvcf OUTDIR_RUN1/2.thin2.unsorted.vcf.gz --enable_norm_alle 1 --max_len_tomerge 20 --sv_min_dp 40 --threads 16 2>&1 | tee OUTDIR_RUN1/2.thin2.unsorted.vcf.gz.log
OUTDIR_RUN1/2.thin1.sorted.vcf.gz not exists! at /ngsproject/bcguo/tools/panpop-main/scripts/../lib/zzIO.pm line 57.
zzIO::open_in_fh("OUTDIR_RUN1/2.thin1.sorted.vcf.gz") called at /ngsproject/bcguo/tools/panpop-main/bin/../scripts/sv2pav.pl line 102
Mon Apr 8 23:27:51 CST 2024
bcftools sort OUTDIR_RUN1/2.thin2.unsorted.vcf.gz -o OUTDIR_RUN1/2.thin2.sorted.vcf.gz
Writing to /tmp/bcftools.9f2NUo
[E::hts_open_format] Failed to open file "OUTDIR_RUN1/2.thin2.unsorted.vcf.gz" : No such file or directory
Could not read OUTDIR_RUN1/2.thin2.unsorted.vcf.gz
Cleaning
Mon Apr 8 23:27:51 CST 2024
ln -s 2.thin2.sorted.vcf.gz OUTDIR_RUN1/3.final.vcf.gz
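
The first failure in this log is bcftools sort being unable to write a temporary chunk under /tmp; every later step then fails because its input was never produced. The log does not say why the write failed. A full or undersized /tmp is one common cause, so here is a minimal Python sketch, assuming that cause, for checking free space in a candidate temporary directory before sorting. The function name and the default safety factor are illustrative assumptions, not part of panpop or bcftools.

```python
import os
import shutil


def has_room_for_sort(input_path: str, tmp_dir: str, factor: float = 2.0) -> bool:
    """Return True if tmp_dir has at least factor * size(input_path) bytes free.

    The factor is a rough, hypothetical safety margin. The space a sort
    really needs depends on the data and on the tool's settings.
    """
    needed = os.path.getsize(input_path) * factor
    free = shutil.disk_usage(tmp_dir).free
    return free >= needed
```

If the check fails for /tmp, bcftools sort can be pointed at a larger directory with its -T/--temp-dir option.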

So I hope you can help me solve this issue, if possible!

Thanks!

Best wishes,

Bo-cheng

starskyzheng commented 5 months ago

Please try this branch: https://github.com/starskyzheng/panpop/tree/starskyzheng-patch-2 I have not tested this branch yet, so your feedback is welcome!

polchan commented 5 months ago

Thanks! I will give it a try later.

polchan commented 4 months ago

Sorry! I tried the new branch. It completed the first step of PART_run.pl, but I still ran into problems during the second step:

Now reading ref fasta : Done reading ref fasta
read 0 contig / total 0 bp
chr: CM066474.1 not exists in fasta: MdOrin.fa at /ngsproject/bocheng/tools/panpop/bin/../scripts/realign.pl line 564.
[the same message repeats many more times]
(base) [bocheng@Atlas OUTDIR_RUN2]$ cat 2.thin1.unsorted.vcf.gz.log
OUTDIR_RUN2/1.realign0.sorted.vcf.gz not exists! at /ngsproject/bocheng/tools/panpop/scripts/../lib/zzIO.pm line 57.
zzIO::open_in_fh("OUTDIR_RUN2/1.realign0.sorted.vcf.gz") called at /ngsproject/bocheng/tools/panpop/bin/../scripts/merge_similar_allele.pl line 102
(base) [bocheng@Atlas OUTDIR_RUN2]$ cat 2.thin2.unsorted.vcf.gz.log
OUTDIR_RUN2/2.thin1.sorted.vcf.gz not exists! at /ngsproject/bocheng/tools/panpop/scripts/../lib/zzIO.pm line 57.
zzIO::open_in_fh("OUTDIR_RUN2/2.thin1.sorted.vcf.gz") called at /ngsproject/bocheng/tools/panpop/bin/../scripts/sv2pav.pl line 102
(base) [bocheng@Atlas OUTDIR_RUN2]$ cd ..
(base) [bocheng@Atlas pangenome]$ ll
total 136593256
-rwxr-xr-x. 1 bocheng bocheng 0 Apr 13 22:03 MdOrin.fa
-rwxr-xr-x. 1 bocheng bocheng 139870967765 Apr 11 10:10 merge_bcftools.vcf.gz
drwxrwxr-x. 2 bocheng bocheng 4096 Apr 14 00:23 OUTDIR_RUN1
drwxrwxr-x. 2 bocheng bocheng 194 Apr 14 22:51 OUTDIR_RUN2
-rw-------. 1 bocheng bocheng 511069 Apr 14 00:23 step1.log
-rw-rw-r--. 1 bocheng bocheng 5697 Apr 14 22:51 step2.log
drwxrwxr-x. 2 bocheng bocheng 10 Apr 14 22:51 tmp
(base) [bocheng@Atlas pangenome]$ cat step2.log
Sun Apr 14 08:49:55 CST 2024
perl /ngsproject/bocheng/tools/panpop/bin/../scripts/realign.pl --chr_tolerance --in_vcf OUTDIR_RUN1/3.final.vcf.gz --out_vcf OUTDIR_RUN2/1.realign0.unsorted.vcf.gz --ref_fasta_file MdOrin.fa --threads 25 --ext_bp_max 500 --ext_bp_min 50 --skip_mut_at_same_pos 2 --level 1 --tmpdir tmp 2>&1 | tee OUTDIR_RUN2/1.realign0.unsorted.vcf.gz.log
Now reading ref fasta : Done reading ref fasta
read 0 contig / total 0 bp
chr: CM066474.1 not exists in fasta: MdOrin.fa at /ngsproject/bocheng/tools/panpop/bin/../scripts/realign.pl line 564.
[the same message repeats many more times]
Sun Apr 14 22:51:53 CST 2024
bcftools sort OUTDIR_RUN2/1.realign0.unsorted.vcf.gz -o OUTDIR_RUN2/1.realign0.sorted.vcf.gz --temp-dir tmp/tmp_bcftools.
Writing to tmp/tmp_bcftools.a4ZuT7
[E::bcf_hdr_read] Input is not detected as bcf or vcf format
Could not read VCF/BCF headers from OUTDIR_RUN2/1.realign0.unsorted.vcf.gz
Cleaning
Sun Apr 14 22:51:53 CST 2024
perl /ngsproject/bocheng/tools/panpop/bin/../scripts/merge_similar_allele.pl --type 3 --invcf OUTDIR_RUN2/1.realign0.sorted.vcf.gz --outvcf OUTDIR_RUN2/2.thin1.unsorted.vcf.gz --sv2pav_merge_identity_threshold 0.6 --sv2pav_merge_diff_threshold 40 --threads 25 --tmpdir tmp 2>&1 | tee OUTDIR_RUN2/2.thin1.unsorted.vcf.gz.log
OUTDIR_RUN2/1.realign0.sorted.vcf.gz not exists! at /ngsproject/bocheng/tools/panpop/scripts/../lib/zzIO.pm line 57.
zzIO::open_in_fh("OUTDIR_RUN2/1.realign0.sorted.vcf.gz") called at /ngsproject/bocheng/tools/panpop/bin/../scripts/merge_similar_allele.pl line 102
Sun Apr 14 22:51:54 CST 2024
bcftools sort OUTDIR_RUN2/2.thin1.unsorted.vcf.gz -o OUTDIR_RUN2/2.thin1.sorted.vcf.gz --temp-dir tmp/tmp_bcftools.
Writing to tmp/tmp_bcftools.L6KYyr
[E::hts_open_format] Failed to open file "OUTDIR_RUN2/2.thin1.unsorted.vcf.gz" : No such file or directory
Could not read OUTDIR_RUN2/2.thin1.unsorted.vcf.gz
Cleaning
Sun Apr 14 22:51:54 CST 2024
perl /ngsproject/bocheng/tools/panpop/bin/../scripts/sv2pav.pl --invcf OUTDIR_RUN2/2.thin1.sorted.vcf.gz --outvcf OUTDIR_RUN2/2.thin2.unsorted.vcf.gz --enable_norm_alle 1 --max_len_tomerge 20 --sv_min_dp 40 --threads 25 2>&1 | tee OUTDIR_RUN2/2.thin2.unsorted.vcf.gz.log
OUTDIR_RUN2/2.thin1.sorted.vcf.gz not exists! at /ngsproject/bocheng/tools/panpop/scripts/../lib/zzIO.pm line 57.
zzIO::open_in_fh("OUTDIR_RUN2/2.thin1.sorted.vcf.gz") called at /ngsproject/bocheng/tools/panpop/bin/../scripts/sv2pav.pl line 102
Sun Apr 14 22:51:54 CST 2024
bcftools sort OUTDIR_RUN2/2.thin2.unsorted.vcf.gz -o OUTDIR_RUN2/2.thin2.sorted.vcf.gz --temp-dir tmp/tmp_bcftools.
Writing to tmp/tmp_bcftools.1gUY4J
[E::hts_open_format] Failed to open file "OUTDIR_RUN2/2.thin2.unsorted.vcf.gz" : No such file or directory
Could not read OUTDIR_RUN2/2.thin2.unsorted.vcf.gz
Cleaning
Sun Apr 14 22:51:54 CST 2024
ln -s 2.thin2.sorted.vcf.gz OUTDIR_RUN2/3.final.vcf.gz
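
In this log each step still runs after the previous one produced no usable output, and the final ln -s creates 3.final.vcf.gz as a link to a file that does not exist. A small Python sketch for spotting missing, empty, or dangling outputs in an output directory follows. It is an illustration only: the function name is hypothetical and nothing here is provided by panpop. The file names in the usage note are taken from the log.

```python
import os


def check_outputs(outdir: str, names: list[str]) -> dict[str, str]:
    """Classify each expected output as 'ok', 'missing', 'empty', or 'dangling symlink'."""
    status: dict[str, str] = {}
    for name in names:
        path = os.path.join(outdir, name)
        if os.path.islink(path) and not os.path.exists(path):
            # The link itself exists, but its target does not.
            status[name] = "dangling symlink"
        elif not os.path.exists(path):
            status[name] = "missing"
        elif os.path.getsize(path) == 0:
            status[name] = "empty"
        else:
            status[name] = "ok"
    return status
```

For example, check_outputs("OUTDIR_RUN2", ["1.realign0.sorted.vcf.gz", "2.thin2.sorted.vcf.gz", "3.final.vcf.gz"]) would flag the last entry as a dangling symlink in the situation shown above.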

I hope my feedback is helpful to you, and I look forward to seeing you release a new workflow.

starskyzheng commented 4 months ago

What are the values in the CHROM column of the VCF file OUTDIR_RUN1/3.final.vcf.gz?

polchan commented 4 months ago

Hello! These are the first few lines of OUTDIR_RUN1/3.final.vcf.gz:
header.vcf.txt

starskyzheng commented 4 months ago

The VCF file looks OK. Could you please also provide the reference FASTA file, MdOrin.fa?

polchan commented 4 months ago

Sorry! I think I've found the issue: MdOrin.fa became 0 bytes. Strangely, I didn't encounter this problem during the first step of running PART_run.pl. I'll rerun it and inform you of the outcome afterward.
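
The "read 0 contig / total 0 bp" line in the step-2 log is consistent with an empty reference. A minimal Python sketch of a pre-flight check follows, assuming an uncompressed FASTA and a gzip- or bgzip-compressed VCF. Both function names are hypothetical and are not part of panpop. The sketch reads contig names from the FASTA and reports VCF chromosomes that are absent from it.

```python
import gzip


def fasta_contig_lengths(fasta_path: str) -> dict[str, int]:
    """Map each contig name (first word after '>') to its sequence length."""
    lengths: dict[str, int] = {}
    name = None
    with open(fasta_path) as fh:
        for line in fh:
            line = line.strip()
            if line.startswith(">"):
                parts = line[1:].split()
                name = parts[0] if parts else ""
                lengths[name] = 0
            elif name is not None:
                lengths[name] += len(line)
    return lengths


def vcf_chroms_missing_from_fasta(vcf_gz_path: str, fasta_path: str) -> list[str]:
    """Return CHROM values seen in the VCF that have no contig in the FASTA."""
    contigs = fasta_contig_lengths(fasta_path)
    missing: list[str] = []
    seen: set[str] = set()
    with gzip.open(vcf_gz_path, "rt") as fh:
        for line in fh:
            if line.startswith("#") or not line.strip():
                continue  # skip header lines and blank lines
            chrom = line.split("\t", 1)[0]
            if chrom not in seen:
                seen.add(chrom)
                if chrom not in contigs:
                    missing.append(chrom)
    return missing
```

With a 0-byte FASTA, fasta_contig_lengths returns an empty dictionary and every chromosome in the VCF is reported as missing. Note that the second function streams the whole VCF, which takes a while on a file of this size.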

polchan commented 4 months ago

Good news! It seems that PART_run.pl has now run through all of its steps successfully. However, I still have a question. The merge_bcftools.vcf.gz file I used as input was 130 GB, but the final output file 3.final.vcf.gz is only 1.9 GB. Is this size reduction reasonable? What happened to the remaining 120+ GB of data?

starskyzheng commented 4 months ago

I don't know; you will need to check this yourself. If you find anything, please let me know.

polchan commented 4 months ago

Alright! I have compared the original VCF file with the new VCF file produced after merging. The original had over 20,000,000 variants, while the merged file has more than 9,400,000. The number of variants only roughly halved, whereas the file shrank almost 70-fold, so I believe the main reason for the size reduction is not the drop in the number of variants but that the per-sample descriptions of each variant were not retained during merging. Of course, these descriptions do not affect subsequent analyses.

This is a sample of the previous VCF: previous.vcf.txt

This is a sample of the new VCF: new.vcf.txt
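
One way to test this explanation is to compare, for each file, the number of records and the per-sample fields declared in the FORMAT column. A minimal Python sketch follows, assuming gzip- or bgzip-compressed VCFs with the standard column layout. The function name is hypothetical.

```python
import gzip
from collections import Counter


def summarize_vcf(vcf_gz_path: str) -> tuple[int, Counter]:
    """Return (record count, Counter of FORMAT strings) for a gzipped VCF."""
    n_records = 0
    formats: Counter = Counter()
    with gzip.open(vcf_gz_path, "rt") as fh:
        for line in fh:
            if line.startswith("#") or not line.strip():
                continue  # skip header lines and blank lines
            n_records += 1
            cols = line.rstrip("\n").split("\t", 9)
            if len(cols) > 8:
                formats[cols[8]] += 1  # the 9th column is FORMAT
    return n_records, formats
```

If the input file mostly carries long FORMAT strings and the merged file carries only a short one, that would support the idea that dropped per-sample fields, rather than dropped variants, account for most of the size difference.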

Thank you very much for the excellent software you have developed and the help you have provided recently!