starskyzheng / panpop

Application of pan-genome for population
MIT License

Problem in running PART stand alone #16

Closed Riccardo1274 closed 6 months ago

Riccardo1274 commented 8 months ago

Dear Zeyu Zheng, I hope all is well. I am trying to run PART standalone with this command line:

    perl ../panpop/bin/PART_run.pl --in_vcf ../vcf_jasmine_nonlinear_dist_id/file_x_panpop_sorted.vcf.gz -o first_merge_dir -r ../ncbi/reference_hr.fasta -t 30 --tmpdir TMPDIR > first_merge.log

The input .vcf was created in two steps:

  1. bcftools concat between same sample but different variant callers
  2. after sorting and indexing, I used bcftools merge to merge all concat outputs, combining all different samples

When I pass this file to PART, it seems to work at first. But after a few minutes it stops, without saying anything in the logs. It seems to stop at a different point each time. Take my last run as an example. This comes from the first_merge.log file:

    Warn! aln software(muscle) exit status(2)! redo! No. 1
    Warn! aln software(muscle) exit status(2)! redo! No. 1
    Warn! aln software(muscle) exit status(2)! redo! No. 1
    Warn! aln software(muscle) exit status(2)! redo! No. 1
    Argument "CTG--------CTGATGCTGCAGCTGACGGACAAGGGCTCCGTGCTCTACCAGCTG..." isn't numeric in subroutine entry at /data/rossir/panpop/scripts/../lib/realign_alts.pm line 856.
    Argument "CTG----TGATGCTGCAGCTGACGGACAAGGGCTCCGTGCTCTACCAGCTG..." isn't numeric in subroutine entry at /data/rossir/panpop/scripts/../lib/realign_alts.pm line 856.
    Argument "CTG--------CTGATGCTGCAGCTGACGGACAAGGGCTCCGTGCTCTACCAGCTG..." isn't numeric in sort at /data/rossir/panpop/scripts/../lib/realign_alts.pm line 872.
    Argument "CTG----TGATGCTGCAGCTGACGGACAAGGGCTCCGTGCTCTACCAGCTG..." isn't numeric in sort at /data/rossir/panpop/scripts/../lib/realign_alts.pm line 872.
    Warn! no alt[0] from aln software(famsaP)! redo! No. 1
    Warn! aln software(mafft) exit status(1)! redo! No. 1
    Warn! aln software(muscle) exit status(2)! redo! No. 1
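
A log like this can be tallied per aligner to see which tool fails most often. The following is a minimal Python sketch, not part of PanPop: the regular expressions are inferred only from the warning lines quoted in this comment, and the function name `tally_aligner_warnings` is hypothetical.

```python
import re
from collections import Counter

# Patterns inferred from the quoted log lines only (an assumption);
# other PART versions may word these warnings differently.
EXIT_RE = re.compile(r"Warn! aln software\((\w+)\) exit status\((\d+)\)")
NO_ALT_RE = re.compile(r"Warn! no alt\[0\] from aln software\((\w+)\)")


def tally_aligner_warnings(lines):
    """Count warnings per (aligner, reason) pair.

    `lines` is any iterable of strings, e.g. an open log file.
    Several warnings on one line are all counted.
    """
    counts = Counter()
    for line in lines:
        for match in EXIT_RE.finditer(line):
            counts[(match.group(1), "exit status " + match.group(2))] += 1
        for match in NO_ALT_RE.finditer(line):
            counts[(match.group(1), "no alt[0]")] += 1
    return counts
```

Passing an open handle on first_merge.log to `tally_aligner_warnings` returns a `Counter` keyed by `(aligner, reason)`, which makes it easier to compare runs that stop at different points.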

This is the output directory content: 1.realign0.unsorted.vcf.gz 1.realign0.unsorted.vcf.gz.log (sometimes there are more files)

Do you have any suggestions? Running the programme with smaller VCFs seems to work fine: I get a file called 3.final.vcf.gz. Am I doing something wrong? Your tool looks much better than other merging tools, and I really want to make it work for me :)

Thanks in advance.
Kind regards, Riccardo Rossi

starskyzheng commented 8 months ago

can you provide this vcf file?

Riccardo1274 commented 8 months ago

Many thanks. Attached is a subsampled, reduced version, because of the 25 MB limit: file_x_panpop_subsample_reduced.vcf.gz

starskyzheng commented 8 months ago

Thanks for feedback. I added validation of alignment software results, which should solve this bug. Please try again! (Commit abad15d)
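
The thread does not show what commit abad15d actually checks. As a generic illustration of what validating an aligner's output can mean, here is a minimal Python sketch. It is not PanPop's Perl code: the function name `is_valid_alignment` and the three checks are assumptions chosen for the example.

```python
def is_valid_alignment(inputs, aligned, gap="-"):
    """Return True if `aligned` is a plausible alignment of `inputs`.

    Both arguments map sequence name -> string. Illustrative checks:
      1. the same, non-empty set of sequences went in and came out;
      2. every aligned row has the same length;
      3. removing gaps from a row gives back the input sequence.
    """
    if not aligned or set(inputs) != set(aligned):
        return False  # empty result, or a sequence is missing/unexpected
    if len({len(row) for row in aligned.values()}) != 1:
        return False  # rows of unequal length, e.g. truncated output
    for name, row in aligned.items():
        if row.replace(gap, "").upper() != inputs[name].upper():
            return False  # aligner altered or dropped bases
    return True
```

A caller could run a check of this kind after each aligner invocation and fall back to another aligner when it fails, rather than passing a malformed result on to later steps.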

Riccardo1274 commented 8 months ago

Hi Zeyu Zheng, I'm sorry to tell you that I still have problems. They changed every time I ran the program. I used the same command line as above, changing only the number of threads:

  1. I obtained the 3.final vcf file, but it was really short (30 variants, starting from more than 5 million; it stopped after a few positions on the first chromosome)
  2. Same output as before the commit
  3. The 3.final vcf was there, but bcftools gave a "file not found" error when I tried to open it.

Can I try something different? Thanks for your help.
Best regards, Riccardo Rossi

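
The three outcomes above (a very short file, no final file, a file that cannot be opened) can be told apart by checking whether the output exists, whether its compressed stream is complete, and how many records it holds. This is a minimal sketch using only the Python standard library. The function name `summarize_vcf_gz` is hypothetical, and the sketch only reports the state of the file; it does not explain why a run stopped.

```python
import gzip
import os
import zlib


def summarize_vcf_gz(path):
    """Return (exists, complete, n_records) for a gzip- or bgzip-compressed VCF.

    `complete` is False when the compressed stream is truncated or
    corrupted. `n_records` counts the non-header lines read before
    any such error occurred.
    """
    if not os.path.isfile(path):
        return (False, False, 0)
    n_records = 0
    try:
        # bgzip output is a series of gzip members, which gzip.open reads.
        with gzip.open(path, "rt", errors="replace") as handle:
            for line in handle:
                if line.strip() and not line.startswith("#"):
                    n_records += 1
    except (EOFError, OSError, zlib.error):
        return (True, False, n_records)
    return (True, True, n_records)
```

Running this on the 3.final.vcf.gz of each attempt and on the input VCF gives record counts that can be compared directly, for example 30 against more than 5 million in the first case.
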
github-actions[bot] commented 7 months ago

This issue is stale because it has been open for 30 days with no activity.

github-actions[bot] commented 6 months ago

This issue was closed because it has been inactive for 14 days since being marked as stale.