starskyzheng / panpop

Application of pan-genome for population
MIT License

About PART stand alone #25

Closed: JiadongZHONG closed this issue 6 months ago

JiadongZHONG commented 7 months ago

Hi Zeyu,

Thank you very much for developing such a convenient tool!

I have run into some problems when running PART standalone. Here are my steps:

  1. I mapped 2,000 NGS samples to the pangenome reference with vg giraffe and obtained 2,000 VCF files with vg call.
  2. I split every file by chromosome into 22 parts and merged the per-chromosome files with bcftools merge -m none, producing 22 merged VCF files.
  3. As a test, I ran only chr22_merged.vcf with perl PART_run.pl -i chr22_merged.vcf -o OUTPUT_PATH -r CHM13v2.fasta, allocating 32 threads and 450 GB of memory.
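The split-and-merge steps above could be scripted roughly as follows. This is a minimal sketch that only builds bcftools command lines; the sample file names and the helpers split_cmd and merge_cmd are hypothetical. It assumes each per-sample VCF is bgzipped and indexed (required by bcftools view -r), and it passes the 2,000 per-chromosome files to bcftools merge through a list file (-l) to avoid an overly long command line.

```python
from __future__ import annotations

import shlex


def split_cmd(sample_vcf: str, chrom: str, out_vcf: str) -> list[str]:
    """Extract one chromosome from a per-sample VCF.

    bcftools view -r requires the input to be bgzipped and indexed (.csi/.tbi).
    """
    return ["bcftools", "view", "-r", chrom, "-Oz", "-o", out_vcf, sample_vcf]


def merge_cmd(file_list: str, out_vcf: str, threads: int = 4) -> list[str]:
    """Merge the per-sample VCFs of one chromosome listed in file_list.

    -m none keeps records at the same position as separate lines instead of
    joining them into multiallelic sites.
    """
    return ["bcftools", "merge", "-m", "none", "--threads", str(threads),
            "-l", file_list, "-Oz", "-o", out_vcf]


# Hypothetical sample names, for illustration only.
samples = [f"sample{i:04d}" for i in range(1, 4)]
chrom = "chr22"
for s in samples:
    print(shlex.join(split_cmd(f"{s}.vcf.gz", chrom, f"{s}.{chrom}.vcf.gz")))
# chr22.list would hold one per-chromosome VCF path per line (not written here).
print(shlex.join(merge_cmd(f"{chrom}.list", f"{chrom}_merged.vcf.gz", threads=8)))
```

Each printed line can be pasted into a shell, or the list itself can be passed to subprocess.run(cmd, check=True).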

No error message has appeared so far, but the job has been running for 9 days and is still stuck at the step that generates 2.thin1.unsorted.vcf.gz. This file was created on day 2, but its content was not updated until day 6. Chromosome 22 is 51,324,926 bp long, yet the current position recorded in the file is only 3,148,301. I have also not seen any temporary files. Am I doing something wrong?
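To gauge progress on an intermediate file such as 2.thin1.unsorted.vcf.gz, one could read the largest POS written so far and compare it with the chromosome length. This is a rough sketch with hypothetical helper names (max_position, progress). Because the file is unsorted, it takes the maximum POS rather than the last one. It stops quietly if the bgzip stream is still being written and ends mid-block. Position is only a loose proxy for remaining work, since variant density varies along the chromosome.

```python
from __future__ import annotations

import gzip

CHR22_LENGTH = 51_324_926  # chromosome 22 length quoted in the issue


def max_position(vcf_gz_path: str) -> int | None:
    """Return the largest POS among complete data lines of a (b)gzipped VCF."""
    best = None
    try:
        with gzip.open(vcf_gz_path, "rt") as fh:
            for line in fh:
                if line.startswith("#"):
                    continue  # skip header lines
                fields = line.split("\t", 2)
                if len(fields) < 2:
                    continue
                try:
                    pos = int(fields[1])
                except ValueError:
                    continue  # skip malformed lines
                if best is None or pos > best:
                    best = pos
    except EOFError:
        # The writer may still be appending; keep whatever was read so far.
        pass
    return best


def progress(pos: int, chrom_length: int = CHR22_LENGTH) -> float:
    """Fraction of the chromosome covered up to pos."""
    return pos / chrom_length
```

With the figures from the issue, progress(3_148_301) is about 0.061, or roughly 6% of chromosome 22.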

There are two other minor issues.

  1. I ran vg call with -a -A (genotype every snarl, and all snarls) when calling variants. Will this affect the subsequent SV merging?
  2. Can your method ultimately phase SVs?
starskyzheng commented 6 months ago

PART_run.pl enables multi-threading through the -t parameter. Did you include this parameter in your command?
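For reference, the test command from the issue with -t added might be assembled as below. The sketch uses only the -i, -o and -r values from the issue plus -t as described in this reply; the helper part_run_cmd is hypothetical and does not check which other options PART_run.pl accepts.

```python
import shlex


def part_run_cmd(input_vcf: str, output_path: str, reference: str, threads: int) -> list:
    """Build the PART_run.pl command line, with -t for multi-threading."""
    return ["perl", "PART_run.pl",
            "-i", input_vcf,
            "-o", output_path,
            "-r", reference,
            "-t", str(threads)]  # number of threads


cmd = part_run_cmd("chr22_merged.vcf", "OUTPUT_PATH", "CHM13v2.fasta", 32)
print(shlex.join(cmd))
# prints: perl PART_run.pl -i chr22_merged.vcf -o OUTPUT_PATH -r CHM13v2.fasta -t 32
# To launch it from Python: import subprocess; subprocess.run(cmd, check=True)
```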

You can also monitor the Perl script with system tools such as top or htop to check whether it is still running and using CPU.
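The same check can be scripted on Linux by sampling a process's CPU time from /proc/<pid>/stat, which is roughly what top and htop display. This is a Linux-only sketch with hypothetical helper names (cpu_ticks, cpu_percent). Values above 100% mean several cores are busy. If the Perl script forks worker processes, their CPU time is not counted in the parent's figures, so check those PIDs as well (htop lists them).

```python
import os
import time


def cpu_ticks(pid: int) -> int:
    """Return utime + stime, in clock ticks, for a Linux process (all its threads)."""
    with open(f"/proc/{pid}/stat") as fh:
        data = fh.read()
    # Field 2 (the command name) is in parentheses and may contain spaces,
    # so split only what follows the last ')'.
    rest = data[data.rindex(")") + 1:].split()
    # rest[0] is field 3 (state); utime is field 14 and stime is field 15.
    return int(rest[11]) + int(rest[12])


def cpu_percent(pid: int, interval: float = 5.0) -> float:
    """Average CPU usage of pid over interval seconds (100.0 = one full core)."""
    hz = os.sysconf("SC_CLK_TCK")
    before = cpu_ticks(pid)
    time.sleep(interval)
    after = cpu_ticks(pid)
    return 100.0 * (after - before) / hz / interval
```

A value near 0% over several samples would suggest the job is waiting (for example on I/O) rather than computing.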

To generate the merged VCF files, I recommend the PanPop pipeline: it has a user-friendly interface and may speed up the process.

Minor Issue 1: We have not yet experimented with this parameter. However, we expect that using it may improve performance, yielding better results through a reduced rate of missing data.
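One simple way to quantify the missing-data rate mentioned here is to count genotypes whose GT value is entirely missing (".", "./.", ".|.") in a merged VCF, for example to compare calls made with and without -a. This is a generic sketch, not PanPop's own metric. The helper names are hypothetical, and partially missing genotypes such as "./1" are counted as called.

```python
from __future__ import annotations

from typing import Iterable


def is_missing(gt: str) -> bool:
    """True if every allele in a GT value is '.', e.g. '.', './.', '.|.'."""
    alleles = gt.replace("|", "/").split("/")
    return all(a == "." for a in alleles)


def missing_rate(vcf_lines: Iterable[str]) -> float:
    """Fraction of fully missing genotypes over all samples and data lines."""
    missing = total = 0
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        fmt = fields[8].split(":")
        if "GT" not in fmt:
            continue
        gt_idx = fmt.index("GT")
        for sample in fields[9:]:
            parts = sample.split(":")
            # A sample column may be a lone '.', which means missing.
            gt = parts[gt_idx] if gt_idx < len(parts) else "."
            missing += is_missing(gt)
            total += 1
    return missing / total if total else 0.0
```

Usage: with open("chr22_merged.vcf") as fh: print(missing_rate(fh)). For a bgzipped file, open it with gzip.open(path, "rt") instead.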

Minor Issue 2: Sorry, not supported.

JiadongZHONG commented 6 months ago

Thank you so much! It is very helpful.