wjian8 / psvcp_v1.01

Pan-genome Construction and Population Structure Variation Calling pipeline
GNU General Public License v3.0

Construct pan-genome from several (more than 2) genomes #17

Open CSU-KangHu opened 1 week ago

CSU-KangHu commented 1 week ago

Thank you for developing such an excellent tool. The paper “Duck pan-genome reveals two transposon insertions caused bodyweight enlarging and white plumage phenotype formation during evolution” mentions using Psvcp and PPsPCP for constructing linear pan-genomes.

We have developed a TE detection tool that can perform TE detection on individual genomes. However, we are considering whether we could first construct a linear pan-genome across multiple genomes and then perform TE detection to reduce redundant computations. Our requirement is not to build an extremely precise pan-genome, but to ensure the linear pan-genome comprehensively includes all unique insertions from the genomes. Thus, I need to confirm two things with you:

  1. Does the pan-genome constructed by Psvcp include all unique insertions from the genomes? If so, I would not need to use PPsPCP for pan-genome construction.
  2. When I do not provide an annotation file, although Psvcp reports errors, it ultimately generates pan.fa, pan.pav.gff, and pan.pav.sorted.gff. I would like to know if this indicates that the linear pan-genome has been correctly generated.

Thank you again for your efforts.

wjian8 commented 1 week ago
  1. We used Nucmer to align two genomes (A and B), identified the segments present in B but absent from A, and inserted them into the A genome. We also tried other genome comparison software, and the results differed somewhat. Therefore, we cannot claim that Nucmer identifies every divergent sequence, so the linear pan-genome is not guaranteed to contain all unique insertions.
  2. Yes. The annotation GFF file is only used to update the annotation based on PAV; it does not affect how the FASTA genome file is updated. The linear pan-genome was generated correctly.
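
To make the idea in point 1 concrete, here is a minimal Python sketch of the two steps described there: finding regions of B that no alignment covers, and splicing segments into A. This is not the psvcp code. The function names, the 0-based half-open coordinates, and the `min_len` threshold are all assumptions made for illustration, and working out where in A each segment belongs (which in practice depends on the flanking alignments) is left out.

```python
from typing import List, Tuple


def unaligned_intervals(length: int,
                        aligned: List[Tuple[int, int]],
                        min_len: int = 50) -> List[Tuple[int, int]]:
    """Return regions of a sequence that no alignment covers.

    `aligned` holds 0-based, half-open (start, end) intervals on genome B,
    e.g. parsed from an alignment coordinate table. `min_len` is a
    hypothetical size cutoff, not a psvcp parameter.
    """
    gaps = []
    cursor = 0  # end of the region covered so far
    for start, end in sorted(aligned):
        if start - cursor >= min_len:
            gaps.append((cursor, start))
        cursor = max(cursor, end)
    if length - cursor >= min_len:
        gaps.append((cursor, length))
    return gaps


def insert_segments(ref: str, insertions: List[Tuple[int, str]]) -> str:
    """Splice segments into `ref`.

    Each insertion is (position, segment), where position is a 0-based
    offset in the ORIGINAL `ref`, so insertions do not shift one another.
    """
    pieces = []
    prev = 0
    for pos, seg in sorted(insertions, key=lambda item: item[0]):
        if not 0 <= pos <= len(ref):
            raise ValueError(f"insertion position {pos} is outside the reference")
        pieces.append(ref[prev:pos])
        pieces.append(seg)
        prev = pos
    pieces.append(ref[prev:])
    return "".join(pieces)


if __name__ == "__main__":
    genome_a = "AAAACCCC"
    genome_b = "AAAAGGGGGGCCCC"
    # Pretend the aligner matched the first 4 and last 4 bases of B to A.
    gaps = unaligned_intervals(len(genome_b), [(0, 4), (10, 14)], min_len=5)
    # Assumed placement: the uncovered B segment goes at offset 4 in A.
    pan = insert_segments(genome_a, [(4, genome_b[s:e]) for s, e in gaps])
    print(gaps, pan)
```

The sketch also shows why completeness depends on the aligner: any B segment that the aligner reports as aligned (or that falls below the size cutoff) never reaches the insertion step, so a different aligner or threshold yields a different pan-genome.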