pangenome / pggb

the pangenome graph builder
https://doi.org/10.1038/s41592-024-02430-3
MIT License

How to handle plasmid sequences for bacterial assemblies #406

Open UpalabdhaD opened 1 month ago

UpalabdhaD commented 1 month ago

Dear pggb team, thanks for the awesome tool. I am currently interested in using pggb on a bacterial species that contains megaplasmids. I am somewhat unsure whether I should remove the plasmids and run pggb on the chromosomes only, or whether the plasmids can be included as well. I came across this page https://pggb.readthedocs.io/en/latest/rst/tutorials/escherichia_coli.html, but it does not give much guidance on this. I would like to hear from the team about any strategies for deciding whether to include the plasmids.

My idea is that including the plasmid sequences will inflate the number of paths in the pangenome. That is, if I start with 100 assemblies, the number of paths in the pggb output would be 100 plus some additional number due to the plasmid sequences. Am I correct in that sense? If my intention is to identify and visualize SVs, insertions, and inversions, should I include the plasmid sequences, which could be around 20% of the genome size?

Thanks in advance!
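
The path-count reasoning in the question can be checked on the input FASTA before running pggb. The snippet below is only a minimal sketch: it assumes that each input sequence ends up as one path in the output graph, and that sequence names follow a PanSN-style `sample#haplotype#contig` pattern. The function name `count_sequences_per_sample` and the strain and contig names are hypothetical and used purely for illustration.

```python
from collections import Counter
import io


def count_sequences_per_sample(fasta_lines, delim="#"):
    """Count FASTA records per sample.

    Assumption: headers look like 'sample#haplotype#contig' (PanSN-style),
    so the sample is the text before the first delimiter.
    """
    counts = Counter()
    for line in fasta_lines:
        if not line.startswith(">"):
            continue  # skip sequence lines
        fields = line[1:].split()
        if not fields:
            continue  # skip headers with no name
        sample = fields[0].split(delim)[0]
        counts[sample] += 1
    return counts


# Hypothetical input: two assemblies, one of which carries a plasmid.
example = io.StringIO(
    ">strainA#1#chromosome\nACGT\n"
    ">strainA#1#plasmid1\nACGT\n"
    ">strainB#1#chromosome\nACGT\n"
)

counts = count_sequences_per_sample(example)

# Under the one-sequence-one-path assumption, this is the expected path count.
total_paths = sum(counts.values())

print(dict(counts))
print(total_paths)
```

To run it on a real file, pass an open file handle instead of the in-memory example, for instance `count_sequences_per_sample(open("input.fa"))`, where `input.fa` is a placeholder name. Samples with a count above one are those contributing extra sequences, such as plasmids or additional contigs.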