pangenome / pggb

the pangenome graph builder
https://doi.org/10.1038/s41592-024-02430-3
MIT License

How to understand the long range links? #213

Closed: Yutang-ETH closed this 1 year ago

Yutang-ETH commented 2 years ago

Hi,

What do the long-range links mean? I saw some very long-range links connecting the two ends of the chromosome. How should we interpret these links? If they are caused by repeats, how can I remove them? I want to remove the links, not the nodes. Thank you very much in advance.


Best wishes, Yutang

Yutang-ETH commented 2 years ago

Sorry, to make it clearer: this graph was generated by PGGB with default parameters and n=2.

baozg commented 2 years ago

Links between the two ends of a chromosome may be related to telomeric repeats (you can check for the TTTAGGG repeat). If you want to remove these links, I'd suggest masking the telomeric repeats first.
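A minimal sketch of the masking step, assuming the telomeric arrays appear as exact tandem copies of TTTAGGG (the plant-type motif named above) or its reverse complement CCCTAAA. The function name, the `min_copies=3` threshold and the choice of hard-masking with `N` are illustrative assumptions, not a PGGB feature. Check whether your aligner treats `N` (or lowercase soft-masking) the way you expect before relying on this.

```python
import re


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string (ACGT only, case preserved)."""
    return seq.translate(str.maketrans("ACGTacgt", "TGCAtgca"))[::-1]


def mask_telomeric_repeats(seq: str, motif: str = "TTTAGGG",
                           min_copies: int = 3, mask_char: str = "N") -> str:
    """Hypothetical helper: hard-mask runs of >= min_copies tandem copies
    of `motif` or its reverse complement, on either strand.

    Each masked run is replaced by the same number of mask characters,
    so sequence length and coordinates are preserved.
    """
    motifs = sorted({motif.upper(), reverse_complement(motif).upper()})
    # e.g. "(?:CCCTAAA){3,}|(?:TTTAGGG){3,}", matched case-insensitively
    pattern = re.compile(
        "|".join(f"(?:{re.escape(m)}){{{min_copies},}}" for m in motifs),
        re.IGNORECASE,
    )
    return pattern.sub(lambda m: mask_char * len(m.group(0)), seq)
```

For example, `mask_telomeric_repeats("ACGT" + "TTTAGGG" * 4 + "ACGT")` returns `"ACGT"` followed by 28 `N`s and `"ACGT"`. Real telomeric arrays are often degenerate, so an exact-copy regex will miss diverged copies. Also, other organisms use different motifs (for example TTAGGG in vertebrates), so pass the motif that matches your species.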

Yutang-ETH commented 2 years ago

Hi Zhigui,

Thank you very much for your input. I see: it looks like these long-range links do have a biological meaning, so I would prefer not to remove them.

Best wishes, Yutang

baozg commented 2 years ago

I was only suggesting removing the long-range links caused by telomeric repeats, since those are easy to mask. Other long-range links are very common in plants, especially in highly heterozygous genomes. They may come from genome rearrangements, transposable elements (TEs), or mapping noise. You should look at the dotplot first and then tune the parameters.
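To illustrate what a dotplot encodes, here is a toy k-mer dotplot in pure Python. It assumes two short sequences and an arbitrary `k`. In practice the dotplot would come from a whole-genome alignment and a dedicated plotting tool; the function below is a hypothetical illustration, not part of PGGB.

```python
from collections import defaultdict


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string (ACGT only, case preserved)."""
    return seq.translate(str.maketrans("ACGTacgt", "TGCAtgca"))[::-1]


def kmer_dotplot(a: str, b: str, k: int = 11):
    """Hypothetical helper: return (forward, reverse) lists of (i, j) points.

    forward: a[i:i+k] == b[j:j+k]                      (same strand)
    reverse: a[i:i+k] == reverse_complement(b[j:j+k])  (opposite strand)
    """
    a, b = a.upper(), b.upper()
    # Index every k-mer start position in b.
    index = defaultdict(list)
    for j in range(len(b) - k + 1):
        index[b[j:j + k]].append(j)
    forward, reverse = [], []
    for i in range(len(a) - k + 1):
        kmer = a[i:i + k]
        for j in index.get(kmer, ()):
            forward.append((i, j))
        for j in index.get(reverse_complement(kmer), ()):
            reverse.append((i, j))
    return forward, reverse
```

Plotted as a scatter, forward points along the main diagonal indicate collinear sequence. Reverse points forming an anti-diagonal segment indicate an inversion. Points scattered off the diagonal usually reflect repeats such as TEs, which are the kind of matches that can turn into long-range links in the graph.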

Yutang-ETH commented 2 years ago

Thanks, Zhigui. Yes, I totally agree that the long-range links could have many different causes, and I did check the dotplot before tuning the PGGB parameters. Beyond that, from what I have learned these past days experimenting with PGGB, I think the whole-genome alignment has a very large impact on the pangenome graph, and it is hard to judge the quality of the graph from the 1D visualization alone. But I do like your suggestion to mask telomeric repeats.