pangenome / pggb

the pangenome graph builder
https://doi.org/10.1101/2023.04.05.535718
MIT License

the effect of `n_mappings` #382

Open subwaystation opened 4 months ago

subwaystation commented 4 months ago

Hi, I am wondering whether, and when, `n_mappings` should be set to the number of haplotypes. The current default of `n_mappings` = 1 seems to underalign a little. Below are the statistics for two pangenome graphs built from the typical 8 yeast genomes, one with `n_mappings` set to 8 and one with the default of 1:

# n_mappings == 8

---
length: 14091153
nodes: 630280
edges: 853558
paths: 136
steps: 3273155
num_weakly_connected_components: 2
weakly_connected_components: 
  - component:
      id: 0
      nodes: 621675
      is_acyclic: 'no'
  - component:
      id: 1
      nodes: 8605
      is_acyclic: 'no'
num_nodes_self_loops:
  total: 12
  unique: 12
A: 4321050
C: 2658659
G: 2686086
N: 121499
T: 4303859
file_size_in_bytes: 129734193

# n_mappings == 1

---
length: 14745611
nodes: 613254
edges: 829212
paths: 136
steps: 3105406
num_weakly_connected_components: 7
weakly_connected_components: 
  - component:
      id: 0
      nodes: 36909
      is_acyclic: 'no'
  - component:
      id: 1
      nodes: 20009
      is_acyclic: 'no'
  - component:
      id: 2
      nodes: 402691
      is_acyclic: 'no'
  - component:
      id: 3
      nodes: 49762
      is_acyclic: 'no'
  - component:
      id: 4
      nodes: 42041
      is_acyclic: 'no'
  - component:
      id: 5
      nodes: 54086
      is_acyclic: 'no'
  - component:
      id: 6
      nodes: 7756
      is_acyclic: 'no'
num_nodes_self_loops:
  total: 1
  unique: 1
A: 4524408
C: 2792795
G: 2805204
N: 121499
T: 4501705
file_size_in_bytes: 126904847
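
To put numbers on the comparison, here is a minimal Python sketch that takes the values from the two stats dumps above and derives a few summary metrics. The dictionary layout, the `summarize` helper, and the choice of metrics are my own assumptions for illustration and are not part of pggb or any other tool. Since both graphs embed the same 136 paths, a longer graph and more components would be consistent with less sequence being collapsed, though this alone does not prove underalignment.

```python
# Values copied by hand from the two stats dumps in this post.
# The structure of this dict is an illustrative assumption, not a tool format.
stats = {
    8: {"length": 14091153, "nodes": 630280, "edges": 853558, "steps": 3273155,
        "component_nodes": [621675, 8605]},
    1: {"length": 14745611, "nodes": 613254, "edges": 829212, "steps": 3105406,
        "component_nodes": [36909, 20009, 402691, 49762, 42041, 54086, 7756]},
}


def summarize(record):
    """Derive a few comparison metrics from one stats record (hypothetical helper)."""
    comps = record["component_nodes"]
    # Sanity check: per-component node counts should add up to the node total.
    assert sum(comps) == record["nodes"]
    return {
        "components": len(comps),
        "largest_component_fraction": max(comps) / record["nodes"],
        "steps_per_node": record["steps"] / record["nodes"],
    }


summary = {n: summarize(record) for n, record in stats.items()}

# How much more sequence the n_mappings == 1 graph contains.
extra_bp = stats[1]["length"] - stats[8]["length"]
extra_pct = 100 * extra_bp / stats[8]["length"]

for n in (1, 8):
    m = summary[n]
    print(f"n_mappings={n}: {m['components']} components, "
          f"largest holds {m['largest_component_fraction']:.1%} of nodes, "
          f"{m['steps_per_node']:.2f} steps per node")
print(f"n_mappings=1 graph is {extra_bp} bp ({extra_pct:.1f}%) longer")
```

With `n_mappings` = 1 the graph is 654458 bp longer and splits into 7 components instead of 2, and the largest component holds a much smaller share of the nodes.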

I suspect the default is missing some mappings, which leaves the graph split into more connected components (7 instead of 2). What would you recommend here, @AndreaGuarracino @ekg? Thanks!