tanghaibao / jcvi

Python library to facilitate genome assembly, annotation, and comparative genomics
BSD 2-Clause "Simplified" License

Synteny depth pattern #235

awesomedeer closed this issue 4 years ago

awesomedeer commented 4 years ago

Hi Haibao,

When I run

    python -m jcvi.compara.synteny depth --histogram ben.tha.anchors

I get this:

    Genome ben depths:
    Depth 0: 1,186 of 25,904 (4.6%)
    Depth 1: 21,423 of 25,904 (82.7%)
    Depth 2: 3,122 of 25,904 (12.1%)
    Depth 3: 153 of 25,904 (0.6%)
    Depth 4: 20 of 25,904 (0.1%)
    Genome tha depths:
    Depth 0: 1,259 of 28,940 (4.4%)
    Depth 1: 25,682 of 28,940 (88.7%)
    Depth 2: 1,933 of 28,940 (6.7%)
    Depth 3: 43 of 28,940 (0.1%)
    Depth 4: 4 of 28,940 (0.0%)
    Depth 5: 19 of 28,940 (0.1%)
    ben vs tha syntenic depths 1:2 pattern

Why is it a 1:2 pattern? I think it's a 1:1 pattern.

Thanks a lot. Song

tanghaibao commented 4 years ago

@Song

The pattern reporting is based on a very simple heuristic: depths are counted upward until the cumulative coverage reaches 90%. In your case, if you only count up to "Depth 1", ben has a coverage of only 4.6% + 82.7% = 87.3% < 90%, so by this rule we think ben aligns to up to 2 tha regions.
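The rule described above can be sketched in a few lines of Python. This is only an illustration of the 90% cumulative-coverage idea applied to the histogram counts from this thread. The function name `depth_at_coverage` and its `cutoff` parameter are hypothetical and are not taken from the jcvi source.

```python
def depth_at_coverage(counts, cutoff=0.90):
    """Return the smallest depth d such that genes with depth <= d
    account for at least `cutoff` of all genes.

    counts[d] is the number of genes at depth d (Depth 0 included).
    Hypothetical helper for illustration, not jcvi's implementation.
    """
    total = sum(counts)
    cumulative = 0
    for depth, n in enumerate(counts):
        cumulative += n
        if cumulative >= cutoff * total:
            return depth
    return len(counts) - 1


# Histogram counts copied from the output posted above
ben = [1186, 21423, 3122, 153, 20]
tha = [1259, 25682, 1933, 43, 4, 19]

print(depth_at_coverage(ben))  # 2: depths 0-1 cover only 87.3%
print(depth_at_coverage(tha))  # 1: depths 0-1 already cover 93.1%
```

With these counts ben needs depth 2 to pass the cutoff while tha passes it at depth 1, which matches the reasoning in the reply.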

This is only a guideline and may not be accurate in some cases. Always check the dot plot to confirm the ratio.

Haibao

awesomedeer commented 4 years ago

I see. Thanks for your reply! One more quick question: what's the default minimum number of anchors in one syntenic block? I am trying to write the legend for synteny figures.

Song

tanghaibao commented 4 years ago

@awesomedeer

The default minimum number of anchors in a cluster is 4, and the default distance between anchors is 20.

⇒  python -m jcvi.compara.synteny scan
Usage:
    synteny.py scan blastfile anchor_file [options]

    pull out syntenic anchors from blastfile based on single-linkage algorithm

Options:
  -h, --help            Show this help message and exit
  -n N, --min_size=N    Minimum number of anchors in a cluster [default: 4]
  --intrabound=INTRABOUND
                        Lower bound of intra-chromosomal blocks (only for self
                        comparison) [default: 300]
  --liftover=LIFTOVER   Scan BLAST file to find extra anchors [default: none]
  --no_strip_names      Do not strip alternative splicing (e.g. At5g06540.1 ->
                        At5g06540)
  --qbed=QBED           Path to qbed [default: none]
  --sbed=SBED           Path to sbed [default: none]
  --dist=DIST           Extent of flanking regions to search [default: 20]
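
To make the roles of `--min_size` and `--dist` concrete, here is a toy single-linkage clustering of anchors. It is a sketch under stated assumptions, not jcvi's implementation: anchors are assumed to be `(query_rank, subject_rank)` gene-order pairs on a single chromosome pair, two anchors are linked when both ranks differ by at most `dist`, and the function name `cluster_anchors` is hypothetical.

```python
def cluster_anchors(anchors, dist=20, min_size=4):
    """Toy single-linkage clustering of (query_rank, subject_rank) anchors.

    Assumption for illustration: two anchors are linked when both their
    query ranks and subject ranks differ by at most `dist`. Clusters with
    fewer than `min_size` anchors are discarded.
    """
    parent = list(range(len(anchors)))

    def find(i):
        # Union-find root lookup with path halving
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Link every pair of anchors that lies within `dist` on both axes
    for i, (qi, si) in enumerate(anchors):
        for j in range(i + 1, len(anchors)):
            qj, sj = anchors[j]
            if abs(qi - qj) <= dist and abs(si - sj) <= dist:
                parent[find(i)] = find(j)

    # Group anchors by their root and keep only large enough clusters
    clusters = {}
    for i, anchor in enumerate(anchors):
        clusters.setdefault(find(i), []).append(anchor)
    return [sorted(c) for c in clusters.values() if len(c) >= min_size]


# Made-up anchors: one chain of four, plus an isolated pair
anchors = [(1, 101), (5, 104), (12, 110), (30, 125), (200, 900), (205, 903)]
blocks = cluster_anchors(anchors)
print(blocks)
```

With the defaults shown in the help text (4 and 20), the chain of four anchors is kept as one block and the isolated pair is dropped because it has fewer than 4 anchors.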

awesomedeer commented 4 years ago

Thanks!

cdanmaigona commented 2 years ago

Hello Haibao,

A follow-up question on this. When I run

    python -m jcvi.compara.synteny depth --histogram F1.F4.anchors --depthfile=F1.F4.depth

I get this:

    Genome F1 depths:
    Depth 0: 704 of 16,800 (4.2%)
    Depth 1: 15,612 of 16,800 (92.9%)
    Depth 2: 173 of 16,800 (1.0%)
    Depth 3: 138 of 16,800 (0.8%)
    Depth 4: 106 of 16,800 (0.6%)
    Depth 5: 57 of 16,800 (0.3%)
    Depth 6: 10 of 16,800 (0.1%)
    Genome F4 depths:
    Depth 0: 2,998 of 19,588 (15.3%)
    Depth 1: 16,130 of 19,588 (82.3%)
    Depth 2: 340 of 19,588 (1.7%)
    Depth 3: 120 of 19,588 (0.6%)
    [08:41:07 PM] DEBUG Depth written to Race1.Race4. synteny.py:1773
    Race1 vs Race4 syntenic depths 1:1 pattern

From the explanation on your wiki, there are up to 6 F4 blocks per F1 gene. However, when I run

    python -m jcvi.compara.synteny stats F1.F4.i6.blocks

to get the statistics on my blocks and actual duplicate genes, the numbers do not correlate.

    Count 0: 1,450 of 16,800 (8.6%)
    Count 1: 15,052 of 16,800 (89.6%)
    Count 2: 87 of 16,800 (0.5%)
    Count 3: 83 of 16,800 (0.5%)
    Count 4: 48 of 16,800 (0.3%)
    Count 5: 80 of 16,800 (0.5%)

    Total lines with matches: 15,350 of 16,800 (91.4%)
    Count 1: 15,052 of 15,350 (98.1%)
    Count 2: 87 of 15,350 (0.6%)
    Count 3: 83 of 15,350 (0.5%)
    Count 4: 48 of 15,350 (0.3%)
    Count 5: 80 of 15,350 (0.5%)

The numbers do not correspond to what I'm getting with the depth command. I can only see a maximum of 5 duplicates when the depth analysis shows up to 6. Please help me understand what I'm missing.

Thank you!!