tanghaibao / jcvi

Python library to facilitate genome assembly, annotation, and comparative genomics
BSD 2-Clause "Simplified" License
742 stars 187 forks source link

lifted anchors: format reading issue? #630

Closed alpapan closed 6 months ago

alpapan commented 7 months ago

Hello Could yopu please provide pointers on how to trouble shoot this

I used the latest code from pip rather than bioconda and what used to work now doesn't. I think it may be a formating issue.

My approach is the following

#format BED
python -m jcvi.formats.gff bed --type=mRNA hifiasm_OGSJun23.gff3.clean.gff3 -o Murcott_Pure.bed
python -m jcvi.formats.gff bed --type=mRNA phoenix_OGSJun2023.gff3.clean.gff3 -o Murcott_Phoenix.bed
#get orthologues
python -m jcvi.compara.catalog ortholog --skipempty --dbtype=nucl --full --cpus=20 --nochpf --nostdpf --no_strip_names --genomenames="Murcott Phoenix_Murcott Pure" Murcott_Phoenix Murcott_Pure
# create large scale simple synteny blocks
python -m jcvi.compara.synteny screen --simple Murcott_Phoenix.Murcott_Pure.anchors Murcott_Phoenix.Murcott_Pure.anchors.new

Error (i'm showing just the error part at 1x1 anchors)

$ python -m jcvi.compara.catalog ortholog --skipempty --dbtype=nucl --full --cpus=20 --nochpf --nostdpf --no_strip_names --genomenames="Murcott Phoenix_Murcott Pure" Murcott_Phoenix Murcott_Pure
[02/20/24 15:13:31] DEBUG    File `Murcott_Phoenix.Murcott_Pure.last` found. Computation skipped.                                                                                                                                                                                        base.py:1384
                    DEBUG    File `Murcott_Phoenix.Murcott_Pure.last.filtered` found. Computation skipped.                                                                                                                                                                               base.py:1384
                    DEBUG    File `Murcott_Phoenix.Murcott_Pure.anchors` found. Computation skipped.                                                                                                                                                                                     base.py:1384
                    DEBUG    File `Murcott_Phoenix.Murcott_Pure.1x1.anchors` found. Computation skipped.                                                                                                                                                                                 base.py:1384
                    DEBUG    File `Murcott_Phoenix.Murcott_Pure.1x1.lifted.anchors` found. Computation skipped.                                                                                                                                                                          base.py:1384
                    DEBUG    Load file `Murcott_Phoenix.bed`                                                                                                                                                                                                                               base.py:33
                    DEBUG    Load file `Murcott_Phoenix.Murcott_Pure.1x1.lifted.anchors`                                                                                                                                                                                                   base.py:33
Chain started: 19 blocks
Chain 0: score=38418 done!
[15:13:31] DEBUG    MCscan blocks written to `Murcott_Phoenix.Murcott_Pure.1x1.blocks`.                                                                                                                                                                                               synteny.py:1557
                    DEBUG    Load file `Murcott_Pure.bed`                                                                                                                                                                                                                                  base.py:33
[02/20/24 15:13:32] DEBUG    Load file `Murcott_Phoenix.Murcott_Pure.1x1.lifted.anchors`                                                                                                                                                                                                   base.py:33
Traceback (most recent call last):
  File "/apps/packages/miniconda/3.1/envs/jcvi/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/apps/packages/miniconda/3.1/envs/jcvi/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/apps/packages/miniconda/3.1/envs/jcvi/lib/python3.9/site-packages/jcvi/compara/catalog.py", line 978, in <module>
    main()
  File "/apps/packages/miniconda/3.1/envs/jcvi/lib/python3.9/site-packages/jcvi/compara/catalog.py", line 78, in main
    p.dispatch(globals())
  File "/apps/packages/miniconda/3.1/envs/jcvi/lib/python3.9/site-packages/jcvi/apps/base.py", line 140, in dispatch
    globals[action](sys.argv[2:])
  File "/apps/packages/miniconda/3.1/envs/jcvi/lib/python3.9/site-packages/jcvi/compara/catalog.py", line 788, in ortholog
    mcscan([bbed, lifted_anchors, "--iter=1", "-o", qblocks])
  File "/apps/packages/miniconda/3.1/envs/jcvi/lib/python3.9/site-packages/jcvi/compara/synteny.py", line 1507, in mcscan
    ranges, block_pairs = ac.make_ranges(order, clip=clip)
  File "/apps/packages/miniconda/3.1/envs/jcvi/lib/python3.9/site-packages/jcvi/compara/base.py", line 38, in make_ranges
    r = make_range(q, s, t, i, order, block_pairs, clip=clip)
  File "/apps/packages/miniconda/3.1/envs/jcvi/lib/python3.9/site-packages/jcvi/compara/base.py", line 132, in make_range
    q = [order[x][0] for x in q]
  File "/apps/packages/miniconda/3.1/envs/jcvi/lib/python3.9/site-packages/jcvi/compara/base.py", line 132, in <listcomp>
    q = [order[x][0] for x in q]
KeyError: 'evm.model.mp_v1_ptg000017l_pilon.1'

Bed is formatted like so

mp_v1_ptg000001l_pilon  155416  155744  evm.model.mp_v1_ptg000001l_pilon.1      0       +

CDS is like so

>evm.model.mp_v1_ptg000001l_pilon.1 type:CDS  gene:evm.TU.mp_v1_ptg000001l_pilon.1
ATGCTTCCTGTGCTTTATCACGGGCAGTTTGCATTGCTACAAGGTATGGTGCTGTCCGTAGACAATTCGGTTCAGAAAAT

blocks like so

head Murcott_Phoenix.Murcott_Pure.1x1.blocks
evm.model.mp_v1_ptg000001l_pilon.1      evm.model.ptg000001l_pilon.1.3.648366e3
evm.model.mp_v1_ptg000001l_pilon.2      evm.model.ptg000001l_pilon.3
evm.model.mp_v1_ptg000001l_pilon.3      evm.model.ptg000001l_pilon.4
evm.model.mp_v1_ptg000001l_pilon.4      evm.model.ptg000001l_pilon.5
evm.model.mp_v1_ptg000001l_pilon.5      evm.model.ptg000001l_pilon.6
evm.model.mp_v1_ptg000001l_pilon.6      evm.model.ptg000001l_pilon.7
evm.model.mp_v1_ptg000001l_pilon.7      evm.model.ptg000001l_pilon.8
evm.model.mp_v1_ptg000001l_pilon.8      evm.model.ptg000001l_pilon.9
evm.model.mp_v1_ptg000001l_pilon.9      evm.model.ptg000001l_pilon.10
evm.model.mp_v1_ptg000001l_pilon.10     evm.model.ptg000001l_pilon.11

1x1.lifted.anchors like so

###
evm.model.mp_v1_ptg000001l_pilon.1      evm.model.ptg000001l_pilon.1.2.648366e3 158
evm.model.mp_v1_ptg000001l_pilon.2      evm.model.ptg000001l_pilon.3    780
evm.model.mp_v1_ptg000001l_pilon.3      evm.model.ptg000001l_pilon.4    1410
evm.model.mp_v1_ptg000001l_pilon.4      evm.model.ptg000001l_pilon.5    1600
evm.model.mp_v1_ptg000001l_pilon.5      evm.model.ptg000001l_pilon.6    803
evm.model.mp_v1_ptg000001l_pilon.6      evm.model.ptg000001l_pilon.7    1200
evm.model.mp_v1_ptg000001l_pilon.7      evm.model.ptg000001l_pilon.8    1900
evm.model.mp_v1_ptg000001l_pilon.8      evm.model.ptg000001l_pilon.9    1960
evm.model.mp_v1_ptg000001l_pilon.9      evm.model.ptg000001l_pilon.10   1140

and Murcott_Phoenix.Murcott_Pure.1x1

0,2,3,4,5,8,9,10,16,22,23,24,25,27,30,31,34,37,40

Using your tutorial, I don't normally use the 1x1 so not sure if i should do something / what to do

It is working fine on the bioconda version....

tanghaibao commented 7 months ago

@alpapan

We see this error when names are not matching. I saw that error comes from a name like evm.model.mp_v1_ptg000017l_pilon.1, did you check this is indeed in the Bed file? I assume you checked. Sometimes we'd also need to try turning on or off --no-strip-names to get names to match. But I saw you have that option on already so quite puzzled.