tanghaibao / jcvi

Python library to facilitate genome assembly, annotation, and comparative genomics
BSD 2-Clause "Simplified" License

Using our own genome for MCscan #136

Closed: hyl317 closed this issue 3 years ago

hyl317 commented 5 years ago

Hi, I want to use MCscan on my own genome assembly. I produced the .bed file from the .gff file as in the tutorial. When running jcvi.compara.catalog, I encountered the following error:

yh362@mocha:~/work/2019_Summer/MCscan$ python -m jcvi.compara.catalog ortholog tomato Ntab
17:59:04 [driver] Generating grammar tables from /usr/lib/python2.7/lib2to3/Grammar.txt
17:59:04 [driver] Generating grammar tables from /usr/lib/python2.7/lib2to3/PatternGrammar.txt
17:59:04 [synteny] Assuming --qbed=tomato.bed --sbed=Ntab.bed
17:59:04 [base] Load file tomato.bed
17:59:05 [base] Load file Ntab.bed
17:59:07 [blastfilter] Load BLAST file tomato.Ntab.last (total 801048 lines)
17:59:07 [base] Load file tomato.Ntab.last
17:59:12 [blastfilter] Solyc02g014190.4 not in tomato.bed
17:59:12 [blastfilter] Solyc11g045160.3 not in tomato.bed
17:59:12 [blastfilter] Solyc11g045160.3 not in tomato.bed
17:59:12 [blastfilter] Solyc11g045160.3 not in tomato.bed
17:59:12 [blastfilter] Solyc03g093480.3 not in tomato.bed
17:59:12 [blastfilter] Solyc12g056940.2 not in tomato.bed
17:59:12 [blastfilter] Solyc03g111570.4 not in tomato.bed
17:59:12 [blastfilter] Solyc09g065700.3 not in tomato.bed
17:59:12 [blastfilter] Solyc01g073750.4 not in tomato.bed
17:59:12 [blastfilter] Solyc12g006480.2 not in tomato.bed
17:59:12 [blastfilter] Solyc02g070810.4 not in tomato.bed
17:59:12 [blastfilter] Solyc01g108160.3 not in tomato.bed
17:59:12 [blastfilter] Solyc01g081430.4 not in tomato.bed
17:59:12 [blastfilter] Solyc01g095900.4 not in tomato.bed
17:59:12 [blastfilter] Solyc01g091460.3 not in tomato.bed
17:59:12 [blastfilter] Solyc09g065700.3 not in tomato.bed
17:59:12 [blastfilter] Solyc09g092100.4 not in tomato.bed
17:59:12 [blastfilter] Solyc09g082510.3 not in tomato.bed
17:59:12 [blastfilter] Solyc01g090780.3 not in tomato.bed
17:59:12 [blastfilter] Solyc07g052940.4 not in tomato.bed
17:59:12 [blastfilter] Solyc02g083900.3 not in tomato.bed
17:59:12 [blastfilter] Solyc11g017460.3 not in tomato.bed
17:59:12 [blastfilter] Solyc02g089370.2 not in tomato.bed
17:59:12 [blastfilter] Solyc01g006640.1 not in tomato.bed
17:59:12 [blastfilter] Solyc02g070260.4 not in tomato.bed
17:59:12 [blastfilter] Solyc02g070260.4 not in tomato.bed
17:59:12 [blastfilter] Solyc12g100360.1 not in tomato.bed
17:59:12 [blastfilter] Solyc08g081000.3 not in tomato.bed
17:59:12 [blastfilter] Solyc04g076620.4 not in tomato.bed
17:59:12 [blastfilter] Solyc01g106770.4 not in tomato.bed
17:59:12 [blastfilter] Solyc03g046450.3 not in tomato.bed
17:59:12 [blastfilter] Solyc01g094620.3 not in tomato.bed
17:59:12 [blastfilter] Solyc08g006420.3 not in tomato.bed
17:59:12 [blastfilter] Solyc01g104040.4 not in tomato.bed
17:59:12 [blastfilter] Solyc03g051900.4 not in tomato.bed
17:59:12 [blastfilter] Solyc01g008120.4 not in tomato.bed
17:59:12 [blastfilter] Solyc01g109080.3 not in tomato.bed
17:59:12 [blastfilter] Solyc03g062650.3 not in tomato.bed
17:59:12 [blastfilter] Solyc04g039950.4 not in tomato.bed
17:59:12 [blastfilter] Solyc11g012770.2 not in tomato.bed
17:59:12 [blastfilter] Solyc11g012770.2 not in tomato.bed
17:59:12 [blastfilter] Solyc11g012770.2 not in tomato.bed
17:59:12 [blastfilter] Solyc06g082100.4 not in tomato.bed
17:59:12 [blastfilter] Solyc02g089260.4 not in tomato.bed
17:59:12 [blastfilter] Solyc11g012770.2 not in tomato.bed
17:59:12 [blastfilter] Solyc04g076540.4 not in tomato.bed
17:59:12 [blastfilter] Solyc01g079510.3 not in tomato.bed
17:59:12 [blastfilter] Solyc01g104490.3 not in tomato.bed
17:59:12 [blastfilter] Solyc03g097670.4 not in tomato.bed
17:59:12 [blastfilter] Solyc09g064440.4 not in tomato.bed
17:59:12 [blastfilter] Solyc06g051310.3 not in tomato.bed
17:59:12 [blastfilter] Solyc07g005030.4 not in tomato.bed
17:59:12 [blastfilter] Solyc07g005030.4 not in tomato.bed
17:59:12 [blastfilter] Solyc02g090760.4 not in tomato.bed
17:59:12 [blastfilter] Solyc07g006820.4 not in tomato.bed
17:59:12 [blastfilter] Solyc04g051570.4 not in tomato.bed
17:59:12 [blastfilter] Solyc11g030600.3 not in tomato.bed
17:59:12 [blastfilter] Solyc07g005030.4 not in tomato.bed
17:59:12 [blastfilter] Solyc09g060080.4 not in tomato.bed
17:59:12 [blastfilter] Solyc09g060080.4 not in tomato.bed
17:59:12 [blastfilter] Solyc03g120980.3 not in tomato.bed
17:59:12 [blastfilter] Solyc04g011380.4 not in tomato.bed
17:59:12 [blastfilter] Solyc02g068720.3 not in tomato.bed
17:59:12 [blastfilter] Solyc08g067620.2 not in tomato.bed
17:59:12 [blastfilter] Solyc01g101070.3 not in tomato.bed
17:59:12 [blastfilter] Solyc01g103690.4 not in tomato.bed
17:59:12 [blastfilter] Solyc06g069480.3 not in tomato.bed
17:59:12 [blastfilter] Solyc02g067010.3 not in tomato.bed
17:59:12 [blastfilter] Solyc02g067010.3 not in tomato.bed
17:59:12 [blastfilter] Solyc08g067610.3 not in tomato.bed
17:59:12 [blastfilter] Solyc07g017510.3 not in tomato.bed
17:59:12 [blastfilter] Solyc09g063030.4 not in tomato.bed
17:59:12 [blastfilter] Solyc08g013940.3 not in tomato.bed
17:59:12 [blastfilter] Solyc11g007280.3 not in tomato.bed
17:59:12 [blastfilter] Solyc11g007290.1 not in tomato.bed
17:59:12 [blastfilter] Solyc01g097320.3 not in tomato.bed
17:59:12 [blastfilter] Solyc09g092240.3 not in tomato.bed
17:59:12 [blastfilter] Solyc09g014780.3 not in tomato.bed
17:59:12 [blastfilter] Solyc11g072730.2 not in tomato.bed
17:59:12 [blastfilter] Solyc06g065670.4 not in tomato.bed
17:59:12 [blastfilter] Solyc11g072730.2 not in tomato.bed
17:59:12 [blastfilter] Solyc08g077430.3 not in tomato.bed
17:59:12 [blastfilter] Solyc09g014780.3 not in tomato.bed
17:59:12 [blastfilter] Solyc02g076720.3 not in tomato.bed
17:59:12 [blastfilter] Solyc09g007310.3 not in tomato.bed
17:59:12 [blastfilter] Solyc11g013280.1 not in tomato.bed
17:59:12 [blastfilter] Solyc11g013280.1 not in tomato.bed
17:59:12 [blastfilter] Solyc11g062310.2 not in tomato.bed
17:59:12 [blastfilter] Solyc03g082680.4 not in tomato.bed
17:59:12 [blastfilter] Solyc08g067620.2 not in tomato.bed
17:59:12 [blastfilter] Solyc05g053610.2 not in tomato.bed
17:59:12 [blastfilter] Solyc08g081890.4 not in tomato.bed
17:59:12 [blastfilter] Solyc09g091660.3 not in tomato.bed
17:59:12 [blastfilter] Solyc10g080630.2 not in tomato.bed
17:59:12 [blastfilter] Solyc04g055120.3 not in tomato.bed
17:59:12 [blastfilter] Solyc03g113370.3 not in tomato.bed
17:59:12 [blastfilter] Solyc05g052510.4 not in tomato.bed
17:59:12 [blastfilter] Solyc09g091660.3 not in tomato.bed
17:59:12 [blastfilter] Solyc02g065500.4 not in tomato.bed
17:59:12 [blastfilter] Solyc11g065920.2 not in tomato.bed
17:59:12 [blastfilter] too many warnings.. suppressed
17:59:19 [blastfilter] running the cscore filter (cscore>=0.70) ..
17:59:19 [blastfilter] after filter (0->0) ..
17:59:19 [blastfilter] running the local dups filter (tandem_Nmax=10) ..
17:59:19 [blastfilter] after filter (0->0) ..
17:59:19 [synteny] Assuming --qbed=tomato.bed --sbed=Ntab.bed
17:59:19 [base] Load file tomato.bed
17:59:20 [base] Load file Ntab.bed
17:59:23 [base] Load file tomato.Ntab.last.filtered
17:59:23 [synteny] A total of 0 BLAST imported from tomato.Ntab.last.filtered.
17:59:23 [synteny] Chaining distance = 20
17:59:23 [base] Load file tomato.Ntab.anchors
17:59:23 [synteny] A total of 0 anchor was found. Aborted.
Traceback (most recent call last):
  File "/usr/lib/python2.7/runpy.py", line 162, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/data/home/yh362/.local/lib/python2.7/site-packages/jcvi/compara/catalog.py", line 841, in <module>
    main()
  File "/data/home/yh362/.local/lib/python2.7/site-packages/jcvi/compara/catalog.py", line 73, in main
    p.dispatch(globals())
  File "/home/yh362/.local/lib/python2.7/site-packages/jcvi/apps/base.py", line 96, in dispatch
    globals[action](sys.argv[2:])
  File "/data/home/yh362/.local/lib/python2.7/site-packages/jcvi/compara/catalog.py", line 664, in ortholog
    "--liftover={0}".format(last)])
  File "/home/yh362/.local/lib/python2.7/site-packages/jcvi/compara/synteny.py", line 1483, in scan
    summary([anchor_file])
  File "/home/yh362/.local/lib/python2.7/site-packages/jcvi/compara/synteny.py", line 1050, in summary
    raise ValueError("A total of 0 anchor was found. Aborted.")
ValueError: A total of 0 anchor was found. Aborted.
yh362@mocha:~/work/2019_Summer/MCscan$ grep 'Solyc02g014190.4' tomato.bed
SL4.0ch02 14232022 14252144 Solyc02g014190.4.1 0

As shown in the last line, I used grep, and it seems these names are indeed in tomato.bed. What's the problem here? I am guessing my .cds and .bed files don't conform to what MCscan assumes. If that's the problem, what file format does MCscan require?
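Note that the grep output above shows Solyc02g014190.4.1, not Solyc02g014190.4: grep matches substrings (and "." in a grep pattern matches any character), so a grep hit does not prove an exact name match. A minimal sketch of an exact-match check follows. It is a hypothetical helper, not part of jcvi, and assumes the gene name is the 4th BED column and the .last file is BLAST-style tabular output with the query and subject IDs in its first two columns.

```python
def bed_names(lines):
    """Collect gene names (4th column) from BED lines."""
    names = set()
    for line in lines:
        fields = line.split()
        if len(fields) >= 4:
            names.add(fields[3])
    return names


def missing_ids(last_lines, qnames, snames):
    """Return the query and subject IDs from tabular .last lines that have
    no exact match among the query and subject BED names."""
    missing_q, missing_s = set(), set()
    for line in last_lines:
        if line.startswith("#"):
            continue  # skip comment lines
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        query, subject = fields[0], fields[1]
        if query not in qnames:
            missing_q.add(query)
        if subject not in snames:
            missing_s.add(subject)
    return missing_q, missing_s
```

With the files from this issue, you would pass open handles, e.g. missing_ids(open("tomato.Ntab.last"), bed_names(open("tomato.bed")), bed_names(open("Ntab.bed"))). Comparing a reported ID with the corresponding BED entry (here Solyc02g014190.4 versus Solyc02g014190.4.1) can show which naming difference is responsible.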

aboyher commented 5 years ago

I just went through this. It seems that something in this tool (maybe whatever extracts feature names from the .last file) strips the last "." and everything after it from each name, then looks up the shortened name in your bed file. I fixed it by using sed to replace "." in the bed, cds, and last files, then reran python -m jcvi.compara.catalog ortholog <genome1> <genome2>, and everything looks good.
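If the .last file is edited rather than regenerated, replacing every "." would also alter numeric columns such as percent identity and e-value. A minimal sketch of a more targeted version of this workaround renames only the ID fields. It assumes tab-separated BED (gene name in column 4), BLAST-style tabular .last output (query and subject IDs in columns 1 and 2), and FASTA headers whose first word is the ID; the helper names and the "_" replacement are illustrative choices, not jcvi behavior.

```python
def rename_id(name, old=".", new="_"):
    """Replace every `old` character in an ID (illustrative mapping)."""
    return name.replace(old, new)


def rename_bed_line(line):
    """Rename only the gene-name column (4th) of a tab-separated BED line."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) >= 4:
        fields[3] = rename_id(fields[3])
    return "\t".join(fields) + "\n"


def rename_last_line(line):
    """Rename only the query/subject IDs (columns 1-2) of a tabular .last line."""
    if line.startswith("#") or not line.strip():
        return line
    fields = line.rstrip("\n").split("\t")
    if len(fields) >= 2:
        fields[0], fields[1] = rename_id(fields[0]), rename_id(fields[1])
    return "\t".join(fields) + "\n"


def rename_fasta_line(line):
    """Rename the ID (first word) of a FASTA header; sequence lines pass through.
    The whitespace between ID and description is normalized to one space."""
    if not line.startswith(">"):
        return line
    parts = line[1:].rstrip("\n").split(None, 1)
    if not parts:
        return line
    parts[0] = rename_id(parts[0])
    return ">" + " ".join(parts) + "\n"


def rewrite(src_path, dst_path, rename_line):
    """Apply a per-line renamer to a whole file."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        dst.writelines(rename_line(line) for line in src)
```

For example, rewrite("tomato.bed", "tomato.renamed.bed", rename_bed_line), and likewise rename_fasta_line for the cds file and rename_last_line for the .last file (file names here are hypothetical). The same rename_id must be used for all three so the names stay consistent.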

Seems like this would be an easy fix.

Hope this helps you @hyl317

tanghaibao commented 5 years ago

@aboyher

Note that the name stripping can be turned off with this option to jcvi.compara.catalog ortholog:

  --no_strip_names      Do not strip alternative splicing (e.g. At5g06540.1 ->
                        At5g06540) [default: False]
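To illustrate what this option controls, going by the help text's example (At5g06540.1 -> At5g06540), stripping drops the suffix after the last ".". The function below is a simplified approximation written for this thread, not jcvi's own code (the real rule may handle more cases). It shows one way a mismatch like the one in the first log can arise: a stripped name no longer matches a BED entry that keeps the full transcript name.

```python
def strip_isoform(name, sep="."):
    """Drop the final `sep`-delimited suffix, e.g. At5g06540.1 -> At5g06540.
    Simplified approximation of alternative-splicing name stripping."""
    return name.rsplit(sep, 1)[0] if sep in name else name


tomato_bed_names = {"Solyc02g014190.4.1"}  # full transcript name kept in the BED
hit_name = strip_isoform("Solyc02g014190.4.1")  # name after stripping

print(strip_isoform("At5g06540.1"))  # At5g06540
print(hit_name)                      # Solyc02g014190.4
print(hit_name in tomato_bed_names)  # False: exact lookup fails
```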
dcopetti commented 4 years ago

Hi, I had the same issue, and --no_strip_names did not help. Running python -m jcvi.compara.catalog ortholog sa bd --no_strip_names &>stdout gave:

17:21:05 [synteny] Assuming --qbed=sa.bed --sbed=bd.bed
17:21:05 [base] Load file `sa.bed`
17:21:05 [base] Load file `bd.bed`
17:21:05 [base] Load file `sa.bd.last.filtered`
17:21:05 [synteny] A total of 76 BLAST imported from `sa.bd.last.filtered`.
17:21:05 [synteny] Chaining distance = 20
17:21:05 [base] Load file `sa.bd.anchors`
17:21:05 [synteny] A total of 0 anchor was found. Aborted.
Traceback (most recent call last):
  File "/home/copettid/miniconda3/lib/python2.7/runpy.py", line 174, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/home/copettid/miniconda3/lib/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/home/copettid/miniconda3/lib/python2.7/site-packages/jcvi/compara/catalog.py", line 848, in <module>
    main()
  File "/home/copettid/miniconda3/lib/python2.7/site-packages/jcvi/compara/catalog.py", line 74, in main
    p.dispatch(globals())
  File "/home/copettid/miniconda3/lib/python2.7/site-packages/jcvi/apps/base.py", line 100, in dispatch
    globals[action](sys.argv[2:])
  File "/home/copettid/miniconda3/lib/python2.7/site-packages/jcvi/compara/catalog.py", line 662, in ortholog
    "--liftover={0}".format(last), "--no_strip_names"])
  File "/home/copettid/miniconda3/lib/python2.7/site-packages/jcvi/compara/synteny.py", line 1483, in scan
    summary([anchor_file])
  File "/home/copettid/miniconda3/lib/python2.7/site-packages/jcvi/compara/synteny.py", line 1050, in summary
    raise ValueError("A total of 0 anchor was found. Aborted.")
ValueError: A total of 0 anchor was found. Aborted.

I was able to complete the run by renaming the genes in the cds and bed files, replacing the dot with letters:

>Bradi0180s00100.1
Bd1     10580   11638   Bradi1g00200.1  0       +

to

>Bradi0180s00100_mRNA1
Bd1     10580   11638   Bradi1g00200_mRNA1      0       +
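The renaming above turns the trailing ".1" into "_mRNA1". A minimal sketch of that mapping, assuming the part after the last "." is the isoform number; the function name is hypothetical. Unlike a global replacement, it only touches the last ".", and the same mapping has to be applied to both the cds headers and the bed name column so the two files still agree.

```python
def dot_to_mrna(name):
    """Rewrite the last '.N' suffix of an ID as '_mRNAN'.

    Bradi0180s00100.1 -> Bradi0180s00100_mRNA1; IDs without a dot are unchanged.
    """
    if "." not in name:
        return name
    gene, isoform = name.rsplit(".", 1)
    return "{0}_mRNA{1}".format(gene, isoform)


for old in ["Bradi0180s00100.1", "Bradi1g00200.1"]:
    print(old, "->", dot_to_mrna(old))
```

This prints "Bradi0180s00100.1 -> Bradi0180s00100_mRNA1" and "Bradi1g00200.1 -> Bradi1g00200_mRNA1", matching the example lines.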

Not sure why, but it worked for me.

sfjdx1144 commented 2 years ago

I installed "last" (the LAST aligner) to solve the problem. If you use Ubuntu, you can install it with the command "apt install last".
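A quick way to rule this cause out is to check whether the LAST executables are on PATH before running the pipeline. A minimal sketch, assuming jcvi.compara.catalog ortholog calls LAST's lastdb and lastal binaries; it needs Python 3, since shutil.which does not exist in Python 2.7.

```python
import shutil


def missing_tools(tools=("lastdb", "lastal")):
    """Return the executables from `tools` that are not found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


missing = missing_tools()
if missing:
    print("Not on PATH:", ", ".join(missing))
else:
    print("LAST executables found.")
```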