Open Wongolini opened 1 year ago
@Wongolini
Would you please check if scaffold97size618721
somehow shows up in the CDS and last output? Usually this complaint is about a gene (4th column) not found in the BED file, and not about the scaffold.
If this is what you are asking, then I believe the scaffold exists in the .cds file.
`(mcscan) [~/Documents/Cleves_lab/mcscan $]grep scaffold97size618721 aiptasia_genome.cds
>scaffold97size618721`
and in last:
(mcscan) [~/Documents/Cleves_lab/mcscan $]grep scaffold97size618721 aiptasia_genome.gfas.last | head
scaffold97size618721 xfSc0000357 60.10 9557 3764 13 157623 148087 65759 75286 0 3.01e+03
scaffold97size618721 xfSc0000918 59.44 2774 1116 3 124273 127040 26168 28938 1.9e-227 802
scaffold97size618721 Sc0000468 58.41 3056 1262 3 124258 127307 94724 97776 1.1e-222 786
scaffold97size618721 xfSc0000357 60.57 1291 503 1 159664 158380 63697 64987 6.4e-119 441
scaffold97size618721 xfSc0000143 57.54 1797 760 1 157220 155427 125951 127747 5.1e-117 435
scaffold97size618721 xfSc0000143 58.54 1611 656 3 155354 153747 127823 129424 2.6e-111 416
scaffold97size618721 xfSc0000143 59.47 1278 515 1 153635 152358 129542 130816 2.9e-102 386
scaffold97size618721 xfSc0000143 59.31 1214 485 2 159664 158457 123501 124711 3.6e-94 359
scaffold97size618721 xfSc0000357 59.22 1160 464 1 167009 165850 54630 55780 6.2e-89 342
scaffold97size618721 xfSc0000143 58.35 1215 506 0 149518 148304 133650 134864 4e-85 329
But it is not in last.filtered
@Wongolini
Yes that scaffold sequence in the .cds file is what generated the warning. The program did not expect to find the scaffold sequences there, because it expects only gene sequences (precisely, anything present in the 4th column in the .bed file). ".cds" means coding sequences.
If you remove the scaffold sequence from the ".cds", the warning will be gone, or just ignore the warning. Since the program can't make use of it either way.
I see thank you. So these warnings should not necessarily be a reason why the program fails to find any anchors? I fear that my genome assemblies are too poor to have any synteny analysis done on them with MCscan - could that be the case? However, I know that there are syntenic blocks in these genomes because I have found them manually. It appears to fail even before applying cscore so I am not sure what else to tweak. Thanks for all your help!
WARNING too many warnings.. suppressed blastfilter.py:67
[11:02:33] DEBUG running the cscore filter (cscore>=0.70) .. blastfilter.py:108
DEBUG after filter (0->0) .. blastfilter.py:110
DEBUG running the local dups filter (tandem_Nmax=10) .. blastfilter.py:115
DEBUG after filter (0->0) .. blastfilter.py:155
DEBUG Assuming --qbed=aiptasia_genome.bed --sbed=gfas.bed synteny.py:496
DEBUG Load file `aiptasia_genome.bed` base.py:34
DEBUG Load file `gfas.bed` base.py:34
DEBUG Load file `aiptasia_genome.gfas.last.filtered` base.py:34
DEBUG A total of 0 BLAST imported from `aiptasia_genome.gfas.last.filtered`. synteny.py:383
DEBUG Chaining distance = 20 synteny.py:1928
DEBUG Load file `aiptasia_genome.gfas.anchors` base.py:34
DEBUG A total of 0 anchor was found. Aborted.
@Wongolini
Please make sure that you are using genes, not scaffold sequences in aiptasia_genome.cds
.
It appears that you are directly comparing scaffold sequences, which is unfortunately not what jcvi is designed to do.
Consequently there are 0 anchors after filtering as shown in the logs.
Try to use gene sequences directly, e.g. AIPGENE19630
in the .cds
file instead of the scaffold sequences. Then you'll have gene matches in the .last
files, which jcvi synteny algorithm can operate on.
Thanks for your help earlier - I just did not properly read the error out. The example works now, so I moved on to my own genomes. My own genomes are terrible, with very high N50s and lots of putative isoforms in genes. Could this have something to do with the failure? I ran your suggested --primary_only when coverting the gffs to bed files. Have you ever seen this error before?
but