Closed: pascalangst closed this issue 1 year ago
Hi there. I'm sure this issue comes from the low contiguity of the genomes. For example, your 'Cinc' genome has 49,131 genes. Of these, 26,193 are on scaffolds with < 10 unique orthogroups. In total, GENESPACE can only "see" about 36% of the genes in that genome.
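To make that contiguity check concrete, here is a minimal Python sketch of how one might count the genes that sit on scaffolds with enough unique orthogroups. It is only an illustration: the function name, the input layout (tuples of scaffold, gene ID, orthogroup ID) and the toy data are assumptions, and this is not GENESPACE's own code. GENESPACE applies further filters, so this fraction will not necessarily reproduce the 36% quoted above.

```python
from collections import defaultdict


def visible_gene_fraction(genes, min_orthogroups=10):
    """Fraction of genes on scaffolds with >= min_orthogroups unique orthogroups.

    `genes` is an iterable of (scaffold, gene_id, orthogroup_id) tuples.
    This layout is hypothetical; orthogroup_id may be None for unassigned genes.
    """
    orthogroups_per_scaffold = defaultdict(set)
    genes_per_scaffold = defaultdict(int)

    for scaffold, _gene_id, orthogroup_id in genes:
        genes_per_scaffold[scaffold] += 1
        if orthogroup_id is not None:
            orthogroups_per_scaffold[scaffold].add(orthogroup_id)

    total = sum(genes_per_scaffold.values())
    if total == 0:
        return 0.0

    # Only genes on scaffolds that pass the orthogroup threshold are counted.
    kept = sum(
        n_genes
        for scaffold, n_genes in genes_per_scaffold.items()
        if len(orthogroups_per_scaffold[scaffold]) >= min_orthogroups
    )
    return kept / total


# Toy example (made-up data): one scaffold with 3 orthogroups, one with 1.
toy_genes = [
    ("scafA", "g1", "OG1"),
    ("scafA", "g2", "OG2"),
    ("scafA", "g3", "OG3"),
    ("scafB", "g4", "OG1"),
    ("scafB", "g5", "OG1"),
]
toy_fraction = visible_gene_fraction(toy_genes, min_orthogroups=2)
```

With a threshold of 2 in the toy data, only the three genes on scafA pass, so the fraction is 3 out of 5.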
Nonetheless, GENESPACE should report a more informative error than just breaking during syntenic block calculation. Would you mind sharing your /bed and /peptide directories? If so, shoot me an email at jlovell[at]hudsonalpha[dot]org.
Thanks, John
Hi John,
I've sent you the folders. It is true that the genomes are not very contiguous (HiFi, but no further scaffolding). Is it possible to get the dotplots/riparian plot from our data, as it was possible with the previous version of GENESPACE?
Thanks, Pascal
Thanks! I have the files and will get this ironed out. I should have a fix pushed to master today.
I will make the following changes to the source code:
1) If there are more than 100 "chromosomes" with synteny, only plot the 100 largest (currently if there are more than 10,000 chr-chr combinations it doesn't make a dotplot). Riparian plot will still push through but it will look bad.
2) If there are more than 100 chromosomes with synteny, do not try to phase by reference genome chromosomes, and do not color the riparian plot (this is what is causing the error that you observe).
3) Report warnings and context for 1-2, telling the user to use query_hits and/or ggdotplot to make the dotplots and to set plot_riparian(..., minChrLen2plot = 0) to see all scaffolds. But note that you'll need to make very large graphics windows to see these since you have so many gaps between small scaffolds.
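The three planned changes can be sketched as a small decision function. This is a hedged Python illustration of the rules listed above, not the actual GENESPACE source: the function name, the returned keys and ranking "largest" by gene count are all assumptions.

```python
def plan_plots(chr_sizes, max_chr=100):
    """Decide what to plot for a genome given its chromosomes with synteny.

    `chr_sizes` maps chromosome/scaffold name -> size (assumed here to be a
    gene count). Names and return keys are hypothetical.
    """
    # Rank chromosomes from largest to smallest.
    ranked = sorted(chr_sizes, key=chr_sizes.get, reverse=True)
    too_many = len(ranked) > max_chr

    warnings = []
    if too_many:
        warnings.append(
            f"{len(ranked)} chromosomes with synteny (> {max_chr}): plotting only "
            f"the {max_chr} largest and skipping reference phasing and riparian colors"
        )

    return {
        "dotplot_chrs": ranked[:max_chr],     # rule 1: only the largest max_chr
        "phase_by_reference": not too_many,   # rule 2: no phasing or coloring
        "warnings": warnings,                 # rule 3: tell the user why
    }


# Toy example: 150 made-up scaffolds whose size equals their index.
fragmented = {f"scaf{i}": i for i in range(1, 151)}
plan = plan_plots(fragmented)
```

For a genome with 100 or fewer chromosomes the same call returns every chromosome, keeps phasing on, and emits no warning.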
Note that GENESPACE really is designed for chromosome-scale genomes. Synteny becomes a less useful tool when you don't have long runs of genes on single chromosomes. I will add a note regarding this in the readme and in the warnings returned during genome QC.
OK - I have a fix pushed to master now as v1.1.7.
Now, if you give GENESPACE very broken-up genomes, it will still make dotplots (but simple base plots so that faceting doesn't obscure the actual points) and pan-gene sets (but will return a warning for the latter). It will not make riparian plots if the reference genome has > 100 chromosomes in the synteny network, and returns a warning saying why.
In this particular case, I am not sure that these genomes are good candidates for GENESPACE ... I can't see any real synteny in the dotplots, likely because there aren't any chromosomes with enough genes on them.
This all sounds reasonable to me. Especially the messages, which tell the user what is going on under the hood.
For our case, we have used a combined approach for obtaining candidate contigs (large contig-to-contig sequence homology, relatively high number of shared single-copy orthologues, ...) and used GENESPACE mainly for visualization. We have used the options onlyTheseRegions = regions_of_interest, excludeNoRegChr=T. I suspect there are options like this in versions v1.x (?). Your plots are very appealing.
I will give 1.1.7 a shot later today and share my experience. Thank you!
v1.1.7 works for me. Thank you!
The options highlightBed = regions_of_interest_bed, backgroundColor = NULL result in the same sort of plot I described above. In the description, it is specified that useRegions = FALSE will result in usage of "collinear syntenic blocks". Does that mean the genes are in the same order and orientation in all species? Generally, what is the difference between "aggregated syntenic regions" and "collinear syntenic blocks"?
OK, good to hear! You can always include a related genome that is chromosome-level, then do everything anchored to that. It would let you make plots and a reasonable pan-gene set.
Hi @jtlovell,
Switching from v0.9.3, I encountered some issues and hope to solve them with your help. Your latest release as well as today's master branch, installed with devtools::install_github("jtlovell/GENESPACE"), did not entirely work for me. Here is my sessionInfo:
First, I tried to convert files from a previous run with the below code.
However, this gave me only the bed and peptide folder. The results folder was empty. I therefore decided to redo the orthology detection.
I'm using files from the species "Cinc" and "Mani":
The resulting dotplots and riparian plot are all 3.6 KB and cannot be displayed because they "don't have pages". Let me know what is needed to troubleshoot this error. I'm happy to share any file you need.
Cheers, Pascal