jtlovell / GENESPACE

Error in `[.data.table`(bedm, , `:=`(reford, (1:.N)/.N), by = "genome") #117

Closed Sidduppal closed 1 year ago

Sidduppal commented 1 year ago

Hey, thanks for this tool. I'm getting the following error during the run_genespace step:

############################
1. Running orthofinder (or parsing existing results)
        Checking for existing orthofinder results ...
[1] FALSE
        ... found existing run, not re-running orthofinder
############################
2. Combining and annotating bed files w/ OGs and tandem array info ...
        ##############
        Flagging chrs. w/ < 10 unique orthogroups
        ...LvStB     : 1188 genes on 204 small chrs. ***
        ...LvStB_2023:    0 genes on   0 small chrs.
        NOTE! Genomes flagged *** have > 5% of genes on small chrs.
                These are likely not great assemblies and should be
                examined carefully
        ##############
        Flagging over-dispered OGs
        ...LvStB     : 115 genes in  8 OGs hit > 8 unique places
        ...LvStB_2023: 584 genes in 13 OGs hit > 8 unique places ***
        NOTE! Genomes flagged *** have > 5% of genes in over-dispersed
                orthogroups. These are likely not great annotations, or
                the synteny run contains un-specified WGDs. Regardless,
                these should be examined carefully
        ##############
        Annotation summaries (after exclusions):
        ...LvStB     : 1352 genes in 1289 OGs ||  10 genes in  5 arrays
        ...LvStB_2023: 3135 genes in 2868 OGs || 118 genes in 54 arrays
############################
3. Combining and annotating the blast files with orthogroup info ...
        # Chunk 1 / 1 (17:42:46) ...
        ...LvStB_2023 v. LvStB_2023: total hits = 63808, same og = 16126
        ...LvStB_2023 v. LvStB:      total hits = 81943, same og = 2217
        ...LvStB v. LvStB:           total hits = 34622, same og = 4540
        ##############
        Generating dotplots for all hits ... Done!

############################
4. Flagging synteny for each pair of genomes ...
        # Chunk 1 / 1 (17:42:48) ...
        ...LvStB_2023 v. LvStB:      1297 hits (1109 anchors) in 87 blocks (0 SVs, 87 regions)
        ...LvStB_2023 v. LvStB_2023: 9771 hits (3599 anchors) in 3 blocks (0 SVs, 0 regions)
        ...LvStB v. LvStB:           2755 hits (2537 anchors) in 294 blocks (0 SVs, 0 regions)

############################
5. Building synteny-constrained orthogroups ...
        Done!

############################
6. Integrating syntenic positions across genomes ...
        ##############
        Generating syntenic dotplots ... Done!
        ##############
        Interpolating syntenic positions of genes ...
        LvStB_2023:  (0 / 1 / 2 / >2 syntenic positions)
                LvStB     :  108 / 1215 /    0 /    0
                LvStB_2023:    1 / 3630 /    0 /    0
        LvStB:  (0 / 1 / 2 / >2 syntenic positions)
                LvStB     :    0 / 2537 /    0 /    0
                LvStB_2023: 1797 / 1231 /    0 /    0
        Done!

############################
7. Final block coordinate calculation and riparian plotting ...
        ##############
        Calculating syntenic blocks by reference chromosomes ...
                n regions (aggregated by 25 gene radius): 414
                n blocks (collinear sets of > 5 genes): 414
        ##############
        Building ref.-phased blks and riparian plots for haploid genomes:
Error in `[.data.table`(bedm, , `:=`(reford, (1:.N)/.N), by = "genome") :
  Supplied 2 items to be assigned to group 1 of size 0 in column 'reford'. The RHS length must either be 1 (single values are ok) or match the LHS length exactly. If you wish to 'recycle' the RHS please use rep() explicitly to make this intent clear to readers of your code.
In addition: Warning message:
In mclapply(1:nrow(chnk), mc.cores = nCores, function(i) { :
  scheduled cores 3, 1 encountered errors in user code, all values of the jobs will be affected

I'm running it on bacterial metagenome-assembled genomes (MAGs). One MAG, the reference, was assembled from long-read data and has just 3 contigs; the other has about 100 contigs.

Command used for init_genespace:

gpar<-init_genespace(wd="/media/bigdrive1/sidd/Beetle_proj_2023/synteny/genespace/Wd",path2mcscanx="/media/bigdrive1/sidd/third_party_tools/MCScanX")

Any help will be appreciated. Thanks

jtlovell commented 1 year ago

GENESPACE is designed specifically for eukaryotic genomes, which have many more genes and tend to retain synteny over large evolutionary scales. Given how few genes are in your run, my guess is that you just don't have enough anchors. Also, look at the warnings at the top of your run: the annotations look problematic, and one of your genomes has roughly half of its genes on small chrs, which will be ignored. GENESPACE probably isn't the right software for these input data.
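
For anyone landing here from the error text: the message "Supplied 2 items to be assigned to group 1 of size 0" is consistent with the table reaching that step with no rows, which fits the "not enough anchors" guess. This is a reading of the message, not something confirmed from the GENESPACE source. In R, `1:.N` with `.N` equal to 0 counts down and gives two values instead of none. The base R sketch below shows only that mechanism; `reford_colon` and `reford_seq` are hypothetical helper names, and `n` stands in for data.table's `.N`.

```r
# Relative order as written in the failing expression: (1:.N) / .N
# `n` stands in for data.table's .N (the number of rows in the group)
reford_colon <- function(n) (1:n) / n

# Same calculation with seq_len(), which returns an empty vector when n is 0
reford_seq <- function(n) seq_len(n) / n

reford_colon(3L)          # three values for a group of three rows
length(reford_colon(0L))  # 2: 1:0 counts down to c(1, 0), so two items are supplied
length(reford_seq(0L))    # 0: matches a group of size 0
```

Swapping in `seq_len()` would only avoid the length mismatch; it would not give an empty table any syntenic anchors to plot.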