jtlovell / GENESPACE


Error running synteny #36

Closed: noor-albader closed this issue 2 years ago

noor-albader commented 2 years ago

Hello, I was able to run the pipeline smoothly with the example data and a subset (2 genomes) of my data without error.

However, when running with 20+ genomes I got an error I have not been able to figure out.

Loading and parsing the data, as well as running OrthoFinder, ran without a glitch, but now (without changing any synteny parameters, i.e. running the defaults) I get the following error:

This part ran fine:

library(GENESPACE)
runwd <- file.path("~/Polyploid_group_renamed_mainChr")
list.files(runwd, recursive = TRUE, full.names = FALSE)

gpar <- init_genespace(
  genomeIDs = c("Lp","OAcc","OAdd","OCkk", "OCll", "OGcc", "OGdd", "OLcc", "OLdd","OLhh", "OLjj", "OMALbb", "OMALcc", "OMINbb", "OMINcc", "ORhh", "ORjj", "Os", "OShh", "OSkk", "OP","OB"),
  speciesIDs = c("Lp","OAcc","OAdd","OCkk", "OCll", "OGcc", "OGdd", "OLcc", "OLdd","OLhh", "OLjj", "OMALbb", "OMALcc", "OMINbb", "OMINcc", "ORhh", "ORjj", "Os", "OShh", "OSkk", "OP","OB"),
  versionIDs = c("Lp","OAcc","OAdd","OCkk", "OCll", "OGcc", "OGdd", "OLcc", "OLdd","OLhh", "OLjj", "OMALbb", "OMALcc", "OMINbb", "OMINcc", "ORhh", "ORjj", "Os", "OShh", "OSkk", "OP","OB"),
  outgroup = "Lp",
  ploidy = rep(1,22),
  diamondMode = "fast",
  orthofinderMethod = "default",
  wd = runwd,
  orthofinderInBlk = TRUE, 
  overwrite = FALSE,
  verbose = TRUE,
  nCores = 16,
  minPepLen = 50,
  gffString = "gff",
  pepString = "fa",
  path2orthofinder = "~/software/miniconda3/envs/orthofinder/bin/orthofinder",
  path2diamond = "diamond",
  path2mcscanx = "~/software/MCScanX",
  rawGenomeDir = file.path(runwd, "rawGenomes"))

parse_annotations(
  gsParam = gpar,
  genomeIDs = c("Lp","Os","OP","OB"),
  gffEntryType = "mRNA",
  gffIdColumn = "ID",
  gffStripText = "ID=",
  headerEntryIndex = 5,
  headerSep = " ",
  headerStripText = "ID=")

parse_annotations(
  gsParam = gpar,
  genomeIDs = c("OAcc","OAdd","OCkk", "OCll", "OGcc", "OGdd", "OLcc", "OLdd","OLhh", "OLjj", "OMALbb", "OMALcc", "OMINbb", "OMINcc", "ORhh", "ORjj", "OShh", "OSkk"),
  gffEntryType = "Gene",
  gffIdColumn = "ID",
  gffStripText = "ID=",
  headerEntryIndex = 1,
  headerSep = " ",
  headerStripText = "ID=")

gpar <- run_orthofinder(gsParam = gpar)
gpar <- set_syntenyParams(gsParam = gpar)
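Because the same 22-genome vector is typed three times and the genomes are split across two parse_annotations() calls, a small base-R check can confirm that the two parsing groups together cover every genome exactly once before anything expensive runs. This is a sketch; `check_parse_groups`, `mrnaGroup` and `geneGroup` are hypothetical names, and nothing here calls GENESPACE.

```r
# IDs copied from the init_genespace() call above
genomeIDs <- c("Lp", "OAcc", "OAdd", "OCkk", "OCll", "OGcc", "OGdd", "OLcc",
               "OLdd", "OLhh", "OLjj", "OMALbb", "OMALcc", "OMINbb", "OMINcc",
               "ORhh", "ORjj", "Os", "OShh", "OSkk", "OP", "OB")

# Hypothetical names for the two parse_annotations() groups
mrnaGroup <- c("Lp", "Os", "OP", "OB")        # gffEntryType = "mRNA"
geneGroup <- setdiff(genomeIDs, mrnaGroup)    # gffEntryType = "Gene"

check_parse_groups <- function(allIDs, ...) {
  used <- unlist(list(...))
  stopifnot(!anyDuplicated(allIDs))           # no genome listed twice
  list(
    missing = setdiff(allIDs, used),          # never parsed
    extra   = setdiff(used, allIDs),          # parsed but not a genomeID
    overlap = used[duplicated(used)]          # parsed by more than one call
  )
}

res <- check_parse_groups(genomeIDs, mrnaGroup, geneGroup)

# Derived from the ID vector, so it stays in sync if genomes are added
ploidy <- rep(1, length(genomeIDs))
```

All three elements of `res` should be empty here, and `ploidy` could then be passed as `ploidy = ploidy` instead of `rep(1,22)`.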

But running the synteny function gives this error:

gpar <- synteny(gsParam = gpar)
Indexing location of orthofinder results ... Done!
Parsing the gff files ... 
    Reading the gffs and adding orthofinder IDs ... Done!
    Found 202398 global OGs for 752203 genes
    QC-ing genome to ensure chromosomes/scaffolds are big enough...
            Genome: n. chrs PASS/FAIL, n. genes PASS/FAIL, n. OGs PASS/FAIL
        OAcc: 12/0, 34936/0, 30124/0
        OAdd: 12/0, 31451/0, 27276/0
        OB: 12/0, 31218/0, 31218/0
        OCkk: 12/0, 25845/0, 23465/0
        OCll: 12/0, 26978/0, 24270/0
        OGcc: 12/0, 34778/0, 29932/0
        OGdd: 12/0, 31578/0, 27250/0
        OLcc: 12/0, 37543/0, 30440/0
        OLdd: 12/0, 36191/0, 29833/0
        OLhh: 12/0, 38679/0, 32752/0
        OLjj: 12/0, 34645/0, 29615/0
        OMALbb: 12/0, 37927/0, 32157/0
        OMALcc: 12/0, 39942/0, 33341/0
        OMINbb: 12/0, 36062/0, 30710/0
        OMINcc: 12/0, 38203/0, 31793/0
        OP: 12/0, 40917/0, 40917/0
        ORhh: 12/0, 43890/0, 36780/0
        ORjj: 12/0, 38916/0, 32907/0
        OShh: 12/0, 34293/0, 29068/0
        OSkk: 12/0, 36563/0, 31046/0
        Os: 12/0, 41648/0, 41648/0
    All look good!
    Defining collinear orthogroup arrays ... 
    Found the following counts of arrays / genome:
        OAcc: 6282 genes in 2422 collinear arrays
        OAdd: 5463 genes in 2121 collinear arrays
        OCkk: 3497 genes in 1535 collinear arrays
        OCll: 4060 genes in 1745 collinear arrays
        OGcc: 6130 genes in 2407 collinear arrays
        OGdd: 5450 genes in 2113 collinear arrays
        OLcc: 7860 genes in 2946 collinear arrays
        OLdd: 6959 genes in 2624 collinear arrays
        OLhh: 6944 genes in 2714 collinear arrays
        OLjj: 6466 genes in 2532 collinear arrays
        OMALbb: 7109 genes in 2738 collinear arrays
        OMALcc: 7939 genes in 3046 collinear arrays
        OMINbb: 6585 genes in 2561 collinear arrays
        OMINcc: 7466 genes in 2864 collinear arrays
        ORhh: 7663 genes in 3014 collinear arrays
        ORjj: 7356 genes in 2810 collinear arrays
        OShh: 6944 genes in 2585 collinear arrays
        OSkk: 6553 genes in 2615 collinear arrays
Pulling synteny for 231 unique pairwise combinations of genomes
    Running 15 chunks of up to 16 combinations each:
    Chunk 1 / 15 (10:02:14 AM) ... Done!
Error: $ operator is invalid for atomic vectors
In addition: Warning message:
In mclapply(1:nrow(splSynp[[i]]), mc.cores = nCores, function(j) { :
  scheduled core 1, 14, 12, 9, 2, 11, 6 encountered error in user code, all values of the job will be affected

I am not sure, but could this error be due to the wrapper's scheduler? If not, do you have an idea where this error (which occurs only with larger datasets) originates?
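The figures in the log are consistent with every unordered pair of the 22 genomes being compared and the pairs being split into chunks of at most nCores: choose(22, 2) = 231 pairs, and ceiling(231 / 16) = 15 chunks. A base-R sketch of that arithmetic; the chunking rule is inferred from the log output, not read from GENESPACE's source.

```r
nGenomes <- 22   # length(genomeIDs) in the call above
nCores   <- 16   # nCores in the call above

nPairs  <- choose(nGenomes, 2)        # unordered genome pairs: 22 * 21 / 2
nChunks <- ceiling(nPairs / nCores)   # chunks of up to nCores pairs each

# Chunk membership of each pair under that split, and the resulting sizes
chunkOf    <- ceiling(seq_len(nPairs) / nCores)
chunkSizes <- tabulate(chunkOf)
```

Under this split the first 14 chunks hold 16 pairs each and the last one holds the remaining 7.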

jtlovell commented 2 years ago

Thanks for posting: this type of issue is why I am working on V1 (which will do format checks up front for everything). It's super hard to troubleshoot. First, can you confirm that the OrthoFinder run was successful? It should have spit out a really long log. You can go into /orthofinder/resultsXX/orthogroups and make sure there is an Orthogroups.tsv file. If that's there and all good, then please re-run synteny with nCores = 1. That will give a more informative error.
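A likely reason nCores = 1 helps: the warning above comes from parallel::mclapply() with mc.cores = nCores. With mc.cores > 1, an element whose worker fails comes back as a "try-error" object (a character string), and later code that indexes it with `$` stops with the generic "$ operator is invalid for atomic vectors", hiding the real message. In current R versions, mc.cores = 1 makes mclapply() fall back to lapply(), so the original error surfaces directly. The sketch reproduces the masking with try() instead of forking, so it runs on any platform; `worker` and `collect` are hypothetical stand-ins, not GENESPACE functions.

```r
# Hypothetical per-pair job that fails for one input
worker <- function(j) {
  if (j == 2) stop("pair ", j, " failed")
  list(nHits = j * 10)
}

# What mclapply(mc.cores > 1) hands back: errors captured as try-error objects
results <- lapply(1:3, function(j) try(worker(j), silent = TRUE))

# Downstream code that assumes every element is a list
collect <- function(res) vapply(res, function(x) x$nHits, numeric(1))

# The secondary, uninformative error seen in the log
secondary <- tryCatch(collect(results), error = conditionMessage)

# The original message is still there, hidden inside the try-error
original <- attr(results[[2]], "condition")$message
```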

noor-albader commented 2 years ago

Hi John, thank you for your reply! Yes, OrthoFinder runs successfully every time I have tried running the pipeline: an output directory /orthofinder/resultsXX/orthogroups is created and it contains an Orthogroups.tsv file.

What I noticed is that OrthoFinder outputs orthogroups for all 22 genomes (including the outgroup, Lp), but when running synteny/MCScanX only 18 of the genomes' genes are placed in collinear arrays. The 4 that were excluded were Lp (the outgroup), OB, Os and OP.

Two things differ for these 4 genomes: (1) OrthoFinder assigned 100% of their genes to orthogroups (see the QC table in step (4) below); (2) these are the 4 genomes that were parsed separately, with different parse_annotations() settings (see step (2) below).

To demonstrate these points, I re-ran the pipeline to capture the full log, following your suggestion to run with nCores = 1. Here is what I got:

(1) Initializing GENESPACE
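To compare genomes at a glance, the QC lines that synteny() prints ("Genome: n. chrs PASS/FAIL, n. genes PASS/FAIL, n. OGs PASS/FAIL") can be parsed into a table. A base-R sketch, assuming the lines are copied verbatim from the console; `parse_qc_lines` is a hypothetical helper. It flags genomes with any FAIL count and genomes whose gene count equals their OG count, as OB, OP and Os show in the QC tables.

```r
parse_qc_lines <- function(lines) {
  # Lines look like "        OB: 53/156, 31597/295, 31597/295"
  pat <- "^\\s*(\\S+):\\s*(\\d+)/(\\d+),\\s*(\\d+)/(\\d+),\\s*(\\d+)/(\\d+)\\s*$"
  hits <- regmatches(lines, regexec(pat, lines))
  hits <- hits[lengths(hits) == 8]   # full match + 7 captured groups
  stopifnot(length(hits) > 0)
  m <- do.call(rbind, hits)
  num <- function(k) as.integer(m[, k])
  qc <- data.frame(
    genome   = m[, 2],
    chrPass  = num(3), chrFail  = num(4),
    genePass = num(5), geneFail = num(6),
    ogPass   = num(7), ogFail   = num(8),
    stringsAsFactors = FALSE
  )
  # Any FAIL count above zero
  qc$anyFail <- (qc$chrFail + qc$geneFail + qc$ogFail) > 0
  # TRUE when n. genes equals n. OGs
  qc$oneOGperGene <- qc$genePass == qc$ogPass
  qc
}
```

The header line and "All look good!" do not match the pattern, so a whole copied block can be passed in.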

> gpar <- init_genespace(
  genomeIDs = c("Lp","OAcc","OAdd","OCkk", "OCll", "OGcc", "OGdd", "OLcc", "OLdd","OLhh", "OLjj", "OMALbb", "OMALcc", "OMINbb", "OMINcc", "ORhh", "ORjj", "Os", "OShh", "OSkk", "OP","OB"),
  speciesIDs = c("Lp","OAcc","OAdd","OCkk", "OCll", "OGcc", "OGdd", "OLcc", "OLdd","OLhh", "OLjj", "OMALbb", "OMALcc", "OMINbb", "OMINcc", "ORhh", "ORjj", "Os", "OShh", "OSkk", "OP","OB"),
  versionIDs = c("Lp","OAcc","OAdd","OCkk", "OCll", "OGcc", "OGdd", "OLcc", "OLdd","OLhh", "OLjj", "OMALbb", "OMALcc", "OMINbb", "OMINcc", "ORhh", "ORjj", "Os", "OShh", "OSkk", "OP","OB"),
  outgroup = "Lp",
  ploidy = rep(1,22),
  diamondMode = "fast",
  orthofinderMethod = "default",
  wd = runwd,
  orthofinderInBlk = TRUE, 
  overwrite = F, 
  verbose = T,
  nCores = 1,
  minPepLen = 50,
  gffString = "gff",
  pepString = "fa",
  path2orthofinder = "/home/albadenm/software/miniconda3/envs/orthofinder/bin/orthofinder",
  path2diamond = "diamond",
  path2mcscanx = "/home/albadenm/software/MCScanX",
  rawGenomeDir = file.path(runwd, "rawGenomes"))
Initializing GENESPACE run
    checking genomeIDs ... PASS (Lp, OAcc, OAdd, OCkk, OCll, OGcc, OGdd, OLcc, OLdd, OLhh, OLjj, OMALbb, OMALcc, OMINbb, OMINcc, ORhh, ORjj, Os, OShh, OSkk, OP, OB)
    checking outgroup ... PASS (Lp)
    checking ploidy ... PASS (1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1)
    checking the number of parallel processes ... PASS (1)
    Verbosity ... PASS (TRUE)
    minPepLen ... PASS (50)
    checking working directory ... PASS (/home/albadenm/Polyploid_group_renamed_mainChr)
    checking parsed gff files ... PASS (/home/albadenm/Polyploid_group_renamed_mainChr/gff)
    checking parsed peptide files ... PASS (/home/albadenm/Polyploid_group_renamed_mainChr/peptide)
Checking dependencies and 3rd party installations
    MCScanX installation ... PASS (/home/albadenm/software/MCScanX/MCScanX_h)
    Orthofinder installation ... PASS (/home/albadenm/software/miniconda3/envs/orthofinder/bin/orthofinder)
    OrthoFinder method ... PASS - (default inside R)
    Orthofinder in block method ... PASS (TRUE)
GENESPACE run successfully initialized

(2) Parsing the annotation

parse_annotations(
  gsParam = gpar,
  genomeIDs = c("Lp","Os","OP","OB"),
  gffEntryType = "mRNA",
  gffIdColumn = "ID",
  gffStripText = "ID=",
  headerEntryIndex = 5,
  headerSep = " ",
  headerStripText = "ID=")
Parsing annotation files ...
    Lp ... 
        Importing gff ... found 38961 gff entires, and 38961 mRNA entries
        Importing fasta ... found 38960 fasta entires
        38836 gff-peptide matches
    Done!
    Os ... 
        Importing gff ... found 44643 gff entires, and 44643 mRNA entries
        Importing fasta ... found 42355 fasta entires
        41648 gff-peptide matches
    Done!
    OP ... 
        Importing gff ... found 41060 gff entires, and 41060 mRNA entries
        Importing fasta ... found 41060 fasta entires
        40917 gff-peptide matches
    Done!
    OB ... 
        Importing gff ... found 31356 gff entires, and 31356 mRNA entries
        Importing fasta ... found 32037 fasta entires
        31218 gff-peptide matches
    Done!
> parse_annotations(
  gsParam = gpar,
  genomeIDs = c("OAcc","OAdd","OCkk", "OCll", "OGcc", "OGdd", "OLcc", "OLdd","OLhh", "OLjj", "OMALbb", "OMALcc", "OMINbb", "OMINcc", "ORhh", "ORjj", "OShh", "OSkk"),
  gffEntryType = "Gene",
  gffIdColumn = "ID",
  gffStripText = "ID=",
  headerEntryIndex = 1,
  headerSep = " ",
  headerStripText = "ID=")
Parsing annotation files ...
    OAcc ... 
        Importing gff ... found 34936 gff entires, and 34936 Gene entries
        Importing fasta ... found 34936 fasta entires
        34936 gff-peptide matches
    Done!
    OAdd ... 
        Importing gff ... found 31451 gff entires, and 31451 Gene entries
        Importing fasta ... found 31451 fasta entires
        31451 gff-peptide matches
    Done!
    OCkk ... 
        Importing gff ... found 25845 gff entires, and 25845 Gene entries
        Importing fasta ... found 25845 fasta entires
        25845 gff-peptide matches
    Done!
    OCll ... 
        Importing gff ... found 26978 gff entires, and 26978 Gene entries
        Importing fasta ... found 26978 fasta entires
        26978 gff-peptide matches
    Done!
    OGcc ... 
        Importing gff ... found 34778 gff entires, and 34778 Gene entries
        Importing fasta ... found 34778 fasta entires
        34778 gff-peptide matches
    Done!
    OGdd ... 
        Importing gff ... found 31578 gff entires, and 31578 Gene entries
        Importing fasta ... found 31578 fasta entires
        31578 gff-peptide matches
    Done!
    OLcc ... 
        Importing gff ... found 37543 gff entires, and 37543 Gene entries
        Importing fasta ... found 37543 fasta entires
        37543 gff-peptide matches
    Done!
    OLdd ... 
        Importing gff ... found 36191 gff entires, and 36191 Gene entries
        Importing fasta ... found 36191 fasta entires
        36191 gff-peptide matches
    Done!
    OLhh ... 
        Importing gff ... found 38679 gff entires, and 38679 Gene entries
        Importing fasta ... found 38679 fasta entires
        38679 gff-peptide matches
    Done!
    OLjj ... 
        Importing gff ... found 34645 gff entires, and 34645 Gene entries
        Importing fasta ... found 34645 fasta entires
        34645 gff-peptide matches
    Done!
    OMALbb ... 
        Importing gff ... found 37927 gff entires, and 37927 Gene entries
        Importing fasta ... found 37927 fasta entires
        37927 gff-peptide matches
    Done!
    OMALcc ... 
        Importing gff ... found 39942 gff entires, and 39942 Gene entries
        Importing fasta ... found 39942 fasta entires
        39942 gff-peptide matches
    Done!
    OMINbb ... 
        Importing gff ... found 36063 gff entires, and 36063 Gene entries
        Importing fasta ... found 36063 fasta entires
        36062 gff-peptide matches
    Done!
    OMINcc ... 
        Importing gff ... found 38203 gff entires, and 38203 Gene entries
        Importing fasta ... found 38203 fasta entires
        38203 gff-peptide matches
    Done!
    ORhh ... 
        Importing gff ... found 43892 gff entires, and 43892 Gene entries
        Importing fasta ... found 43892 fasta entires
        43890 gff-peptide matches
    Done!
    ORjj ... 
        Importing gff ... found 38916 gff entires, and 38916 Gene entries
        Importing fasta ... found 38916 fasta entires
        38916 gff-peptide matches
    Done!
    OShh ... 
        Importing gff ... found 34293 gff entires, and 34293 Gene entries
        Importing fasta ... found 34293 fasta entires
        34293 gff-peptide matches
    Done!
    OSkk ... 
        Importing gff ... found 36564 gff entires, and 36564 Gene entries
        Importing fasta ... found 36564 fasta entires
        36563 gff-peptide matches
    Done!
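The parse log reports, per genome, how many gff entries and FASTA entries were found and how many matched. A base-R sketch that turns the counts copied from the mRNA-parsed genomes above into match rates; the data frame is typed in by hand and nothing here calls GENESPACE.

```r
# Counts copied from the parse_annotations() log above
parsed <- data.frame(
  genome  = c("Lp", "Os", "OP", "OB"),
  gff     = c(38961, 44643, 41060, 31356),
  fasta   = c(38960, 42355, 41060, 32037),
  matched = c(38836, 41648, 40917, 31218)
)

# Share of gff entries that found a matching peptide
parsed$pctGffMatched <- round(100 * parsed$matched / parsed$gff, 1)

# Peptides left without a matching gff entry
parsed$unmatchedFasta <- parsed$fasta - parsed$matched

parsed[order(parsed$pctGffMatched), ]
```

Sorted this way, Os has the lowest gff match rate (about 93.3%) and OB the most unmatched peptides (819).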

(3) Running Orthofinder

> gpar<-run_orthofinder(gsParam=gpar)
Synteny Parameters have not been set! Setting to defaults
    Running 'defualt' genespace orthofinder method 
    ############################################################
    Cleaning out orthofinder directory and prepping run
    Calculating blast results and running OrthoFinder 
    ################################################## 
    ##################################################

    OrthoFinder version 2.5.4 Copyright (C) 2014 David Emms

    2022-09-16 16:16:28 : Starting OrthoFinder 2.5.4
    16 thread(s) for highly parallel tasks (BLAST searches etc.)
    1 thread(s) for OrthoFinder algorithm

    Checking required programs are installed
    ----------------------------------------
    Test can run "mcl -h" - ok
    Test can run "fastme -i /home/albadenm/Polyploid_group_renamed_mainChr/orthofinder/Results_Sep16/WorkingDirectory/SimpleTest.phy -o /home/albadenm/Polyploid_group_renamed_mainChr/orthofinder/Results_Sep16/WorkingDirectory/SimpleTest.tre" - ok

    Dividing up work for BLAST for parallel processing
    --------------------------------------------------
    2022-09-16 16:16:31 : Creating diamond database 1 of 22
    2022-09-16 16:16:31 : Creating diamond database 2 of 22
    2022-09-16 16:16:32 : Creating diamond database 3 of 22
    2022-09-16 16:16:32 : Creating diamond database 4 of 22
    2022-09-16 16:16:32 : Creating diamond database 5 of 22
    2022-09-16 16:16:32 : Creating diamond database 6 of 22
    2022-09-16 16:16:32 : Creating diamond database 7 of 22
    2022-09-16 16:16:32 : Creating diamond database 8 of 22
    2022-09-16 16:16:33 : Creating diamond database 9 of 22
    2022-09-16 16:16:33 : Creating diamond database 10 of 22
    2022-09-16 16:16:33 : Creating diamond database 11 of 22
    2022-09-16 16:16:33 : Creating diamond database 12 of 22
    2022-09-16 16:16:33 : Creating diamond database 13 of 22
    2022-09-16 16:16:33 : Creating diamond database 14 of 22
    2022-09-16 16:16:34 : Creating diamond database 15 of 22
    2022-09-16 16:16:34 : Creating diamond database 16 of 22
    2022-09-16 16:16:34 : Creating diamond database 17 of 22
    2022-09-16 16:16:34 : Creating diamond database 18 of 22
    2022-09-16 16:16:34 : Creating diamond database 19 of 22
    2022-09-16 16:16:35 : Creating diamond database 20 of 22
    2022-09-16 16:16:35 : Creating diamond database 21 of 22
    2022-09-16 16:16:35 : Creating diamond database 22 of 22

    Running diamond all-versus-all
    ------------------------------
    Using 16 thread(s)
    2022-09-16 16:16:35 : This may take some time....
    2022-09-16 16:16:35 : Done 0 of 484
    2022-09-16 16:41:16 : Done 100 of 484
    2022-09-16 17:03:12 : Done 200 of 484
    2022-09-16 17:24:52 : Done 300 of 484
    2022-09-16 17:42:37 : Done 400 of 484
    2022-09-16 17:54:57 : Done all-versus-all sequence search

    Running OrthoFinder algorithm
    -----------------------------
    2022-09-16 17:54:59 : Initial processing of each species
    2022-09-16 17:56:19 : Initial processing of species 0 complete
    2022-09-16 17:57:30 : Initial processing of species 1 complete
    2022-09-16 17:58:32 : Initial processing of species 2 complete
    2022-09-16 17:59:30 : Initial processing of species 3 complete
    2022-09-16 18:00:17 : Initial processing of species 4 complete
    2022-09-16 18:01:07 : Initial processing of species 5 complete
    2022-09-16 18:02:18 : Initial processing of species 6 complete
    2022-09-16 18:03:22 : Initial processing of species 7 complete
    2022-09-16 18:04:38 : Initial processing of species 8 complete
    2022-09-16 18:05:51 : Initial processing of species 9 complete
    2022-09-16 18:07:02 : Initial processing of species 10 complete
    2022-09-16 18:08:10 : Initial processing of species 11 complete
    2022-09-16 18:09:24 : Initial processing of species 12 complete
    2022-09-16 18:10:40 : Initial processing of species 13 complete
    2022-09-16 18:11:51 : Initial processing of species 14 complete
    2022-09-16 18:13:05 : Initial processing of species 15 complete
    2022-09-16 18:14:27 : Initial processing of species 16 complete
    2022-09-16 18:15:44 : Initial processing of species 17 complete
    2022-09-16 18:16:56 : Initial processing of species 18 complete
    2022-09-16 18:18:01 : Initial processing of species 19 complete
    2022-09-16 18:19:07 : Initial processing of species 20 complete
    2022-09-16 18:20:26 : Initial processing of species 21 complete
    2022-09-16 18:22:25 : Connected putative homologues
    2022-09-16 18:22:38 : Written final scores for species 0 to graph file
    2022-09-16 18:22:50 : Written final scores for species 1 to graph file
    2022-09-16 18:23:01 : Written final scores for species 2 to graph file
    2022-09-16 18:23:11 : Written final scores for species 3 to graph file
    2022-09-16 18:23:20 : Written final scores for species 4 to graph file
    2022-09-16 18:23:29 : Written final scores for species 5 to graph file
    2022-09-16 18:23:40 : Written final scores for species 6 to graph file
    2022-09-16 18:23:51 : Written final scores for species 7 to graph file
    2022-09-16 18:24:04 : Written final scores for species 8 to graph file
    2022-09-16 18:24:16 : Written final scores for species 9 to graph file
    2022-09-16 18:24:29 : Written final scores for species 10 to graph file
    2022-09-16 18:24:41 : Written final scores for species 11 to graph file
    2022-09-16 18:24:53 : Written final scores for species 12 to graph file
    2022-09-16 18:25:07 : Written final scores for species 13 to graph file
    2022-09-16 18:25:19 : Written final scores for species 14 to graph file
    2022-09-16 18:25:32 : Written final scores for species 15 to graph file
    2022-09-16 18:25:45 : Written final scores for species 16 to graph file
    2022-09-16 18:26:00 : Written final scores for species 17 to graph file
    2022-09-16 18:26:13 : Written final scores for species 18 to graph file
    2022-09-16 18:26:24 : Written final scores for species 19 to graph file
    2022-09-16 18:26:36 : Written final scores for species 20 to graph file
    2022-09-16 18:26:50 : Written final scores for species 21 to graph file

    WARNING: program called by OrthoFinder produced output to stderr

    Command: mcl /home/albadenm/Polyploid_group_renamed_mainChr/orthofinder/Results_Sep16/WorkingDirectory/OrthoFinder_graph.txt -I 1.5 -o /home/albadenm/Polyploid_group_renamed_mainChr/orthofinder/Results_Sep16/WorkingDirectory/clusters_OrthoFinder_I1.5.txt -te 1 -V all

    stdout
    ------
    b''
    stderr
    ------
    b'[mcl] cut <1> instances of overlap\n'
    2022-09-16 18:36:24 : Ran MCL

    Writing orthogroups to file
    ---------------------------
    OrthoFinder assigned 738903 genes (93.4% of total) to 52873 orthogroups. Fifty percent of all genes were in orthogroups with 24 or more genes (G50 was 24) and were contained in the largest 9276 orthogroups (O50 was 9276). There were 8469 orthogroups with all species present and 1739 of these consisted entirely of single-copy genes.

    2022-09-16 18:36:40 : Done orthogroups

    Analysing Orthogroups
    =====================

    Calculating gene distances
    --------------------------
    2022-09-16 18:58:37 : Done
    2022-09-16 18:58:41 : Done 0 of 30576
    2022-09-16 18:58:52 : Done 1000 of 30576
    2022-09-16 18:58:53 : Done 2000 of 30576
    2022-09-16 18:58:53 : Done 3000 of 30576
    2022-09-16 18:58:54 : Done 4000 of 30576
    2022-09-16 18:58:55 : Done 5000 of 30576
    2022-09-16 18:58:55 : Done 6000 of 30576
    2022-09-16 18:58:56 : Done 7000 of 30576
    2022-09-16 18:58:56 : Done 8000 of 30576
    2022-09-16 18:58:57 : Done 9000 of 30576
    2022-09-16 18:58:57 : Done 10000 of 30576
    2022-09-16 18:58:58 : Done 11000 of 30576
    2022-09-16 18:58:58 : Done 12000 of 30576
    2022-09-16 18:58:59 : Done 13000 of 30576
    2022-09-16 18:58:59 : Done 14000 of 30576
    2022-09-16 18:59:00 : Done 15000 of 30576
    2022-09-16 18:59:00 : Done 16000 of 30576
    2022-09-16 18:59:01 : Done 17000 of 30576
    2022-09-16 18:59:01 : Done 18000 of 30576
    2022-09-16 18:59:02 : Done 19000 of 30576
    2022-09-16 18:59:02 : Done 20000 of 30576
    2022-09-16 18:59:02 : Done 21000 of 30576
    2022-09-16 18:59:03 : Done 22000 of 30576
    2022-09-16 18:59:03 : Done 23000 of 30576
    2022-09-16 18:59:04 : Done 24000 of 30576
    2022-09-16 18:59:04 : Done 25000 of 30576
    2022-09-16 18:59:05 : Done 26000 of 30576
    2022-09-16 18:59:05 : Done 27000 of 30576
    2022-09-16 18:59:06 : Done 28000 of 30576
    2022-09-16 18:59:06 : Done 29000 of 30576
    2022-09-16 18:59:07 : Done 30000 of 30576

    Inferring gene and species trees
    --------------------------------

    8469 trees had all species present and will be used by STAG to infer the species tree

    Best outgroup(s) for species tree
    ---------------------------------
    2022-09-16 19:01:16 : Starting STRIDE
    2022-09-16 19:01:20 : Done STRIDE
    Observed 555 well-supported, non-terminal duplications. 551 support the best root and 4 contradict it.
    Best outgroup for species tree:
      OB

    Reconciling gene trees and species tree
    ---------------------------------------
    Outgroup: OB
    2022-09-16 19:01:20 : Starting Recon and orthologues
    2022-09-16 19:01:20 : Starting OF Orthologues
    2022-09-16 19:01:22 : Done 0 of 30576
    2022-09-16 19:01:55 : Done 1000 of 30576
    2022-09-16 19:02:12 : Done 2000 of 30576
    2022-09-16 19:02:26 : Done 3000 of 30576
    2022-09-16 19:02:38 : Done 4000 of 30576
    2022-09-16 19:02:49 : Done 5000 of 30576
    2022-09-16 19:02:57 : Done 6000 of 30576
    2022-09-16 19:03:05 : Done 7000 of 30576
    2022-09-16 19:03:13 : Done 8000 of 30576
    2022-09-16 19:03:21 : Done 9000 of 30576
    2022-09-16 19:03:29 : Done 10000 of 30576
    2022-09-16 19:03:36 : Done 11000 of 30576
    2022-09-16 19:03:43 : Done 12000 of 30576
    2022-09-16 19:03:50 : Done 13000 of 30576
    2022-09-16 19:03:58 : Done 14000 of 30576
    2022-09-16 19:04:05 : Done 15000 of 30576
    2022-09-16 19:04:11 : Done 16000 of 30576
    2022-09-16 19:04:18 : Done 17000 of 30576
    2022-09-16 19:04:24 : Done 18000 of 30576
    2022-09-16 19:04:30 : Done 19000 of 30576
    2022-09-16 19:04:35 : Done 20000 of 30576
    2022-09-16 19:04:38 : Done 21000 of 30576
    2022-09-16 19:04:41 : Done 22000 of 30576
    2022-09-16 19:04:44 : Done 23000 of 30576
    2022-09-16 19:04:45 : Done 24000 of 30576
    2022-09-16 19:04:47 : Done 25000 of 30576
    2022-09-16 19:04:49 : Done 26000 of 30576
    2022-09-16 19:04:50 : Done 27000 of 30576
    2022-09-16 19:04:52 : Done 28000 of 30576
    2022-09-16 19:04:53 : Done 29000 of 30576
    2022-09-16 19:04:54 : Done 30000 of 30576
    2022-09-16 19:04:55 : Done OF Orthologues

    Writing results files
    =====================
    2022-09-16 19:04:58 : Done orthologues

    Results:
        /home/albadenm/Polyploid_group_renamed_mainChr/orthofinder/Results_Sep16/

    CITATION:
     When publishing work that uses OrthoFinder please cite:
     Emms D.M. & Kelly S. (2019), Genome Biology 20:238

     If you use the species tree in your work then please also cite:
     Emms D.M. & Kelly S. (2017), MBE 34(12): 3267-3278
     Emms D.M. & Kelly S. (2018), bioRxiv https://doi.org/10.1101/267914

(4) Running synteny (MCScanX)

gpar <- synteny(gsParam = gpar)
Synteny Parameters have not been set! Setting to defaults
Indexing location of orthofinder results ... Done!
Parsing the gff files ... 
    Reading the gffs and adding orthofinder IDs ... Done!
    Found 203172 global OGs for 752971 genes
    QC-ing genome to ensure chromosomes/scaffolds are big enough...
            Genome: n. chrs PASS/FAIL, n. genes PASS/FAIL, n. OGs PASS/FAIL
        OAcc: 12/0, 34936/0, 30125/0
        OAdd: 12/0, 31451/0, 27277/0
        OB: 53/156, 31597/295, 31597/295
        OCkk: 12/0, 25845/0, 23464/0
        OCll: 12/0, 26978/0, 24273/0
        OGcc: 12/0, 34778/0, 29934/0
        OGdd: 12/0, 31578/0, 27247/0
        OLcc: 12/0, 37543/0, 30439/0
        OLdd: 12/0, 36191/0, 29831/0
        OLhh: 12/0, 38679/0, 32749/0
        OLjj: 12/0, 34645/0, 29611/0
        OMALbb: 12/0, 37927/0, 32155/0
        OMALcc: 12/0, 39942/0, 33339/0
        OMINbb: 12/0, 36062/0, 30710/0
        OMINcc: 12/0, 38203/0, 31793/0
        OP: 12/0, 40917/0, 40917/0
        ORhh: 12/0, 43890/0, 36776/0
        ORjj: 12/0, 38916/0, 32907/0
        OShh: 12/0, 34293/0, 29067/0
        OSkk: 12/0, 36563/0, 31044/0
        Os: 14/0, 41742/0, 41742/0
    All look good!
    Defining collinear orthogroup arrays ... 
    Found the following counts of arrays / genome:
        OAcc: 6276 genes in 2419 collinear arrays
        OAdd: 5455 genes in 2118 collinear arrays
        OCkk: 3497 genes in 1535 collinear arrays
        OCll: 4055 genes in 1743 collinear arrays
        OGcc: 6127 genes in 2407 collinear arrays
        OGdd: 5437 genes in 2106 collinear arrays
        OLcc: 7862 genes in 2948 collinear arrays
        OLdd: 6961 genes in 2625 collinear arrays
        OLhh: 6941 genes in 2712 collinear arrays
        OLjj: 6469 genes in 2534 collinear arrays
        OMALbb: 7113 genes in 2741 collinear arrays
        OMALcc: 7942 genes in 3049 collinear arrays
        OMINbb: 6588 genes in 2561 collinear arrays
        OMINcc: 7463 genes in 2864 collinear arrays
        ORhh: 7664 genes in 3014 collinear arrays
        ORjj: 7358 genes in 2812 collinear arrays
        OShh: 6944 genes in 2585 collinear arrays
        OSkk: 6552 genes in 2614 collinear arrays
Pulling synteny for 206 unique pairwise combinations of genomes
    Running 206 chunks of up to 1 combinations each:
    Chunk 1 / 206 (07:12:58 PM) ... Error in `[.data.table`(a, , `:=`(scrRank1, 1:.N), by = "ofID1") : 
  Supplied 2 items to be assigned to group 1 of size 0 in column 'scrRank1'. The RHS length must either be 1 (single values are ok) or match the LHS length exactly. If you wish to 'recycle' the RHS please use rep() explicitly to make this intent clear to readers of your code.
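The wording "Supplied 2 items to be assigned to group 1 of size 0" for `scrRank1 := 1:.N` matches a well-known base-R pitfall: when a group has zero rows, `1:.N` is `1:0`, which counts down and yields the two values 1 and 0 rather than an empty vector, while `seq_len()` returns an empty vector in that case. The sketch only illustrates that base-R behavior (`rank_within` is a hypothetical helper); the maintainer's reply attributes the empty input itself to GENESPACE not finding the BLAST file.

```r
nRows <- 0L                  # stands in for .N in an empty group
colonSeq <- 1:nRows          # c(1L, 0L): counts down from 1 to 0
safeSeq  <- seq_len(nRows)   # integer(0): the length a rank column needs

# Hypothetical per-group ranking helper that is safe for any group size
rank_within <- function(g) seq_len(length(g))

scores <- c(5, 3, 8)
groups <- c("a", "a", "b")
ranks  <- ave(scores, groups, FUN = rank_within)  # 1, 2 within "a"; 1 within "b"
```

In data.table syntax the same idea is `seq_len(.N)` in place of `1:.N`.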
jtlovell commented 2 years ago

Huh... everything looks good here. That particular error means that GENESPACE can't find the BLAST file. It can definitely find the Orthogroups.tsv file (that's how it builds the collinear arrays). I have seen this error before, but only when there are multiple OrthoFinder runs in the /orthofinder directory, or when the user added a genome to a previous OrthoFinder run. Did you do either of those? If not, shoot me an email and we will get this figured out. GENESPACE v1 will catch these issues up front and make troubleshooting easier, but that is about a month out and I'd like to help you get this figured out before then.
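Since this error has only been seen with multiple OrthoFinder runs in the /orthofinder directory, a quick base-R check is to count the Orthogroups.tsv files below that directory. This is a sketch: `find_orthogroup_tables` is a hypothetical helper, the layout is assumed from the log above (results under orthofinder/Results_Sep16/), and more than one hit is a hint worth checking rather than proof of a problem.

```r
find_orthogroup_tables <- function(ofDir) {
  # Every Orthogroups.tsv below the orthofinder directory, at any depth
  hits <- list.files(ofDir, pattern = "^Orthogroups\\.tsv$",
                     recursive = TRUE, full.names = TRUE)
  if (length(hits) == 0) {
    warning("no Orthogroups.tsv found under ", ofDir)
  } else if (length(hits) > 1) {
    warning(length(hits), " Orthogroups.tsv files found under ", ofDir,
            "; leftover runs may confuse later steps")
  }
  hits
}
```

With the working directory from the original post, the call would be `find_orthogroup_tables(file.path(runwd, "orthofinder"))`.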

ap2425 commented 2 years ago

Hi, I'm wondering whether this issue was ever resolved, because I am encountering an error at the same spot in the synteny function and assume it's the same problem.

jtlovell commented 2 years ago

The issue above was due to a mismatch between the genomeIDs and those used in the OrthoFinder run, caused (I think) by running OrthoFinder multiple times with different genomeIDs. I would recommend cleaning out your orthofinder directory and re-running from scratch. If that doesn't fix it, let me know.
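One way to check for such a mismatch is to compare the genomeIDs with the species columns of Orthogroups.tsv. A sketch, assuming OrthoFinder's usual layout for that file: a tab-separated header whose first column is "Orthogroup", followed by one column per input proteome named after its FASTA file without the extension. `compare_ids_to_orthogroups` is a hypothetical helper.

```r
compare_ids_to_orthogroups <- function(ogFile, genomeIDs) {
  # First line: "Orthogroup", then one column per proteome (assumed layout)
  header <- strsplit(readLines(ogFile, n = 1L), "\t", fixed = TRUE)[[1]]
  ofSpecies <- header[-1]
  list(
    missingFromOrthofinder  = setdiff(genomeIDs, ofSpecies),
    unexpectedInOrthofinder = setdiff(ofSpecies, genomeIDs),
    allMatch                = setequal(genomeIDs, ofSpecies)
  )
}
```

Both difference vectors should be empty (and `allMatch` TRUE) when the OrthoFinder run and the GENESPACE parameters agree.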

ap2425 commented 2 years ago

I deleted all the folders, reran it from scratch, and still encountered the same issue. It makes it to chunk 3/48 before erroring out with the message below. The idea of IDs not matching seems plausible, but when I compare my gIDs with the species listed in some of the OrthoFinder output files, they all match up exactly.

Error: $ operator is invalid for atomic vectors
In addition: Warning message:
In mclapply(1:nrow(splSynp[[i]]), mc.cores = nCores, function(j) { :
  scheduled core 1 encountered error in user code, all values of the job will be affected

After some trial and error, it seems that certain species are breaking it: it runs successfully on some smaller subsets of more model organisms. However, I used the same input types for all species: the RefSeq translated_cds FASTA and genomic.gff files.

jtlovell commented 1 year ago

OK, I know the problem and I have a quick solution: the problematic genomes have special characters that OrthoFinder strips out internally. These include at least '|' and ':'. So genes that came in with those symbols in their names came out with a different name and couldn't be merged into the combined BED file.

v1.1.3 will be pushed to /dev ASAP and will contain a fix for this. Basically, the solution is to replace all special characters with '_'. This isn't the best solution, but it is the only one I can implement right now.
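For anyone who wants to clean IDs before handing files to GENESPACE, a minimal sketch of the same idea replaces every character outside a conservative allowed set with '_'. The allowed set `[A-Za-z0-9._-]` is an assumption, not necessarily the set used in v1.1.3, and `sanitize_ids` is a hypothetical helper. It also checks that distinct IDs stay distinct after cleaning, since two raw IDs collapsing to one name would break the merge just as badly.

```r
sanitize_ids <- function(ids) {
  # Anything outside letters, digits, '.', '_' and '-' becomes '_'
  # (assumed allowed set; it covers the '|' and ':' mentioned above)
  pat <- "[^A-Za-z0-9._-]"
  clean <- gsub(pat, "_", ids)
  # Distinct raw IDs must stay distinct after cleaning
  cleanU <- gsub(pat, "_", unique(ids))
  if (anyDuplicated(cleanU)) {
    stop("sanitizing makes distinct IDs collide: ",
         paste(unique(cleanU[duplicated(cleanU)]), collapse = ", "))
  }
  clean
}
```

Failing loudly on a collision is a design choice: a silent rename could attach two genes to one ID without any error downstream.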