jtlovell / GENESPACE

run_genespace hogs error #75

Closed SwiftSeal closed 1 year ago

SwiftSeal commented 1 year ago

Heya,

I'm running v1.1.4 on some manually curated genomes. Everything was working fine until I added an additional genome; run_genespace now fails with a quiet error.

I've attached the output of the run below. The orthofinder directory looks complete and there is data in the results folder, but no output plots etc. run_genespace encounters an error before it can be assigned to out. Is it failing during set_syntenyParams.R? I used an identical set of gpar and it was working fine, just without the verrucosum genome included.

I had a look at the recent commits up to v1.1.5 but can't see anything related to this parameter.

Happy to share any additional files if needed.

> gpar <- init_genespace(wd = getwd(), nCores = 12, path2mcscanx="~/scratch/apps/MCScanX/")
Checking Working Directory ... PASS: `/mnt/shared/scratch/msmith/solanum_pangenomics`
Checking user-defined parameters ...
        Genome IDs & ploidy ...
                chacoense       : 1
                lycopersicum    : 1
                melongena       : 1
                pennellii       : 1
                pimpinellifolium: 1
                tuberosum       : 1
                verrucosum      : 1
        Outgroup ... NONE
        n. parallel processes ... 12
        collinear block size ... 5
        collinear block search radius ... 25
        n gaps in collinear block ... 5
        synteny buffer size... 100
        only orthogroups hits as anchors ... TRUE
        n secondary hits ... 0
Checking annotation files (.bed and peptide .fa):
        chacoense       : 35818 / 37518 geneIDs exactly match (PASS)
        lycopersicum    : 34131 / 34688 geneIDs exactly match (PASS)
        melongena       : 33644 / 34916 geneIDs exactly match (PASS)
        pennellii       : 47557 / 48923 geneIDs exactly match (PASS)
        pimpinellifolium: 35532 / 35761 geneIDs exactly match (PASS)
        tuberosum       : 32819 / 32917 geneIDs exactly match (PASS)
        verrucosum      : 29689 / 29689 geneIDs exactly match (PASS)
Checking dependencies ...
        Found valid path to OrthoFinder v2.54: `orthofinder`
        Found valid path to DIAMOND2 v2.015: `diamond`
        Found valid MCScanX_h executable: `/home/msmith/scratch/apps/MCScanX//MCScanX_h`
> out <- run_genespace(gpar, overwrite = T)

############################
1. Running orthofinder (or parsing existing results)
        Checking for existing orthofinder results ...
        Copying files over to the temporary directory:
                /mnt/shared/scratch/msmith/solanum_pangenomics/tmp
        Running the following command in the shell: `orthofinder -f
                /mnt/shared/scratch/msmith/solanum_pangenomics/tmp -t
                12 -a 1 -X -o
                /mnt/shared/scratch/msmith/solanum_pangenomics/orthofinder`.This
                can take a while. To check the progress, look in the
                `WorkingDirectory` in the output (-o) directory

        OrthoFinder version 2.5.4 Copyright (C) 2014 David Emms

        2023-03-14 09:58:21 : Starting OrthoFinder 2.5.4
        12 thread(s) for highly parallel tasks (BLAST searches etc.)
        1 thread(s) for OrthoFinder algorithm

        Checking required programs are installed
        ----------------------------------------
        Test can run "mcl -h" - ok
        Test can run "fastme -i /mnt/shared/scratch/msmith/solanum_pangenomics/orthofinder/Results_Mar14/WorkingDirectory/SimpleTest.phy -o /mnt/shared/scratch/msmith/solanum_pangenomics/orthofinder/Results_Mar14/WorkingDirectory/SimpleTest.tre" - ok

        Dividing up work for BLAST for parallel processing
        --------------------------------------------------
        2023-03-14 09:58:24 : Creating diamond database 1 of 7
        2023-03-14 09:58:24 : Creating diamond database 2 of 7
        2023-03-14 09:58:24 : Creating diamond database 3 of 7
        2023-03-14 09:58:24 : Creating diamond database 4 of 7
        2023-03-14 09:58:25 : Creating diamond database 5 of 7
        2023-03-14 09:58:25 : Creating diamond database 6 of 7
        2023-03-14 09:58:25 : Creating diamond database 7 of 7

        Running diamond all-versus-all
        ------------------------------
        Using 12 thread(s)
        2023-03-14 09:58:26 : This may take some time....
        2023-03-14 09:58:26 : Done 0 of 49
        2023-03-14 10:06:41 : Done 10 of 49
        2023-03-14 10:12:58 : Done 20 of 49
        2023-03-14 10:19:07 : Done 30 of 49
        2023-03-14 12:09:05 : Done all-versus-all sequence search

        Running OrthoFinder algorithm
        -----------------------------
        2023-03-14 12:09:08 : Initial processing of each species
        2023-03-14 12:09:47 : Initial processing of species 0 complete
        2023-03-14 12:10:20 : Initial processing of species 1 complete
        2023-03-14 12:10:55 : Initial processing of species 2 complete
        2023-03-14 12:11:50 : Initial processing of species 3 complete
        2023-03-14 12:12:23 : Initial processing of species 4 complete
        2023-03-14 12:13:02 : Initial processing of species 5 complete
        2023-03-14 12:13:35 : Initial processing of species 6 complete
        2023-03-14 12:14:25 : Connected putative homologues
        2023-03-14 12:14:33 : Written final scores for species 0 to graph file
        2023-03-14 12:14:39 : Written final scores for species 1 to graph file
        2023-03-14 12:14:45 : Written final scores for species 2 to graph file
        2023-03-14 12:14:53 : Written final scores for species 3 to graph file
        2023-03-14 12:14:58 : Written final scores for species 4 to graph file
        2023-03-14 12:15:03 : Written final scores for species 5 to graph file
        2023-03-14 12:15:08 : Written final scores for species 6 to graph file
        2023-03-14 12:16:18 : Ran MCL

        Writing orthogroups to file
        ---------------------------
        OrthoFinder assigned 238910 genes (93.9% of total) to 30437 orthogroups. Fifty percent of all genes were in orthogroups with 7 or more genes (G50 was 7) and were contained in the largest 8760 orthogroups (O50 was 8760). There were 14441 orthogroups with all species present and 8778 of these consisted entirely of single-copy genes.

        2023-03-14 12:18:53 : Done orthogroups

        Analysing Orthogroups
        =====================

        Calculating gene distances
        --------------------------
        2023-03-14 12:24:06 : Done
        2023-03-14 12:24:08 : Done 0 of 24144
        2023-03-14 12:24:29 : Done 1000 of 24144
        2023-03-14 12:24:36 : Done 2000 of 24144
        2023-03-14 12:24:41 : Done 3000 of 24144
        2023-03-14 12:24:46 : Done 4000 of 24144
        2023-03-14 12:24:51 : Done 5000 of 24144
        2023-03-14 12:24:56 : Done 6000 of 24144
        2023-03-14 12:25:01 : Done 7000 of 24144
        2023-03-14 12:25:05 : Done 8000 of 24144
        2023-03-14 12:25:09 : Done 9000 of 24144
        2023-03-14 12:25:14 : Done 10000 of 24144
        2023-03-14 12:25:19 : Done 11000 of 24144
        2023-03-14 12:25:24 : Done 12000 of 24144
        2023-03-14 12:25:31 : Done 13000 of 24144
        2023-03-14 12:25:36 : Done 14000 of 24144
        2023-03-14 12:25:42 : Done 15000 of 24144
        2023-03-14 12:25:47 : Done 16000 of 24144
        2023-03-14 12:25:53 : Done 17000 of 24144
        2023-03-14 12:25:58 : Done 18000 of 24144
        2023-03-14 12:26:03 : Done 19000 of 24144
        2023-03-14 12:26:08 : Done 20000 of 24144
        2023-03-14 12:26:12 : Done 21000 of 24144
        2023-03-14 12:26:18 : Done 22000 of 24144
        2023-03-14 12:26:23 : Done 23000 of 24144
        2023-03-14 12:26:28 : Done 24000 of 24144

        Inferring gene and species trees
        --------------------------------

        14441 trees had all species present and will be used by STAG to infer the species tree

        Best outgroup(s) for species tree
        ---------------------------------
        2023-03-14 12:49:27 : Starting STRIDE
        2023-03-14 12:50:13 : Done STRIDE
        Observed 258 well-supported, non-terminal duplications. 247 support the best root and 11 contradict it.
        Best outgroup for species tree:
          melongena

        Reconciling gene trees and species tree
        ---------------------------------------
        Outgroup: melongena
        2023-03-14 12:50:13 : Starting Recon and orthologues
        2023-03-14 12:50:13 : Starting OF Orthologues
        2023-03-14 12:50:14 : Done 0 of 24144
        2023-03-14 12:50:43 : Done 1000 of 24144
        2023-03-14 12:51:06 : Done 2000 of 24144
        2023-03-14 12:51:30 : Done 3000 of 24144
        2023-03-14 12:51:52 : Done 4000 of 24144
        2023-03-14 12:52:17 : Done 5000 of 24144
        2023-03-14 12:52:44 : Done 6000 of 24144
        2023-03-14 12:53:13 : Done 7000 of 24144
        2023-03-14 12:53:46 : Done 8000 of 24144
        2023-03-14 12:54:16 : Done 9000 of 24144
        2023-03-14 12:54:48 : Done 10000 of 24144
        2023-03-14 12:55:20 : Done 11000 of 24144
        2023-03-14 12:55:54 : Done 12000 of 24144
        2023-03-14 12:56:27 : Done 13000 of 24144
        2023-03-14 12:56:59 : Done 14000 of 24144
        2023-03-14 12:57:31 : Done 15000 of 24144
        2023-03-14 12:58:04 : Done 16000 of 24144
        2023-03-14 12:58:37 : Done 17000 of 24144
        2023-03-14 12:59:07 : Done 18000 of 24144
        2023-03-14 12:59:33 : Done 19000 of 24144
        2023-03-14 12:59:58 : Done 20000 of 24144
        2023-03-14 13:00:22 : Done 21000 of 24144
        2023-03-14 13:00:42 : Done 22000 of 24144
        2023-03-14 13:00:57 : Done 23000 of 24144
        2023-03-14 13:01:07 : Done 24000 of 24144
        2023-03-14 13:01:08 : Done OF Orthologues

        Writing results files
        =====================
        2023-03-14 13:01:22 : Done orthologues

        Results:
            /mnt/shared/scratch/msmith/solanum_pangenomics/orthofinder/Results_Mar14/

        CITATION:
         When publishing work that uses OrthoFinder please cite:
         Emms D.M. & Kelly S. (2019), Genome Biology 20:238

         If you use the species tree in your work then please also cite:
         Emms D.M. & Kelly S. (2017), MBE 34(12): 3267-3278
         Emms D.M. & Kelly S. (2018), bioRxiv https://doi.org/10.1101/267914
Error in if (is.na(gsParam$synteny$hogs)) { : argument is of length zero
> ls()
[1] "gpar"

jtlovell commented 1 year ago

I can't say I've ever seen this error ... what it is saying is that it cannot find /mnt/shared/scratch/msmith/solanum_pangenomics/orthofinder/Results_Mar14/Phylogenetic_Hierarchical_Orthogroups/N0.tsv ... the orthofinder run looks like it went OK, so my guess is this is some other issue. Can you just try to re-run your run_genespace call? It should give you a more informative error. If that doesn't help, lmk ASAP and I'll work on getting this fixed.
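
For context on the message itself: in base R, `if (is.na(x))` stops with "argument is of length zero" whenever `x` is `NULL`, because `is.na(NULL)` returns a zero-length logical. The sketch below reproduces that behaviour and then checks for the N0.tsv path named above. It assumes `gsParam$synteny$hogs` was simply never set. The list used here is a hypothetical stand-in rather than GENESPACE's real parameter object, and the guard is illustrative only, not the package's fix.

```r
# Hypothetical stand-in for the parameter list; the real gsParam has more fields.
gsParam <- list(synteny = list())

# A missing list element is NULL, and is.na(NULL) is a zero-length logical.
hogs <- gsParam$synteny$hogs
stopifnot(is.null(hogs), length(is.na(hogs)) == 0)

# if() needs a length-one condition, so a zero-length one raises the error.
msg <- tryCatch(
  if (is.na(hogs)) "missing" else "present",
  error = function(e) conditionMessage(e)
)
print(msg)

# A guard that treats NULL, zero-length and NA values alike (illustrative only).
hogsMissing <- is.null(hogs) || length(hogs) == 0 || all(is.na(hogs))
print(hogsMissing)

# Check whether the HOG table from the OrthoFinder run exists on disk.
n0 <- file.path(
  "/mnt/shared/scratch/msmith/solanum_pangenomics/orthofinder/Results_Mar14",
  "Phylogenetic_Hierarchical_Orthogroups", "N0.tsv"
)
print(file.exists(n0))
```

If `file.exists(n0)` is `TRUE` while the error persists, the problem is more likely in how the path is stored in the parameter list than in the OrthoFinder run.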

SwiftSeal commented 1 year ago

Error kept occurring for that genome so I ended up going for a fresh install and it's now working again. Perhaps something got broken when updating to the newer version, weird!

taprs commented 1 year ago

Hi and thanks for the nice tool!

I had the same issue with v1.1.4. When I decided to update the results with a new genome, I added another bed and fasta file, emptied the orthofinder/ folder, and reran init_genespace() and run_genespace(). I got exactly the same output, with the error at the end.

In my case, removing ALL of the previous run's output before running run_genespace() resulted in a successful run. I could even reuse the orthofinder results from the run with the error; its output was not affected.
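
One cautious way to apply this workaround is to move the old output aside rather than delete it. The helper below is a rough sketch: `archive_previous_run` is a hypothetical function (not part of GENESPACE), and the default folder names are assumptions about what a previous run leaves in the working directory, so check them against your own directory first. The `orthofinder` folder is deliberately left out of the defaults, since its results could be reused in the case described above.

```r
# Move selected output folders from a previous run into a timestamped backup
# so that the next run_genespace() call does not pick up stale results.
# NOTE: the default folder names are assumptions; adjust them to your setup.
archive_previous_run <- function(wd,
                                 outDirs = c("results", "syntenicHits",
                                             "dotplots", "riparian",
                                             "pangenes", "tmp")) {
  # Keep only the folders that actually exist in the working directory.
  present <- outDirs[dir.exists(file.path(wd, outDirs))]
  if (length(present) == 0) {
    return(character(0))
  }

  # Create a backup folder inside wd so the move stays on one filesystem.
  stamp <- format(Sys.time(), "%Y%m%d_%H%M%S")
  backup <- file.path(wd, paste0("previous_run_", stamp))
  dir.create(backup)

  # file.rename() returns one logical per folder; report those that moved.
  moved <- file.rename(file.path(wd, present), file.path(backup, present))
  present[moved]
}

# Example (commented out so nothing is moved by accident):
# archive_previous_run(getwd())
```

Once a clean run has succeeded, the backup folder can be deleted by hand.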