jtlovell / GENESPACE

Memory usage error v1.1.4 #71

Closed peterinnes closed 1 year ago

peterinnes commented 1 year ago

Hi John,

The v1.1.4 release looks really great!! The run_genespace() function is working for me up until the final step, "Constructing syntenic pan-gene sets ..." at which point memory usage starts to climb and climb until it completely maxes out our server with 189Gb (nothing else running) and ends with the error 'cannot allocate vector of size 13.2 Gb'. The genomes are fairly small, 400–600Mb. I've copied the output from the run below, let me know if there's anything else that would help.

Thanks so much, Peter

############################

  1. Running orthofinder (or parsing existing results)

    Checking for existing orthofinder results ...
    Copying files over to the temporary directory: analysis/GENESPACE_v1/tmp
    Running the following command in the shell: orthofinder -f analysis/GENESPACE_v1/tmp -t 12 -a 1 -X -o analysis/GENESPACE_v1/orthofinder.
    This can take a while. To check the progress, look in the WorkingDirectory in the output (-o) directory

    OrthoFinder version 2.5.4 Copyright (C) 2014 David Emms

    2023-03-06 15:03:47 : Starting OrthoFinder 2.5.4
    12 thread(s) for highly parallel tasks (BLAST searches etc.)
    1 thread(s) for OrthoFinder algorithm

    Checking required programs are installed

    Test can run "mcl -h" - ok
    Test can run "fastme -i analysis/GENESPACE_v1/orthofinder/Results_Mar06/WorkingDirectory/SimpleTest.phy -o analysis/GENESPACE_v1/orthofinder/Results_Mar06/WorkingDirectory/SimpleTest.tre" - ok

    Dividing up work for BLAST for parallel processing

    2023-03-06 15:03:48 : Creating diamond database 1 of 2
    2023-03-06 15:03:48 : Creating diamond database 2 of 2

    Running diamond all-versus-all

    Using 12 thread(s)
    2023-03-06 15:03:48 : This may take some time....
    2023-03-06 16:21:54 : Done all-versus-all sequence search

    Running OrthoFinder algorithm

    2023-03-06 16:21:55 : Initial processing of each species
    2023-03-06 16:22:21 : Initial processing of species 0 complete
    2023-03-06 16:22:46 : Initial processing of species 1 complete
    2023-03-06 16:22:52 : Connected putative homologues
    2023-03-06 16:22:57 : Written final scores for species 0 to graph file
    2023-03-06 16:23:01 : Written final scores for species 1 to graph file
    2023-03-06 16:23:26 : Ran MCL

    Writing orthogroups to file

    OrthoFinder assigned 72273 genes (90.5% of total) to 17585 orthogroups. Fifty percent of all genes were in orthogroups with 3 or more genes (G50 was 3) and were contained in the largest 4471 orthogroups (O50 was 4471). There were 13245 orthogroups with all species present and 4201 of these consisted entirely of single-copy genes.

    2023-03-06 16:23:30 : Done orthogroups

    Analysing Orthogroups

    Calculating gene distances

    2023-03-06 16:25:49 : Done
    2023-03-06 16:25:50 : Done 0 of 3843
    2023-03-06 16:26:06 : Done 1000 of 3843
    2023-03-06 16:26:08 : Done 2000 of 3843
    2023-03-06 16:26:11 : Done 3000 of 3843

    Inferring gene and species trees

    Reconciling gene trees and species tree

    2023-03-06 16:29:10 : Starting Recon and orthologues
    2023-03-06 16:29:10 : Starting OF Orthologues
    2023-03-06 16:29:11 : Done 0 of 3843
    2023-03-06 16:29:22 : Done 1000 of 3843
    2023-03-06 16:29:24 : Done 2000 of 3843
    2023-03-06 16:29:26 : Done 3000 of 3843
    2023-03-06 16:29:28 : Done OF Orthologues

    Writing results files

    2023-03-06 16:29:30 : Done orthologues

    Results: analysis/GENESPACE_v1/orthofinder/Results_Mar06/

    CITATION: When publishing work that uses OrthoFinder please cite: Emms D.M. & Kelly S. (2019), Genome Biology 20:238

    If you use the species tree in your work then please also cite:
    Emms D.M. & Kelly S. (2017), MBE 34(12): 3267-3278
    Emms D.M. & Kelly S. (2018), bioRxiv https://doi.org/10.1101/267914

############################

  2. Combining and annotating bed files w/ OGs and tandem array info ...

    ############## Flagging chrs. w/ < 10 unique orthogroups
    ...lewisii : 1274 genes on 590 small chrs.
    ...usitatissimum: 0 genes on 0 small chrs.
    ############## Flagging over-dispered OGs
    ...lewisii : 17725 genes in 181 OGs hit > 8 unique places
    ...usitatissimum: 1342 genes in 57 OGs hit > 8 unique places
    NOTE! Genomes flagged have > 5% of genes in over-dispersed orthogroups. These are likely not great annotations, or the synteny run contains un-specified WGDs. Regardless, these should be examined carefully
    ############## Annotation summaries (after exclusions):
    ...lewisii : 20539 genes in 16254 OGs || 2853 genes in 1058 arrays
    ...usitatissimum: 40948 genes in 24093 OGs || 4249 genes in 1919 arrays

############################

  1. Combining and annotating the blast files with orthogroup info ...

    Chunk 1 / 1 (04:29:43 PM) ...

    ...lewisii v. lewisii: total hits = 668171, same og = 418684
    ...usitatissimum v. usitatissimum: total hits = 581504, same og = 111351
    ...usitatissimum v. lewisii: total hits = 540855, same og = 30717
    ############## Generating dotplots for all hits ... Done!

############################

  1. Flagging synteny for each pair of genomes ...

    Chunk 1 / 1 (04:30:14 PM) ...

    ...usitatissimum v. lewisii: 27459 hits (11207 anchors) in 844 blocks (749 SVs, 377 regions)
    ...lewisii v. lewisii: 61883 hits (37598 anchors) in 584 blocks (0 SVs, 0 regions)
    ...usitatissimum v. usitatissimum: 61443 hits (42285 anchors) in 15 blocks (0 SVs, 0 regions)

############################

  1. Building synteny-constrained orthogroups ... Done!

############################

  1. Integrating syntenic positions across genomes ...

    ############## Generating syntenic dotplots ... Done!
    ############## Interpolating syntenic positions of genes ...
    lewisii: (0 / 1 / 2 / >2 syntenic positions)
    lewisii : 0 / 37599 / 0 / 0
    usitatissimum: 7729 / 30454 / 435 / 0
    usitatissimum: (0 / 1 / 2 / >2 syntenic positions)
    lewisii : 1614 / 3910 / 11921 / 172
    usitatissimum: 0 / 42290 / 0 / 0
    Done!

############################

  1. Final block coordinate calculation and riparian plotting ...

    ############## Calculating syntenic blocks by reference chromosomes ...
    n regions (aggregated by 25 gene radius): 809
    n blocks (collinear sets of > 5 genes): 1261
    ############## Building ref.-phased blks and riparian plots for haploid genomes:
    lewisii : 904 phased blocks
    usitatissimum: 904 phased blocks
    Done!

############################

  1. Constructing syntenic pan-gene sets ...

    lewisii : Error: cannot allocate vector of size 13.2 Gb

jtlovell commented 1 year ago

oh wow ... there is something very wrong. Just two small genomes should be on the order of 40Mb of memory. Are these public genomes? If so, can you point me to the URLs? If not, would you mind sharing the input /bed and /peptide directories? If the latter, shoot me an email or DM me on Twitter and I'll share a Google Drive transfer link.
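
For a sense of scale, the single failed allocation can be compared with the roughly 40 Mb expected for two small genomes. The sketch below assumes that R's "Gb" in this error message means 1024^3 bytes and, purely for illustration, that the vector holds 8-byte doubles; the actual type of the vector is not stated in this thread.

```shell
# Size of the failed allocation in bytes, reading "13.2 Gb" as 13.2 * 1024^3 bytes.
failed_bytes=$(awk 'BEGIN { printf "%.0f", 13.2 * 1024^3 }')

# Assumption: 8 bytes per element, as for an R numeric (double) vector.
n_doubles=$(awk -v b="$failed_bytes" 'BEGIN { printf "%.0f", b / 8 }')

# Ratio of the failed allocation (in Mb) to the ~40 Mb expected for the whole run.
ratio=$(awk 'BEGIN { printf "%.0f", (13.2 * 1024) / 40 }')

echo "failed allocation: ${failed_bytes} bytes"
echo "elements if doubles: ${n_doubles}"
echo "approx. multiple of the expected footprint: ${ratio}x"
```

Under these assumptions, one vector alone would need on the order of 1.8 billion elements, a few hundred times the expected footprint of the entire run, which fits the diagnosis that something is very wrong rather than that the genomes are simply too large.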

jtlovell commented 1 year ago

I pushed a patch to master that resolves this issue. Don't install from the release. I'll set up a new release tomorrow, but all the updated code is there now at master.
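
One way to follow this advice is to install directly from the master branch with devtools rather than from the release. This is a minimal sketch, assuming R and the devtools package are already installed; the `RUN` switch and the variable names are illustrative and not part of GENESPACE. By default it only prints the command it would run.

```shell
# Repository and branch named in this thread.
repo="jtlovell/GENESPACE"
ref="master"

# R expression that installs the package from the given branch via devtools.
install_cmd="devtools::install_github('${repo}', ref = '${ref}')"

# Hypothetical RUN switch: set RUN=1 to actually install; otherwise just show the command.
if [ "${RUN:-0}" = "1" ]; then
  Rscript -e "${install_cmd}"
else
  echo "Rscript -e \"${install_cmd}\""
fi
```

After installing, restarting the R session before re-running run_genespace() helps make sure the patched code is the version that gets loaded.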