jtlovell / GENESPACE

Disk Quota #84

Closed ericgonzalezs closed 1 year ago

ericgonzalezs commented 1 year ago

Hello,

I am running GENESPACE with the example in the tutorial (https://htmlpreview.github.io/?https://github.com/jtlovell/tutorials/blob/main/genespaceGuide.html), just to be sure everything is working. Running out <- run_genespace(gpar, overwrite = T) produces many errors, including files that cannot be found and a "Disk quota exceeded" error. I have 4 TB available, so I am not sure what is happening.

Please find below the errors:

GENESPACE v1.1.4: synteny and orthology constrained comparative genomics

############################

  1. Running orthofinder (or parsing existing results)

    Checking for existing orthofinder results ...
    Copying files over to the temporary directory: /home/egonza02/scratch/PANGENOME/GENESPACE/TEST/workingDirectory/tmp
    Running the following command in the shell: orthofinder -f /home/egonza02/scratch/PANGENOME/GENESPACE/TEST/workingDirectory/tmp -t 16 -a 1 -X -o /home/egonza02/scratch/PANGENOME/GENESPACE/TEST/workingDirectory/orthofinder. This can take a while. To check the progress, look in the WorkingDirectory in the output (-o) directory

    OrthoFinder version 2.5.4 Copyright (C) 2014 David Emms
    
    2023-04-10 03:57:24 : Starting OrthoFinder 2.5.4
    16 thread(s) for highly parallel tasks (BLAST searches etc.)
    1 thread(s) for OrthoFinder algorithm
    
    Checking required programs are installed
    ----------------------------------------
    Test can run "mcl -h" - ok
    Test can run "fastme -i /home/egonza02/scratch/PANGENOME/GENESPACE/TEST/workingDirectory/orthofinder/Results_Apr10/WorkingDirectory/SimpleTest.phy -o /home/egonza02/scratch/PANGENOME/GENESPACE/TEST/workingDirectory/orthofinder/Results_Apr10/WorkingDirectory/SimpleTest.tre" - ok

    Dividing up work for BLAST for parallel processing
    --------------------------------------------------
    2023-04-10 03:57:26 : Creating diamond database 1 of 5
    2023-04-10 03:57:26 : Creating diamond database 2 of 5
    2023-04-10 03:57:26 : Creating diamond database 3 of 5
    2023-04-10 03:57:26 : Creating diamond database 4 of 5
    2023-04-10 03:57:27 : Creating diamond database 5 of 5
    
    Running diamond all-versus-all
    ------------------------------
    Using 16 thread(s)
    2023-04-10 03:57:27 : This may take some time....
    2023-04-10 03:57:27 : Done 0 of 25
    2023-04-10 03:59:31 : Done 10 of 25
    2023-04-10 04:02:08 : Done all-versus-all sequence search
    
    Running OrthoFinder algorithm
    -----------------------------
    2023-04-10 04:02:10 : Initial processing of each species
    2023-04-10 04:02:25 : Initial processing of species 0 complete
    2023-04-10 04:02:43 : Initial processing of species 1 complete
    2023-04-10 04:03:04 : Initial processing of species 2 complete
    2023-04-10 04:03:20 : Initial processing of species 3 complete
    2023-04-10 04:03:39 : Initial processing of species 4 complete
    2023-04-10 04:03:50 : Connected putative homologues
    2023-04-10 04:03:53 : Written final scores for species 0 to graph file
    2023-04-10 04:03:56 : Written final scores for species 1 to graph file
    2023-04-10 04:04:00 : Written final scores for species 2 to graph file
    2023-04-10 04:04:03 : Written final scores for species 3 to graph file
    2023-04-10 04:04:07 : Written final scores for species 4 to graph file
    2023-04-10 04:04:18 : Ran MCL
    
    Writing orthogroups to file
    ---------------------------
    OrthoFinder assigned 96120 genes (95.9% of total) to 17888 orthogroups. Fifty percent of all genes were in orthogroups with 5 or more genes (G50 was 5) and were contained in the largest 7086 orthogroups (O50 was 7086). There were 12447 orthogroups with all species present and 10450 of these consisted entirely of single-copy genes.

    2023-04-10 04:07:51 : Done orthogroups
    
    Analysing Orthogroups
    =====================
    
    Calculating gene distances
    --------------------------
    2023-04-10 04:10:16 : Done
    2023-04-10 04:10:18 : Done 0 of 15586
    2023-04-10 04:10:19 : Done 1000 of 15586
    2023-04-10 04:10:21 : Done 2000 of 15586
    2023-04-10 04:10:23 : Done 3000 of 15586
    2023-04-10 04:10:24 : Done 4000 of 15586
    2023-04-10 04:10:26 : Done 5000 of 15586
    2023-04-10 04:10:28 : Done 6000 of 15586
    2023-04-10 04:10:30 : Done 7000 of 15586
    2023-04-10 04:10:32 : Done 8000 of 15586
    2023-04-10 04:10:33 : Done 9000 of 15586
    2023-04-10 04:10:36 : Done 10000 of 15586
    2023-04-10 04:10:37 : Done 11000 of 15586
    2023-04-10 04:10:39 : Done 12000 of 15586
    2023-04-10 04:10:41 : Done 13000 of 15586
    2023-04-10 04:10:43 : Done 14000 of 15586
    2023-04-10 04:10:44 : Done 15000 of 15586
    
    Inferring gene and species trees
    --------------------------------
    
    ERROR: external program called by OrthoFinder returned an error code: 1
    
    Command: fastme -i /home/egonza02/scratch/PANGENOME/GENESPACE/TEST/workingDirectory/orthofinder/Results_Apr10/WorkingDirectory/Distances_SpeciesTree/OG0008246_tree_id.txt.dist.phylip -o /home/egonza02/scratch/PANGENOME/GENESPACE/TEST/workingDirectory/orthofinder/Results_Apr10/WorkingDirectory/SpeciesTrees_ids/OG0008246_tree_id.txt.tre -w O -s -n

    stdout
    ------
    
    stderr
    ------
    
     . Error: Cannot open file 'OG0008246_tree_id.txt.dist.phylip_fastme_stat.txt'
    
    Traceback (most recent call last):
      File "orthofinder.py", line 7, in <module>
      File "scripts_of/__main__.py", line 1778, in main
      File "scripts_of/__main__.py", line 1558, in GetOrthologues
      File "scripts_of/orthologues.py", line 999, in OrthologuesWorkflow
      File "scripts_of/orthologues.py", line 481, in RunAnalysis
      File "scripts_of/stag.py", line 262, in Run_ForOrthoFinder
      File "scripts_of/stag.py", line 221, in ProcessTrees
      File "scripts_of/stag.py", line 73, in WritePhylipMatrix
    IOError: [Errno 122] Disk quota exceeded: '/home/egonza02/scratch/PANGENOME/GENESPACE/TEST/workingDirectory/orthofinder/Results_Apr10/WorkingDirectory/Distances_SpeciesTree/OG0006133_tree_id.txt.dist.phylip'
    [28624] Failed to execute script orthofinder

############################

  2. Combining and annotating bed files w/ OGs and tandem array info ...

    ##############
    Flagging chrs. w/ < 10 unique orthogroups
    ...chicken   : 475 genes on 55 small chrs.
    ...human     : 7 genes on 5 small chrs.
    ...mouse     : 1 genes on 1 small chrs.
    ...platypus  : 197 genes on 50 small chrs.
    ...sandLizard: 2 genes on 2 small chrs.
    ##############
    Flagging over-dispered OGs
    ...chicken   : 222 genes in 7 OGs hit > 8 unique places
    ...human     : 174 genes in 7 OGs hit > 8 unique places
    ...mouse     : 163 genes in 5 OGs hit > 8 unique places
    ...platypus  : 359 genes in 7 OGs hit > 8 unique places
    ...sandLizard: 628 genes in 11 OGs hit > 8 unique places
    ##############
    Annotation summaries (after exclusions):
    ...chicken   : 17435 genes in 15057 OGs || 2247 genes in 456 arrays
    ...human     : 20498 genes in 17175 OGs || 3219 genes in 885 arrays
    ...mouse     : 22688 genes in 17463 OGs || 5268 genes in 1013 arrays
    ...platypus  : 17801 genes in 15610 OGs || 1900 genes in 559 arrays
    ...sandLizard: 19734 genes in 16349 OGs || 2995 genes in 832 arrays

############################

  3. Combining and annotating the blast files with orthogroup info ...

    Chunk 1 / 1 (04:13:05 AM) ...

    ...mouse v. mouse:           total hits = 270258, same og = 84571
    ...sandLizard v. sandLizard: total hits = 227307, same og = 55643
    ...mouse v. sandLizard:      total hits = 290044, same og = 22061
    ...mouse v. human:           total hits = 288109, same og = 32002
    ...mouse v. platypus:        total hits = 274001, same og = 25149
    ...sandLizard v. human:      total hits = 261372, same og = 21484
    ...human v. human:           total hits = 219405, same og = 50257
    ...sandLizard v. platypus:   total hits = 249498, same og = 21485
    ...human v. platypus:        total hits = 238378, same og = 25383
    ...platypus v. platypus:     total hits = 187332, same og = 41557
    ...sandLizard v. chicken:    total hits = 231453, same og = 19209
    ...mouse v. chicken:         total hits = 247623, same og = 18255
    ...chicken v. chicken:       total hits = 189850, same og = 58687
    ...human v. chicken:         total hits = 223038, same og = 18605
    ...platypus v. chicken:      total hits = 205695, same og = 18051
    ##############
    Generating dotplots for all hits ... Done!

############################

  4. Flagging synteny for each pair of genomes ...

    Chunk 1 / 1 (04:13:39 AM) ...

    cat: /home/egonza02/scratch/PANGENOME/GENESPACE/TEST/workingDirectory/tmp/tmp_QIYPpzqCrgematJexwDp/mcs.collinearity: No such file or directory
    cat: /home/egonza02/scratch/PANGENOME/GENESPACE/TEST/workingDirectory/tmp/tmp_BsXKrkJrPUveZHAzNfYH/mcs.collinearity: No such file or directory
    cat: /home/egonza02/scratch/PANGENOME/GENESPACE/TEST/workingDirectory/tmp/tmp_voMenXfVkzeIMVtMZcLd/mcs.collinearity: No such file or directory
    Error in rbindlist(mclapply(1:nrow(chnk), mc.cores = nCores, function(i) { :
      Item 1 of input is not a data.frame, data.table or list
    Calls: run_genespace -> synteny -> rbindlist -> lapply -> FUN -> rbindlist
    In addition: Warning messages:
    1: In system2(path2orthofinder, ofComm, stdout = TRUE, stderr = TRUE) :
      running command ''orthofinder' -f /home/egonza02/scratch/PANGENOME/GENESPACE/TEST/workingDirectory/tmp -t 16 -a 1 -X -o /home/egonza02/scratch/PANGENOME/GENESPACE/TEST/workingDirectory/orthofinder 2>&1' had status 255
    2: In mclapply(1:nrow(chnk), mc.cores = nCores, function(i) { :
      scheduled cores 8, 10, 7, 9, 12, 5, 6, 2, 3, 1, 4, 13 encountered errors in user code, all values of the jobs will be affected
    Execution halted

I am wondering if you could please help me to solve these issues. Many thanks.

jtlovell commented 1 year ago

There is something wrong with either the input data, the system, or the orthofinder install. The orthofinder run failed. The quota issue is likely disk space usage. Also, I would recommend updating to the current version on GitHub.
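
To help rule out the install, it may be worth confirming that OrthoFinder and the external programs named in the log above (diamond, mcl, fastme) are all visible on the PATH. The following is a minimal Python sketch, not part of GENESPACE; the helper name `missing_programs` is hypothetical, and the program list is only what the log mentions, so a given setup may need others.

```python
import shutil

# Program names taken from the OrthoFinder log in this thread (an assumption:
# your OrthoFinder configuration may call additional tools).
REQUIRED = ["orthofinder", "diamond", "mcl", "fastme"]


def missing_programs(names):
    """Return the names that cannot be found on the current PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_programs(REQUIRED)
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All listed programs were found on PATH.")
```

Finding a program on the PATH does not prove that it runs correctly, so this check only catches the simplest install problems.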

ericgonzalezs commented 1 year ago

Many thanks. Do you know how much disc space we need to run the Genespace example?

jtlovell commented 1 year ago

A GB or so ... you can always just run two or three of the genomes.

ericgonzalezs commented 1 year ago

Many thanks,

The problem with the space was actually the number of files. I have a 1M file limit; I was already close to it, and the many files written by GENESPACE pushed me over. I also reinstalled GENESPACE. Because I can't use conda on the cluster where I am working, I added the OrthoFinder dir, the OrthoFinder/bin dir, and the OrthoFinder/tools dir to the PATH, and then installed GENESPACE following your instructions. Now everything is running well. Many thanks for the answers.
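
For anyone hitting the same error: "Disk quota exceeded" (Errno 122) can be raised when a file-count quota is exhausted, not only when the byte quota is, so free space alone does not rule it out. A rough way to see how close a directory tree is to a file limit is to count its entries. The sketch below is an illustration in Python; the function names are hypothetical, the 1,000,000 limit is the one mentioned in this thread, and the cluster's own quota tool remains the authoritative source.

```python
import os

# File limit mentioned in this thread; replace with your cluster's value.
FILE_LIMIT = 1_000_000


def count_entries(root):
    """Count files and directories below root (root itself is not counted)."""
    total = 0
    for _dirpath, dirnames, filenames in os.walk(root):
        total += len(dirnames) + len(filenames)
    return total


def remaining_quota(root, limit=FILE_LIMIT):
    """Return how many more entries fit under limit, assuming root holds everything counted."""
    return limit - count_entries(root)


if __name__ == "__main__":
    # "." is a placeholder; point this at the directory your quota applies to.
    used = count_entries(".")
    print(f"{used} entries used, {FILE_LIMIT - used} remaining of {FILE_LIMIT}")
```

Quotas are usually enforced per user or per filesystem rather than per directory, so this count is only an estimate when files are spread across several locations.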

jtlovell commented 1 year ago

Glad you figured it out!