jtlovell / GENESPACE

OrthoFinder taking a very long time #142

Closed by tallnuttrbgv 6 months ago

tallnuttrbgv commented 6 months ago

I have a triploid genome in which I am attempting to identify homoeologs. Each is 1.5 Gbp, with 325K 'genes' in its BED file. I am trying ploidy=1 first, then ploidy=3.

OrthoFinder has been running for over 24 hours on 48 CPUs (although it looks like it is only using 8).

Any idea why it is frozen?

The OrthoFinder log does not give a clue:

2024-02-20 11:57:56 : Started OrthoFinder version 2.5.4
Command Line: orthofinder -f /g/data/dy44/r12.8_dampiera/gs//tmp -t 48 -a 1 -X -o /g/data/dy44/r12.8_dampiera/gs//orthofinder

WorkingDirectory_Base: /g/data/dy44/r12.8_dampiera/gs//orthofinder/Results_Feb20/WorkingDirectory/

Species used: 
0: purged.fa
1: purged2.fa

My genespace script:

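#!/usr/bin/env Rscript

library(GENESPACE)

# Positional arguments: working directory, comma-separated genome IDs,
# output PDF, (4th argument not read here), number of threads
args <- commandArgs(trailingOnly = TRUE)

wd <- args[1]
path2mcscanx <- "/g/data/nm31/bin/MCScanX/"
genomes2run <- unlist(strsplit(args[2], ","))
outfile <- args[3]
threads <- as.integer(args[5])  # command-line arguments arrive as strings

gpar <- init_genespace(
  ploidy = 1,
  genomeIDs = genomes2run,
  wd = wd,
  nCores = threads,
  path2mcscanx = path2mcscanx)

out <- run_genespace(gpar, overwrite = TRUE)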

  genomeIDs = genomes2run,
  gsParam = gsParam,
  braidAlpha = .75,
  refGenome = refGenome,
  chrLabFontSize = 1,
  minChrLen2plot = 0,
  pdfFile = outfile,
  useRegions = FALSE)

tallnuttrbgv commented 6 months ago

I think this is because my BRAKER gene predictions were not filtered, so there were about 10x too many (~300k instead of ~30k). I'll leave the issue here in case it is helpful.
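
For anyone hitting the same symptom, a quick sanity check of gene counts before launching the run may save a day of compute. The sketch below is only illustrative: it assumes one gene per line in files named <genomeID>.bed inside a single directory (for example the bed/ folder under the GENESPACE working directory), and the helper names count_bed_genes and flag_large_genomes and the max_genes threshold are hypothetical, not part of GENESPACE.

```r
# Count gene models per genome from BED files.
# Assumption: one gene per line, files named <genomeID>.bed inside `bed_dir`.
count_bed_genes <- function(bed_dir) {
  bed_files <- list.files(bed_dir, pattern = "\\.bed$", full.names = TRUE)
  counts <- vapply(
    bed_files,
    function(f) {
      lines <- readLines(f, warn = FALSE)
      sum(nzchar(trimws(lines)))  # ignore blank lines
    },
    integer(1))
  names(counts) <- sub("\\.bed$", "", basename(bed_files))
  counts
}

# Return the genome IDs whose gene count exceeds a chosen threshold.
# The default threshold is arbitrary; pick one that suits the species.
flag_large_genomes <- function(counts, max_genes = 100000L) {
  names(counts)[counts > max_genes]
}

# Self-contained demo on two tiny made-up BED files.
demo_dir <- file.path(tempdir(), "bed_demo")
dir.create(demo_dir, showWarnings = FALSE)
make_bed <- function(n) {
  sprintf("chr1\t%d\t%d\tgene%d", seq_len(n) * 100L, seq_len(n) * 100L + 50L, seq_len(n))
}
writeLines(make_bed(5L), file.path(demo_dir, "small.bed"))
writeLines(make_bed(12L), file.path(demo_dir, "big.bed"))

demo_counts <- count_bed_genes(demo_dir)
demo_flagged <- flag_large_genomes(demo_counts, max_genes = 10L)
print(demo_counts)
print(demo_flagged)
```

In a real run, bed_dir would point at the directory holding the BED files, and a count far above what is expected for the species (here, ~300k versus ~30k) would be a hint to filter the annotation before running OrthoFinder.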