jtlovell / GENESPACE

OrthoFinder taking a very long time #142

Closed: tallnuttrbgv closed this 6 months ago

tallnuttrbgv commented 6 months ago

I have a triploid genome in which I am trying to identify homoeologs. Each of the two assemblies is 1.5 Gbp, with about 325K 'genes' in its bed file. I am running with ploidy = 1 first, then ploidy = 3.

OrthoFinder has been running for over 24 hours on 48 CPUs (although it looks like it is only using 8).

Any idea why it seems to be frozen?

Thanks.

The OrthoFinder log does not give any clue:

2024-02-20 11:57:56 : Started OrthoFinder version 2.5.4
Command Line: orthofinder -f /g/data/dy44/r12.8_dampiera/gs//tmp -t 48 -a 1 -X -o /g/data/dy44/r12.8_dampiera/gs//orthofinder

WorkingDirectory_Base: /g/data/dy44/r12.8_dampiera/gs//orthofinder/Results_Feb20/WorkingDirectory/

Species used: 
0: purged.fa
1: purged2.fa

My GENESPACE script:

#!/usr/bin/env Rscript

library(GENESPACE)

args <- commandArgs(trailingOnly = TRUE)

wd <- args[1]

path2mcscanx <- "/g/data/nm31/bin/MCScanX/"

genomes2run <- unlist(strsplit(args[2], ","))

print(genomes2run)

outfile <- args[3]

refGenome <- args[4]

# commandArgs() returns character strings; nCores must be numeric
threads <- as.integer(args[5])

gpar <- init_genespace(
  ploidy = 1,
  genomeIDs = genomes2run,
  wd = wd,
  nCores = threads,
  blkSize = 5,
  onlyOgAnchors = FALSE,
  nSecondHits = 10,
  path2mcscanx = path2mcscanx)

out <- run_genespace(gpar, overwrite = TRUE)

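# run_genespace() returns the updated parameter list; pass it as gsParam
# (the original script referenced an undefined object named gsParam)
plot_riparian(
  genomeIDs = genomes2run,
  gsParam = out,
  braidAlpha = .75,
  refGenome = refGenome,
  chrLabFontSize = 1,
  minChrLen2plot = 0,
  pdfFile = outfile,
  useRegions = FALSE)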

tallnuttrbgv commented 6 months ago

I think this was because my BRAKER genes were not filtered, so there were about 10x too many (~300k instead of ~30k). I'll leave the issue here in case it is helpful to others.
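A quick sanity check before launching OrthoFinder is to count the gene models in each bed file: a filtered annotation should be closer to ~30k than ~300k. The sketch below uses a hypothetical helper, `count_bed_genes()`, which is not part of GENESPACE. It assumes the GENESPACE v1 layout, where the parsed bed files sit in a `bed/` folder inside the working directory with one gene per line. The 60,000 warning threshold is an arbitrary assumption (roughly twice the expected ~30k).

```r
# Hypothetical helper (not part of GENESPACE): count gene models per genome
# in a GENESPACE-style bed/ directory, assuming one gene per non-empty line.
count_bed_genes <- function(wd, expected_max = 60000) {
  bed_files <- list.files(file.path(wd, "bed"), pattern = "\\.bed$",
                          full.names = TRUE)
  counts <- vapply(bed_files, function(f) {
    sum(nzchar(readLines(f)))  # count non-empty lines only
  }, integer(1))
  names(counts) <- sub("\\.bed$", "", basename(bed_files))
  # Warn about genomes whose gene count exceeds the (assumed) threshold
  too_many <- counts > expected_max
  if (any(too_many)) {
    warning("Unusually many gene models in: ",
            paste(names(counts)[too_many], collapse = ", "))
  }
  counts
}
```

For example, `count_bed_genes("/g/data/dy44/r12.8_dampiera/gs")` would warn for each genome that has ~325K lines, before any time is spent on OrthoFinder.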