jtlovell / GENESPACE


Query Regarding Rerun of OrthoFinder Despite Previous Result Directory Provided #135

Caoyu819 closed this issue 7 months ago

Caoyu819 commented 8 months ago

Dear GENESPACE Team,

I hope this message finds you well. I wanted to express my appreciation for developing such a valuable and powerful tool for comparative genomics analysis. I have recently been using GENESPACE to generate synteny plots for 35 plant genomes.

I have a question about using GENESPACE. When running init_genespace and run_genespace, I pointed to my existing OrthoFinder results with the rawOrthofinderDir parameter, since I had already completed the OrthoFinder analysis. However, the program still reran the entire OrthoFinder analysis, starting from the DIAMOND step. For 35 genomes this is time-consuming and requires significant computational resources.

To streamline my analysis, I prepared the annotation and peptide files with custom scripts, and they appear to be in good shape for run_genespace. However, I don't understand why the previous OrthoFinder results were not used in my run. Below is the code I used:

```r
library(GENESPACE)

# Set the paths for my run.
wd <- "/ngsprojects/chrevo_proj/results/1_syteny_analysis/geneSPACE/fagales_32genomes_3out"
path2mcscanx <- "/software/shared/apps/x86_64/MCScanX/20141028"

# Prepare the input: bed and pep.fa

# Conduct the GENESPACE run … depending on your machine, this can take
# a few minutes up to an hour.
gpar <- init_genespace(
  wd = wd,
  ploidy = 1,
  path2mcscanx = path2mcscanx,
  nCores = 8,
  blkSize = 5,
  nGaps = 25,
  rawOrthofinderDir = "/ngsprojects/chrevo_proj/results/0_genome_comparasion/2_orthofinder/orthofinder_32fagales_3out/OrthoFinder/Results_Jun30")

out <- run_genespace(gpar, overwrite = TRUE)
```

The rawOrthofinderDir (`/ngsprojects/chrevo_proj/results/0_genome_comparasion/2_orthofinder/orthofinder_32fagales_3out/OrthoFinder/Results_Jun30`) contains:

- Citation.txt
- Comparative_Genomics_Statistics
- Gene_Duplication_Events
- Gene_Trees
- Log.txt
- Orthogroups
- Orthogroup_Sequences
- Orthologues
- Phylogenetically_Misplaced_Genes
- Phylogenetic_Hierarchical_Orthogroups
- Putative_Xenologs
- Resolved_Gene_Trees
- Single_Copy_Orthologue_Sequences
- Species_Tree
- WorkingDirectory

And the log I currently get is:

```
Checking Working Directory ... PASS: /ngsprojects/chrevo_proj/results/1_syteny_analysis/geneSPACE/fagales_32genomes_3out
Checking user-defined parameters ...
    Genome IDs & ploidy ... Arox: 1 Asin: 1 Bpen: 1 Bpla: 1 Came: 1 Ccat: 1 Ccun: 1 Cden: 1 Cequ: 1 Cgla: 1 Cill: 1 Cman: 1 Cmol: 1 Cpal: 1 Ctib: 1 Cvim: 1 Espi: 1 Jcin: 1 Jman: 1 Jreg: 1 Odav: 1 Oreh: 1 Plon: 1 Pmum: 1 Pste: 1 Pstr: 1 Qacu: 1 Qgil: 1 Qlob: 1 Qmon: 1 Qrub: 1 Qvar: 1 Rchi: 1 Tcac: 1 Vvin: 1
    Outgroup ... NONE
    n. parallel processes ... 8
    collinear block size ... 5
    collinear block search radius ... 25
    n gaps in collinear block ... 25
    synteny buffer size... 100
    only orthogroups hits as anchors ... TRUE
    n secondary hits ... 0
Checking annotation files (.bed and peptide .fa):
    Arox: 30054 / 30054 geneIDs exactly match (PASS)
    Asin: 35369 / 35369 geneIDs exactly match (PASS)
    Bpen: 22540 / 22540 geneIDs exactly match (PASS)
    Bpla: 32868 / 32868 geneIDs exactly match (PASS)
    Came: 22933 / 22933 geneIDs exactly match (PASS)
    Ccat: 36721 / 36721 geneIDs exactly match (PASS)
    Ccun: 23178 / 23178 geneIDs exactly match (PASS)
    Cden: 29878 / 29878 geneIDs exactly match (PASS)
    Cequ: 21811 / 21811 geneIDs exactly match (PASS)
    Cgla: 23277 / 23277 geneIDs exactly match (PASS)
    Cill: 30771 / 30771 geneIDs exactly match (PASS)
    Cman: 28139 / 28139 geneIDs exactly match (PASS)
    Cmol: 32012 / 32012 geneIDs exactly match (PASS)
    Cpal: 28222 / 28222 geneIDs exactly match (PASS)
    Ctib: 40652 / 40652 geneIDs exactly match (PASS)
    Cvim: 26070 / 26070 geneIDs exactly match (PASS)
    Espi: 29652 / 29652 geneIDs exactly match (PASS)
    Jcin: 29526 / 29526 geneIDs exactly match (PASS)
    Jman: 37976 / 37976 geneIDs exactly match (PASS)
    Jreg: 30695 / 30695 geneIDs exactly match (PASS)
    Odav: 24637 / 24637 geneIDs exactly match (PASS)
    Oreh: 25775 / 25775 geneIDs exactly match (PASS)
    Plon: 29138 / 29138 geneIDs exactly match (PASS)
    Pmum: 21344 / 21344 geneIDs exactly match (PASS)
    Pste: 32116 / 32116 geneIDs exactly match (PASS)
    Pstr: 28875 / 28875 geneIDs exactly match (PASS)
    Qacu: 30943 / 30943 geneIDs exactly match (PASS)
    Qgil: 29912 / 29912 geneIDs exactly match (PASS)
    Qlob: 39373 / 39373 geneIDs exactly match (PASS)
    Qmon: 36553 / 36553 geneIDs exactly match (PASS)
    Qrub: 32091 / 32091 geneIDs exactly match (PASS)
    Qvar: 29105 / 29105 geneIDs exactly match (PASS)
    Rchi: 32346 / 32346 geneIDs exactly match (PASS)
    Tcac: 26271 / 26271 geneIDs exactly match (PASS)
    Vvin: 25086 / 25086 geneIDs exactly match (PASS)
Checking dependencies ...
    Found valid path to OrthoFinder v2.54:orthofinder
    Found valid path to DIAMOND2 v2.05:diamond
    Found valid MCScanX_h executable:/software/shared/apps/x86_64/MCScanX/20141028/MCScanX_h

############################

1. Running orthofinder (or parsing existing results)
    Checking for existing orthofinder results ...
[1] TRUE
    Copying files over to the temporary directory: /ngsprojects/chrevo_proj/results/1_syteny_analysis/geneSPACE/fagales_32genomes_3out/tmp
    Running the following command in the shell: orthofinder -f /ngsprojects/chrevo_proj/results/1_syteny_analysis/geneSPACE/fagales_32genomes_3out/tmp -t 8 -a 1 -X -o /ngsprojects/chrevo_proj/results/1_syteny_analysis/geneSPACE/fagales_32genomes_3out/orthofinder
    This can take a while. To check the progress, look in the WorkingDirectory in the output (-o) directory
```

Is there an issue with my parameter settings, or are there any important tips I may have overlooked before running the program? Your guidance and suggestions would be highly appreciated.

Thank you for your time and support.

Best regards, Yu

jtlovell commented 7 months ago

Sorry about the delay in responding. The issue here is that the genome names do not exactly match the names of those genomes in the OrthoFinder run. I added that `[1] TRUE` printout to help with troubleshooting this.
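One way to spot such a name mismatch before launching another run is to compare the genome IDs with the species names recorded by the earlier OrthoFinder run. The sketch below is hypothetical and not part of GENESPACE. It assumes that OrthoFinder's `WorkingDirectory/SpeciesIDs.txt` lists each input file as `0: Arox.fa` (with `#`-prefixed lines for excluded species), and that the prepared peptide files sit in `wd/peptide` as `<genomeID>.fa`. The helper names `parse_species_ids`, `compare_genome_ids` and `check_orthofinder_names` are invented for illustration.

```r
# Hypothetical helpers (not part of GENESPACE) for checking whether genome IDs
# match the species names used in an earlier OrthoFinder run.

# Parse lines of OrthoFinder's SpeciesIDs.txt (assumed format "0: Arox.fa")
# into bare species names, e.g. "Arox".
parse_species_ids <- function(lines) {
  lines <- trimws(lines)
  # Drop blank lines and commented-out (excluded) species
  lines <- lines[nzchar(lines) & !startsWith(lines, "#")]
  # Strip the leading numeric index, e.g. "0: "
  fileNames <- sub("^[0-9]+:[[:space:]]*", "", lines)
  # Strip the final file extension (".fa", ".fasta", ...)
  tools::file_path_sans_ext(fileNames)
}

# Report IDs present on one side but not the other.
compare_genome_ids <- function(genomeIDs, orthofinderIDs) {
  list(
    missingFromOrthofinder = setdiff(genomeIDs, orthofinderIDs),
    extraInOrthofinder     = setdiff(orthofinderIDs, genomeIDs),
    exactMatch             = setequal(genomeIDs, orthofinderIDs)
  )
}

# Hypothetical wrapper; assumes wd/peptide/<genomeID>.fa files and
# <rawOrthofinderDir>/WorkingDirectory/SpeciesIDs.txt both exist.
check_orthofinder_names <- function(wd, rawOrthofinderDir) {
  pepFiles  <- list.files(file.path(wd, "peptide"), pattern = "\\.fa$")
  genomeIDs <- sub("\\.fa$", "", pepFiles)
  spFile    <- file.path(rawOrthofinderDir, "WorkingDirectory", "SpeciesIDs.txt")
  compare_genome_ids(genomeIDs, parse_species_ids(readLines(spFile)))
}
```

With the paths from the original post, `check_orthofinder_names(wd, rawOrthofinderDir)` would list any genome that appears under a different name on either side. An OrthoFinder input such as `Arox.pep.fa` would show up as `Arox.pep`, which is exactly the kind of inexact match described in the reply above.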