jtlovell / GENESPACE

Query Regarding Rerun of OrthoFinder Despite Previous Result Directory Provided #135

Closed Caoyu819 closed 10 months ago

Caoyu819 commented 10 months ago

Dear GENESPACE Team,

I hope this message finds you well. I wanted to express my appreciation for developing such a valuable and powerful tool for comparative genomics analysis. I have recently been using GENESPACE to generate synteny plots for 35 plant genomes.

I have a question regarding the usage of GENESPACE. When calling init_genespace and run_genespace, I passed the path to my existing OrthoFinder results via the rawOrthofinderDir parameter, since I had already completed the OrthoFinder analysis. However, the program still restarted the entire OrthoFinder analysis from the diamond step. For 35 genomes this is time-consuming and requires significant computational resources.

To streamline my analysis, I have prepared the annotation and peptide files using customized scripts, and they appear to be in good shape for running the run_genespace program. However, I am unsure why the previous OrthoFinder results were not utilized in my run. Below is the command I used:

`library(GENESPACE)

# Set the paths for my run.
wd <- "/ngsprojects/chrevo_proj/results/1_syteny_analysis/geneSPACE/fagales_32genomes_3out"
path2mcscanx <- "/software/shared/apps/x86_64/MCScanX/20141028"

# Prepare the input: bed and pep.fa

# Conduct the GENESPACE run … depending on your machine, this can take a few minutes up to an hour.
gpar <- init_genespace(
  wd = wd,
  ploidy = 1,
  path2mcscanx = path2mcscanx,
  nCores = 8,
  blkSize = 5,
  nGaps = 25,
  rawOrthofinderDir = "/ngsprojects/chrevo_proj/results/0_genome_comparasion/2_orthofinder/orthofinder_32fagales_3out/OrthoFinder/Results_Jun30")

out <- run_genespace(gpar, overwrite = T)`

The structure of rawOrthofinderDir ("/ngsprojects/chrevo_proj/results/0_genome_comparasion/2_orthofinder/orthofinder_32fagales_3out/OrthoFinder/Results_Jun30") is:

Citation.txt
Comparative_Genomics_Statistics
Gene_Duplication_Events
Gene_Trees
Log.txt
Orthogroups
Orthogroup_Sequences
Orthologues
Phylogenetically_Misplaced_Genes
Phylogenetic_Hierarchical_Orthogroups
Putative_Xenologs
Resolved_Gene_Trees
Single_Copy_Orthologue_Sequences
Species_Tree
WorkingDirectory

And the log file I got currently is:

Checking Working Directory ...
  PASS: /ngsprojects/chrevo_proj/results/1_syteny_analysis/geneSPACE/fagales_32genomes_3out
Checking user-defined parameters ...
  Genome IDs & ploidy ... Arox: 1 Asin: 1 Bpen: 1 Bpla: 1 Came: 1 Ccat: 1 Ccun: 1 Cden: 1 Cequ: 1 Cgla: 1 Cill: 1 Cman: 1 Cmol: 1 Cpal: 1 Ctib: 1 Cvim: 1 Espi: 1 Jcin: 1 Jman: 1 Jreg: 1 Odav: 1 Oreh: 1 Plon: 1 Pmum: 1 Pste: 1 Pstr: 1 Qacu: 1 Qgil: 1 Qlob: 1 Qmon: 1 Qrub: 1 Qvar: 1 Rchi: 1 Tcac: 1 Vvin: 1
  Outgroup ... NONE
  n. parallel processes ... 8
  collinear block size ... 5
  collinear block search radius ... 25
  n gaps in collinear block ... 25
  synteny buffer size... 100
  only orthogroups hits as anchors ... TRUE
  n secondary hits ... 0
Checking annotation files (.bed and peptide .fa):
  Arox: 30054 / 30054 geneIDs exactly match (PASS)
  Asin: 35369 / 35369 geneIDs exactly match (PASS)
  Bpen: 22540 / 22540 geneIDs exactly match (PASS)
  Bpla: 32868 / 32868 geneIDs exactly match (PASS)
  Came: 22933 / 22933 geneIDs exactly match (PASS)
  Ccat: 36721 / 36721 geneIDs exactly match (PASS)
  Ccun: 23178 / 23178 geneIDs exactly match (PASS)
  Cden: 29878 / 29878 geneIDs exactly match (PASS)
  Cequ: 21811 / 21811 geneIDs exactly match (PASS)
  Cgla: 23277 / 23277 geneIDs exactly match (PASS)
  Cill: 30771 / 30771 geneIDs exactly match (PASS)
  Cman: 28139 / 28139 geneIDs exactly match (PASS)
  Cmol: 32012 / 32012 geneIDs exactly match (PASS)
  Cpal: 28222 / 28222 geneIDs exactly match (PASS)
  Ctib: 40652 / 40652 geneIDs exactly match (PASS)
  Cvim: 26070 / 26070 geneIDs exactly match (PASS)
  Espi: 29652 / 29652 geneIDs exactly match (PASS)
  Jcin: 29526 / 29526 geneIDs exactly match (PASS)
  Jman: 37976 / 37976 geneIDs exactly match (PASS)
  Jreg: 30695 / 30695 geneIDs exactly match (PASS)
  Odav: 24637 / 24637 geneIDs exactly match (PASS)
  Oreh: 25775 / 25775 geneIDs exactly match (PASS)
  Plon: 29138 / 29138 geneIDs exactly match (PASS)
  Pmum: 21344 / 21344 geneIDs exactly match (PASS)
  Pste: 32116 / 32116 geneIDs exactly match (PASS)
  Pstr: 28875 / 28875 geneIDs exactly match (PASS)
  Qacu: 30943 / 30943 geneIDs exactly match (PASS)
  Qgil: 29912 / 29912 geneIDs exactly match (PASS)
  Qlob: 39373 / 39373 geneIDs exactly match (PASS)
  Qmon: 36553 / 36553 geneIDs exactly match (PASS)
  Qrub: 32091 / 32091 geneIDs exactly match (PASS)
  Qvar: 29105 / 29105 geneIDs exactly match (PASS)
  Rchi: 32346 / 32346 geneIDs exactly match (PASS)
  Tcac: 26271 / 26271 geneIDs exactly match (PASS)
  Vvin: 25086 / 25086 geneIDs exactly match (PASS)
Checking dependencies ...
  Found valid path to OrthoFinder v2.54: orthofinder
  Found valid path to DIAMOND2 v2.05: diamond
  Found valid MCScanX_h executable: /software/shared/apps/x86_64/MCScanX/20141028/MCScanX_h

############################
1. Running orthofinder (or parsing existing results)
  Checking for existing orthofinder results ... [1] TRUE
  Copying files over to the temporary directory: /ngsprojects/chrevo_proj/results/1_syteny_analysis/geneSPACE/fagales_32genomes_3out/tmp
  Running the following command in the shell:
  orthofinder -f /ngsprojects/chrevo_proj/results/1_syteny_analysis/geneSPACE/fagales_32genomes_3out/tmp -t 8 -a 1 -X -o /ngsprojects/chrevo_proj/results/1_syteny_analysis/geneSPACE/fagales_32genomes_3out/orthofinder
  This can take a while. To check the progress, look in the WorkingDirectory in the output (-o) directory

Is there an issue with my parameter settings, or are there any important tips I may have overlooked before running the program? Your guidance and suggestions would be highly appreciated.

Thank you for your time and support.

Best regards, Yu

jtlovell commented 10 months ago

Sorry about the delay in responding. The issue here is that the genome IDs in this run do not exactly match the names of those genomes in the OrthoFinder run. I added that [1] TRUE printout to show this for troubleshooting.
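
For anyone hitting the same problem, here is a minimal sketch of how one might check for such a name mismatch before starting a run. It is not part of GENESPACE: `compare_genome_ids` is a hypothetical helper, and it assumes that the genome IDs are the peptide file names (without `.fa`) in the `peptide` subdirectory of the working directory, and that the earlier OrthoFinder run recorded its input file names in `WorkingDirectory/SpeciesIDs.txt` with lines of the form `0: Arox.fa`.

```r
# Hypothetical helper (not a GENESPACE function): compare genome IDs taken
# from peptide file names with the species names of a previous OrthoFinder run.
compare_genome_ids <- function(peptide_dir, species_ids_file) {
  # Assumption: genome IDs are the peptide file names minus the ".fa" suffix
  gs_ids <- sub("\\.fa$", "", list.files(peptide_dir, pattern = "\\.fa$"))

  # Assumption: SpeciesIDs.txt lines look like "0: Arox.fa";
  # lines starting with "#" are treated as excluded species and skipped
  lines <- readLines(species_ids_file)
  lines <- lines[nzchar(trimws(lines)) & !grepl("^#", lines)]
  of_files <- trimws(sub("^[0-9]+:\\s*", "", lines))
  of_ids <- sub("\\.[^.]+$", "", of_files)  # drop the file extension

  list(
    only_in_genespace   = setdiff(gs_ids, of_ids),
    only_in_orthofinder = setdiff(of_ids, gs_ids),
    all_match           = setequal(gs_ids, of_ids)
  )
}

# Example call, using the variable and path from the question
# (the "peptide" subdirectory is an assumption about the layout):
# res <- compare_genome_ids(
#   file.path(wd, "peptide"),
#   file.path("<rawOrthofinderDir>", "WorkingDirectory", "SpeciesIDs.txt"))
# str(res)
```

If `all_match` is `FALSE`, the two `only_in_*` vectors list the names that differ, which may help explain why the existing results are not reused.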