Error constructing pan-gene sets with sample data

jdobry-lab commented 1 year ago

Hi John,

I have everything working fine until the pan-gene sets. I got the following errors with the sample data:

Error in setnames(ogs, c("pgRepID", "ofID")) :
  Can't assign 2 names to a 0 column data.table
In addition: Warning message:
In system2(path2orthofinder, ofComm, stdout = TRUE, stderr = TRUE) :
  running command ''orthofinder' -f /test/tmp -t 4 -a 1 -X -o /test/orthofinder 2>&1' had status 1

jtlovell commented 1 year ago

What version of GENESPACE is this? It looks like OrthoFinder may not have run correctly (did you get a full run?).

jdobry-lab commented 1 year ago

Here is the output. The only thing I noticed was that, under "Reconciling gene trees and species tree", there is a ValueError: invalid mode: 'rU'

Checking dependencies ...
  Found valid path to OrthoFinder v2.54: orthofinder
  Found valid path to DIAMOND2 v2.16: diamond
  Found valid MCScanX_h executable: /Users/jasondobry/MCScanX-master/MCScanX_h

2023-04-03 14:10:12 : Starting OrthoFinder 2.5.4
4 thread(s) for highly parallel tasks (BLAST searches etc.)
1 thread(s) for OrthoFinder algorithm

Checking required programs are installed
----------------------------------------
Test can run "mcl -h" - ok
Test can run "fastme -i /test/orthofinder/Results_Apr03/WorkingDirectory/SimpleTest.phy -o /test/orthofinder/Results_Apr03/WorkingDirectory/SimpleTest.tre" - ok

Dividing up work for BLAST for parallel processing
--------------------------------------------------
2023-04-03 14:10:15 : Creating diamond database 1 of 2
2023-04-03 14:10:16 : Creating diamond database 2 of 2

Running diamond all-versus-all
------------------------------
Using 4 thread(s)
2023-04-03 14:10:16 : This may take some time....
2023-04-03 14:10:16 : Done 0 of 4
2023-04-03 14:15:12 : Done all-versus-all sequence search

Running OrthoFinder algorithm
-----------------------------
2023-04-03 14:15:12 : Initial processing of each species
2023-04-03 14:15:15 : Initial processing of species 0 complete
2023-04-03 14:15:18 : Initial processing of species 1 complete
2023-04-03 14:15:21 : Connected putative homologues
2023-04-03 14:15:22 : Written final scores for species 0 to graph file
2023-04-03 14:15:22 : Written final scores for species 1 to graph file
2023-04-03 14:15:27 : Ran MCL

Writing orthogroups to file
---------------------------
OrthoFinder assigned 35168 genes (90.8% of total) to 13626 orthogroups. Fifty percent of all genes were in orthogroups with 2 or more genes (G50 was 2) and were contained in the largest 5722 orthogroups (O50 was 5722). There were 13007 orthogroups with all species present and 10817 of these consisted entirely of single-copy genes.

2023-04-03 14:15:32 : Done orthogroups

Analysing Orthogroups
=====================

Calculating gene distances
--------------------------
2023-04-03 14:15:39 : Done
2023-04-03 14:15:40 : Done 0 of 1127
2023-04-03 14:15:41 : Done 100 of 1127
2023-04-03 14:15:41 : Done 200 of 1127
2023-04-03 14:15:42 : Done 300 of 1127
2023-04-03 14:15:42 : Done 400 of 1127
2023-04-03 14:15:43 : Done 500 of 1127
2023-04-03 14:15:44 : Done 600 of 1127
2023-04-03 14:15:44 : Done 700 of 1127
2023-04-03 14:15:45 : Done 800 of 1127
2023-04-03 14:15:46 : Done 900 of 1127
2023-04-03 14:15:46 : Done 1000 of 1127
2023-04-03 14:15:47 : Done 1100 of 1127

Inferring gene and species trees
--------------------------------

Reconciling gene trees and species tree
---------------------------------------
2023-04-03 14:15:48 : Starting Recon and orthologues
2023-04-03 14:15:48 : Starting OF Orthologues
Traceback (most recent call last):
  File "/anaconda3/envs/orthofinder/bin/orthofinder", line 7, in <module>
    main(args)
  File "/anaconda3/envs/orthofinder/bin/scripts_of/__main__.py", line 1778, in main
    GetOrthologues(speciesInfoObj, options, prog_caller)
  File "/anaconda3/envs/orthofinder/bin/scripts_of/__main__.py", line 1540, in GetOrthologues
    orthologues.OrthologuesWorkflow(speciesInfoObj.speciesToUse, 
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/anaconda3/envs/orthofinder/bin/scripts_of/orthologues.py", line 1090, in OrthologuesWorkflow
    ReconciliationAndOrthologues(recon_method, db.ogSet, nHighParallel, nLowParallel, i if qMultiple else None, stride_dups=stride_dups, q_split_para_clades=q_split_para_clades) 
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/anaconda3/envs/orthofinder/bin/scripts_of/orthologues.py", line 856, in ReconciliationAndOrthologues
    species_tree_rooted_labelled = tree.Tree(speciesTree_ids_fn)
                                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/anaconda3/envs/orthofinder/bin/scripts_of/tree.py", line 221, in __init__
    read_newick(newick, root_node = self, format=format)
  File "/anaconda3/envs/orthofinder/bin/scripts_of/newick.py", line 208, in read_newick
    nw = open(newick, 'rU').read()
         ^^^^^^^^^^^^^^^^^^
ValueError: invalid mode: 'rU'

############################

Combining and annotating bed files w/ OGs and tandem array info ...
  ##############
  Flagging chrs. w/ < 10 unique orthogroups
  ...chicken: 475 genes on 55 small chrs.
  ...human  : 7 genes on 5 small chrs.
  ##############
  Flagging over-dispered OGs
  ...chicken: 224 genes in 7 OGs hit > 8 unique places
  ...human  : 482 genes in 15 OGs hit > 8 unique places
  ##############
  Annotation summaries (after exclusions):
  ...chicken: 17433 genes in 14715 OGs || 2319 genes in 488 arrays
  ...human  : 20139 genes in 15352 OGs || 3577 genes in 894 arrays

############################

Combining and annotating the blast files with orthogroup info ...
Chunk 1 / 1 (14:15:51) ...

  ...human v. human: total hits = 220600, same og = 67513
  ...chicken v. chicken: total hits = 190677, same og = 59986
  ...human v. chicken: total hits = 224855, same og = 18797
  ##############
  Generating dotplots for all hits ... Done!

############################

Flagging synteny for each pair of genomes ...
Chunk 1 / 1 (14:16:06) ...

  ...human v. chicken: 14776 hits (11494 anchors) in 590 blocks (509 SVs, 353 regions)
  ...human v. human: 58372 hits (20571 anchors) in 30 blocks (0 SVs, 0 regions)
  ...chicken v. chicken: 56904 hits (18084 anchors) in 96 blocks (0 SVs, 0 regions)

############################

Building synteny-constrained orthogroups ... Done!

############################

Integrating syntenic positions across genomes ...
  ##############
  Generating syntenic dotplots ... Done!
  ##############
  Interpolating syntenic positions of genes ...
  chicken: (0 / 1 / 2 / >2 syntenic positions)
    chicken: 0 / 18089 / 0 / 0
    human  : 3169 / 14172 / 115 / 0
  human: (0 / 1 / 2 / >2 syntenic positions)
    chicken: 1853 / 13568 / 181 / 0
    human  : 0 / 20627 / 0 / 0
  Done!

############################

Final block coordinate calculation and riparian plotting ...
  ##############
  Calculating syntenic blocks by reference chromosomes ...
    n regions (aggregated by 25 gene radius): 782
    n blocks (collinear sets of > 5 genes): 1136
  ##############
  Building ref.-phased blks and riparian plots for haploid genomes:
    chicken: 843 phased blocks
    human  : 840 phased blocks
  Done!

############################

Constructing syntenic pan-gene sets ...
  chicken: Error in setnames(ogs, c("pgRepID", "ofID")) :
  Can't assign 2 names to a 0 column data.table
In addition: Warning message:
In system2(path2orthofinder, ofComm, stdout = TRUE, stderr = TRUE) :
  running command ''orthofinder' -f /test/tmp -t 4 -a 1 -X -o /test/orthofinder 2>&1' had status 1

jdobry-lab commented 1 year ago

Genespace version 1.1.4

jdobry-lab commented 1 year ago

OK, I got past that with version 1.1.8, but I am now getting the same error reported by another user:

Error in match_fasta2gff(path2fasta = fa, path2gff = gf, genespaceWd = genespaceWd, :
  some of the peptides have '.' or '-' in the sequence. Orthofinder can't handle this.

jtlovell commented 1 year ago

Yes, the v1.1.8 parse_annotations bug will be fixed in the next release, probably today. But you don't need to re-run parse_annotations: just use your existing working directory with init_genespace and it should run through with no problem.

The bug you reported for v1.1.4 is also known (#77) and caused by the orthologs step of orthofinder failing (which is why you get that traceback error in orthofinder). I have no idea why orthofinder fails there, but it sometimes does happen. GENESPACE v1.1.7+ can handle an incomplete orthofinder run.

I'll close this once v1.1.9 is posted with the parse_annotations bug fix.
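
A note on the traceback above, offered as a likely explanation rather than something confirmed in this thread: the `'U'` ("universal newlines") flag of Python's built-in `open()` was removed in Python 3.11, so `open(newick, 'rU')` raises `ValueError: invalid mode: 'rU'` on 3.11 and later. The `^^^^` markers in the traceback are also a Python 3.11+ feature, which is consistent with the orthofinder conda environment using a newer interpreter than OrthoFinder 2.5.4 expects. The Python version is not stated here, so treat this as an assumption. The sketch below uses hypothetical helper names (`read_newick_text`, `accepts_legacy_mode`) and is not OrthoFinder's code; it only shows that plain `'r'` gives the same newline handling in Python 3.

```python
import sys


def read_newick_text(path):
    """Read a Newick file as text (hypothetical helper, not OrthoFinder's API).

    In Python 3, text mode with the default newline setting already
    translates "\r\n" and "\r" to "\n", so "r" replaces the legacy "rU".
    """
    with open(path, "r") as handle:
        return handle.read()


def accepts_legacy_mode(path):
    """Return True if this interpreter still accepts open(path, "rU")."""
    try:
        # Works (with a DeprecationWarning) before Python 3.11,
        # raises ValueError from 3.11 onwards.
        open(path, "rU").close()
        return True
    except ValueError:
        return False


def describe_interpreter():
    """Summarise whether the running Python would hit the 'rU' error."""
    version = ".".join(str(part) for part in sys.version_info[:3])
    if sys.version_info >= (3, 11):
        return f"Python {version}: open(..., 'rU') is rejected"
    return f"Python {version}: open(..., 'rU') is still accepted"
```

Running `describe_interpreter()` inside the environment that provides the `orthofinder` executable would show whether that environment is affected. If it is, an environment with an older Python (an assumption, not a fix suggested in this thread) or an OrthoFinder build that no longer uses `'rU'` should avoid the traceback.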

jtlovell commented 1 year ago

The updated package is now at master. Let me know if it works for you. Update via:

detach("package:GENESPACE", unload = TRUE)
devtools::install_github("jtlovell/GENESPACE", upgrade = FALSE)
library(GENESPACE)

jtlovell commented 1 year ago

v1.1.10 is pushed to master and built as the latest release. I'm gonna close this issue, since the new release should address it. If this isn't the case, please re-open. Thanks!

jdobry-lab commented 1 year ago

Thank you. I haven't had a chance to test it yet. If I have any further issues, I will let you know. Cheers!

jtlovell / GENESPACE

Error constructing pan-gene sets with sample data #80
