jtlovell / GENESPACE

error in synteny flagging at v1.0.4 #55

Closed TkNiw closed 1 year ago

TkNiw commented 1 year ago

Hello, I am facing a problem with GENESPACE v1.0.4. When I ran the following with the test data:

gpar <- init_genespace(wd = wd, path2mcscanx = "../MCScanX/", nCores = 8)
gpar <- run_genespace(gsParam = gpar)

it returned:

############################
1. Running orthofinder (or parsing existing results)
        Checking for existing orthofinder results ...
        ... found existing run, not re-running orthofinder

############################
2. Annotated/concatenated bed file exists

############################
3. Annotated/blast files exists

############################
4. Flagging synteny for each pair of genomes ...
        # Chunk 1 / 1 (02:15:34) ...
Error in rbindlist(mclapply(1:nrow(chnk), mc.cores = nCores, function(i) { :
  Item 1 of input is not a data.frame, data.table or list
In addition: Warning message:
In mclapply(1:nrow(chnk), mc.cores = nCores, function(i) { :
  all scheduled cores encountered errors in user code

Then I changed nCores=8 to nCores=1, but at the same point it returned:

Error in if (median(hc$n) > 5) hc <- subset(hc, n > 1) :
  missing value where TRUE/FALSE needed

All dependencies (the R packages and OrthoFinder) were installed with conda, except MCScanX. Do you have any idea how to solve this? I really want to use this nice package!

jtlovell commented 1 year ago

Thanks for trying out v1!! This is a new error, almost certainly within GENESPACE and not something to do with your parameters. Would you mind sharing your run with me so I can troubleshoot? If so, send me an email: jlovell[at]hudsonalpha[dot]org.

jtlovell commented 1 year ago

OK - I think it should be resolved. This issue arose from having so few genes in the plot that the plotter couldn't round effectively. This has been fixed. The example you shared with me should run through now. Let me know if other issues arise.

The updates are now at v1.0.5. To get these, you need to detach GENESPACE and re-install:

detach("package:GENESPACE", unload = TRUE)
devtools::install_github("jtlovell/GENESPACE@dev", upgrade = F)
library(GENESPACE)

TkNiw commented 1 year ago

I appreciate your quick bug fix! Unfortunately, nothing changed between v1.0.4 and v1.0.5. The same message appeared at the same point. I tried both the test data and my own data (containing 4 genomes), and both returned the same error.

To be precise, I get the same error with nCores=8, but a different error with nCores=1:

############################
1. Running orthofinder (or parsing existing results)
        Checking for existing orthofinder results ...
        Copying files over to the temporary directory: .//tmp
        Running the following command in the shell: `orthofinder -f
                .//tmp -t 1 -a 1 -X -o .//orthofinder`.This can take a
                while. To check the progress, look in the
                `WorkingDirectory` in the output (-o) directory

        OrthoFinder version 2.5.4 Copyright (C) 2014 David Emms

        2022-12-14 15:02:10 : Starting OrthoFinder 2.5.4
        1 thread(s) for highly parallel tasks (BLAST searches etc.)
        1 thread(s) for OrthoFinder algorithm

        Checking required programs are installed
        ----------------------------------------
        Test can run "mcl -h" - ok
        Test can run "fastme -i .//orthofinder/Results_Dec14/WorkingDirectory/SimpleTest.phy -o .//orthofinder/Results_Dec14/WorkingDirectory/SimpleTest.tre" - ok

        Dividing up work for BLAST for parallel processing
        --------------------------------------------------
        2022-12-14 15:02:11 : Creating diamond database 1 of 3
        2022-12-14 15:02:11 : Creating diamond database 2 of 3
        2022-12-14 15:02:11 : Creating diamond database 3 of 3

        Running diamond all-versus-all
        ------------------------------
        Using 1 thread(s)
        2022-12-14 15:02:11 : This may take some time....
        2022-12-14 15:04:17 : Done all-versus-all sequence search

        Running OrthoFinder algorithm
        -----------------------------
        2022-12-14 15:04:17 : Initial processing of each species
        2022-12-14 15:04:17 : Initial processing of species 0 complete
        2022-12-14 15:04:17 : Initial processing of species 1 complete
        2022-12-14 15:04:17 : Initial processing of species 2 complete
        2022-12-14 15:04:19 : Connected putative homologues
        2022-12-14 15:04:20 : Written final scores for species 0 to graph file
        2022-12-14 15:04:20 : Written final scores for species 1 to graph file
        2022-12-14 15:04:20 : Written final scores for species 2 to graph file
        2022-12-14 15:04:20 : Ran MCL

        Writing orthogroups to file
        ---------------------------
        OrthoFinder assigned 5412 genes (96.3% of total) to 1747 orthogroups. Fifty percent of all genes were in orthogroups with 3 or more genes (G50 was 3) and were contained in the largest 856 orthogroups (O50 was 856). There were 1665 orthogroups with all species present and 1594 of these consisted entirely of single-copy genes.

        2022-12-14 15:04:20 : Done orthogroups

        Analysing Orthogroups
        =====================

        Calculating gene distances
        --------------------------
        2022-12-14 15:04:22 : Done
        2022-12-14 15:04:22 : Done 0 of 76
        2022-12-14 15:04:23 : Done 10 of 76
        2022-12-14 15:04:23 : Done 20 of 76
        2022-12-14 15:04:23 : Done 30 of 76
        2022-12-14 15:04:23 : Done 40 of 76
        2022-12-14 15:04:23 : Done 50 of 76
        2022-12-14 15:04:23 : Done 60 of 76
        2022-12-14 15:04:23 : Done 70 of 76

        Inferring gene and species trees
        --------------------------------
Best outgroup(s) for species tree
        ---------------------------------
        2022-12-14 15:04:24 : Starting STRIDE
        Traceback (most recent call last):
          File "/home/niwa/miniconda3/envs/genespace2/bin/scripts_of/stride.py", line 506, in GetRoot
            speciesTree = tree.Tree(speciesTreeFN, format=2)
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
          File "/home/niwa/miniconda3/envs/genespace2/bin/scripts_of/tree.py", line 221, in __init__
            read_newick(newick, root_node = self, format=format)
          File "/home/niwa/miniconda3/envs/genespace2/bin/scripts_of/newick.py", line 208, in read_newick
            nw = open(newick, 'rU').read()
                 ^^^^^^^^^^^^^^^^^^
        ValueError: invalid mode: 'rU'

        During handling of the above exception, another exception occurred:

        Traceback (most recent call last):
          File "/home/niwa/miniconda3/envs/genespace2/bin/orthofinder", line 7, in <module>
            main(args)
          File "/home/niwa/miniconda3/envs/genespace2/bin/scripts_of/__main__.py", line 1778, in main
            GetOrthologues(speciesInfoObj, options, prog_caller)
          File "/home/niwa/miniconda3/envs/genespace2/bin/scripts_of/__main__.py", line 1540, in GetOrthologues
            orthologues.OrthologuesWorkflow(speciesInfoObj.speciesToUse,
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
          File "/home/niwa/miniconda3/envs/genespace2/bin/scripts_of/orthologues.py", line 1039, in OrthologuesWorkflow
            roots, clusters_counter, rootedSpeciesTreeFN, nSupport, _, _, stride_dups = stride.GetRoot(spTreeFN_ids, files.FileHandler.GetOGsTreeDir(), stride.GeneToSpecies_dash, nHighParallel, qWriteRootedTree=True)
                                                                                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
          File "/home/niwa/miniconda3/envs/genespace2/bin/scripts_of/stride.py", line 509, in GetRoot
            speciesTree = tree.Tree(speciesTreeFN, format=1)
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
          File "/home/niwa/miniconda3/envs/genespace2/bin/scripts_of/tree.py", line 221, in __init__
            read_newick(newick, root_node = self, format=format)
          File "/home/niwa/miniconda3/envs/genespace2/bin/scripts_of/newick.py", line 208, in read_newick
            nw = open(newick, 'rU').read()
                 ^^^^^^^^^^^^^^^^^^
        ValueError: invalid mode: 'rU'
############################
2. Combining and annotating bed files w/ OGs and tandem array info ...
        ##############
        Flagging chrs. w/ < 10 unique orthogroups
        ...chimp : 0 genes on 0 small chrs.
        ...human : 0 genes on 0 small chrs.
        ...rhesus: 0 genes on 0 small chrs.
        ##############
        Flagging over-dispered OGs
        ...chimp : 0 genes in 0 OGs hit > 8 unique places
        ...human : 0 genes in 0 OGs hit > 8 unique places
        ...rhesus: 0 genes in 0 OGs hit > 8 unique places
        ##############
        Annotation summaries (after exclusions):
        ...chimp : 1927 genes in 1826 OGs || 109 genes in 39 arrays
        ...human : 1787 genes in 1709 OGs || 103 genes in 33 arrays
        ...rhesus: 1906 genes in 1819 OGs || 103 genes in 41 arrays

############################
3. Combining and annotating the blast files with orthogroup info ...
Error in fwrite(x, file = filepath, showProgress = F, quote = F, sep = "\t") :
  Compression in fwrite uses zlib library. Its header files were not found at the time data.table was compiled. To enable fwrite compression, please reinstall data.table and study the output for further guidance.
In addition: Warning message:
In system2(path2orthofinder, ofComm, stdout = TRUE, stderr = TRUE) :
  running command ''orthofinder' -f .//tmp -t 1 -a 1 -X -o .//orthofinder 2>&1' had status 1

I am also wondering about "ValueError: invalid mode: 'rU'" in the first step of orthofinder run.
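
On the 'rU' question: one likely explanation, offered as an assumption rather than a confirmed diagnosis, is the Python version inside the conda environment. The "U" flag for open() was removed in Python 3.11, and the ^^^^ markers under each traceback line are also a Python 3.11+ feature, so the open(newick, 'rU') call in newick.py would fail there in exactly this way. The sketch below is not OrthoFinder code; legacy_mode_supported and read_text_compat are hypothetical helper names used only to show the behavior and the usual replacement (plain "r", which already applies universal-newline handling on Python 3).

```python
import sys
import tempfile
from pathlib import Path


def legacy_mode_supported():
    """Return True if open(..., 'rU') still works on this interpreter."""
    with tempfile.TemporaryDirectory() as tmp:
        probe = Path(tmp) / "probe.nwk"
        probe.write_text("(a,b);\n")
        try:
            # Same call pattern as the failing line in the traceback.
            with open(probe, "rU"):
                return True
        except ValueError:
            # Python 3.11+ rejects the mode string: "invalid mode: 'rU'".
            return False


def read_text_compat(path):
    """Read a text file without the removed 'U' flag (hypothetical helper)."""
    # newline=None is the default; it translates \r\n and \r to \n on read.
    with open(path, "r", newline=None) as handle:
        return handle.read()


if __name__ == "__main__":
    print("Python", sys.version.split()[0])
    print("'rU' accepted:", legacy_mode_supported())
```

If this reports that 'rU' is not accepted when run with the interpreter that launches orthofinder, then trying an environment with an older Python (3.10 or earlier) would be a reasonable next step to test the assumption.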

jtlovell commented 1 year ago

hmm. I was able to replicate the previous error, but not this one. Can you try to run it again from a new directory with just the /bed and /peptide subdirectories? Let me know if that doesn't help. The zlib error is usually indicative of a problem with the install environment.

TkNiw commented 1 year ago

I tried again from scratch by downloading the example data for v1 (the human and chicken set), but the function still returned an error. In both cases below, "ValueError: invalid mode: 'rU'" was returned at step 1 (the orthofinder run).

With nCores=8, after returning "invalid mode...", it went through step 2 without error, then failed at step 3 with:

Error in rbindlist(mclapply(1:nrow(chnk), mc.cores = nCores, function(i) { :
  Item 1 of input is not a data.frame, data.table or list
In addition: Warning messages:
1: In system2(path2orthofinder, ofComm, stdout = TRUE, stderr = TRUE) :
  running command ''orthofinder' -f .//tmp -t 8 -a 1 -X -o .//orthofinder 2>&1' had status 1
2: In mclapply(1:nrow(chnk), mc.cores = nCores, function(i) { :
  all scheduled cores encountered errors in user code

With nCores=1, it failed just after returning "invalid mode...":

ValueError: invalid mode: 'rU'
Error in dir.exists(to) : invalid filename argument
In addition: Warning message:
In system2(path2orthofinder, ofComm, stdout = TRUE, stderr = TRUE) :
  running command ''orthofinder' -f .//tmp -t 1 -a 1 -X -o .//orthofinder 2>&1' had status 1

This time the run did not reach the point where the zlib error was returned. I have no idea why, sorry.

jtlovell commented 1 year ago

Can you confirm that a full orthofinder run was completed? You can run find_ofFiles(orthofinderDir = "path/to/orthofinderRun"). As long as there are no NAs, you have a complete orthofinder run. If that's the case, I'll have to revisit. But, looking at the source of the error, my guess is that your installation and/or conda environment is not set up correctly.

TkNiw commented 1 year ago

I actually managed to complete the run in a newly prepared Docker container. Before this, I had installed all the dependencies (including devtools, BiocManager, Biostrings, and rtracklayer) with conda, because these R packages could not be installed with install.packages() in R. This time, I noticed that the error messages from install.packages() mentioned some packages missing on Ubuntu. So I prepared a Docker container from a plain Ubuntu image, installed all the dependencies mentioned in the error messages with apt install, and then installed the four dependent R packages. That worked, and the run finally completed. Because installing the R packages with conda did not produce any errors, I assumed the installation was fine, but apparently it was not. Anyway, thank you for helping me!

I also ran find_ofFiles() on the previous run that had errors, and it looks like that orthofinder run did not complete:

> find_ofFiles(orthofinderDir = "./orthofinder/")
$SpeciesIDs
[1] "./orthofinder//Results_Dec15/WorkingDirectory/SpeciesIDs.txt"

$SequenceIDs
[1] "./orthofinder//Results_Dec15/WorkingDirectory/SequenceIDs.txt"

$ogs
[1] "./orthofinder//Results_Dec15/Orthogroups/Orthogroups.tsv"

$hogs
[1] "./orthofinder//Results_Dec15/Phylogenetic_Hierarchical_Orthogroups/N0.tsv"

$speciesTree
[1] NA

$blast
   genome1 genome2 genNum1 genNum2
1: chicken chicken       0       0
2: chicken   human       0       1
3:   human chicken       1       0
4:   human   human       1       1
                                                       blastFile orthologFile
1: ./orthofinder//Results_Dec15/WorkingDirectory/Blast0_0.txt.gz         <NA>
2: ./orthofinder//Results_Dec15/WorkingDirectory/Blast0_1.txt.gz         <NA>
3: ./orthofinder//Results_Dec15/WorkingDirectory/Blast1_0.txt.gz         <NA>
4: ./orthofinder//Results_Dec15/WorkingDirectory/Blast1_1.txt.gz         <NA>