Error when running init_genespace() on phytozome genomes

samseaver commented 7 months ago

Hi, I have a set of Phytozome genomes on which I was able to run parse_annotations() cleanly, so I have the corresponding bed and peptide fasta files. But when I run init_genespace(), it recognizes all the genomes and then errors out. Here is the output, followed by the result of traceback(). Unfortunately, I'm not familiar with R or the code, so I'm not sure how to follow this:

> gpar <- init_genespace(wd="./trial_run_2",path2mcscanx="./MCScanX")
Checking Working Directory ... PASS: `./trial_run_2`
Checking user-defined parameters ...
        Genome IDs & ploidy ...
                Athaliana_Araport11           : 1
                Atrichopoda_v1.0              : 1
                Bdistachyon_v3.1              : 1
                Brapassp_trilocularisR500_v2.1: 1
                Creinhardtii_v5.6             : 1
                CreinhardtiiCC_v6.1           : 1
                Gmax_Wm82.a4                  : 1
                Mdomestica_v1.1               : 1
                Osativa_v7.0                  : 1
                Ptrichocarpa_v4.1             : 1
                Sbicolor_v3.1.1               : 1
                Sbicolor_v5.1                 : 1
                Slycopersicum_ITAG4.0         : 1
                Smoellendorffii_v1.0          : 1
                Spolyrhiza_v2                 : 1
                Tarvense_NCBI                 : 1
                Tarvense_v1.1                 : 1
                Zmays_B73_REFERENCE           : 1
                Zmays_RefGen_V4               : 1
        Outgroup ... NONE
        n. parallel processes ... 16
        collinear block size ... 5
        collinear block search radius ... 25
        n gaps in collinear block ... 5
        synteny buffer size... 100
        only orthogroups hits as anchors ... TRUE
        n secondary hits ... 0
Checking annotation files (.bed and peptide .fa):
Error: subscript contains out-of-bounds indices
> traceback()
21: stop(wmsg(...), call. = FALSE)
20: .subscript_error("subscript contains out-of-bounds indices")
19: NSBS(i, x, exact = exact, strict.upper.bound = !allow.append,
        allow.NAs = allow.NAs)
18: NSBS(i, x, exact = exact, strict.upper.bound = !allow.append,
        allow.NAs = allow.NAs)
17: normalizeSingleBracketSubscript(i, x, as.NSBS = TRUE)
16: .nextMethod(x = x, i = i)
15: callNextMethod()
14: .dropUnusedPoolElts(callNextMethod())
13: extractROWS(x, i)
12: extractROWS(x, i)
11: subset_along_ROWS(x, i, drop = drop)
10: fa[1:nfa]
9: fa[1:nfa]
8: paste(fa[1:nfa], collapse = "")
7: is.factor(x)
6: gsub("[^A-Za-z]", "", paste(fa[1:nfa], collapse = ""))
5: FUN(X[[i]], ...)
4: lapply(X = X, FUN = FUN, ...)
3: sapply(pepFiles, check_onlyDNA)
2: check_annotFiles(filepath = wd, genomeIDs = gids)
1: init_genespace(wd = "./trial_run_2", path2mcscanx = "./MCScanX")

I'm using version v1.3.1 as installed via devtools:


```
GENESPACE v1.3.1: synteny and orthology constrained comparative
        genomics
```

Fyasmin05 commented 5 months ago

Hi, I am having a similar issue. I was able to parse the files into bed and peptide files, but I am getting the following error. I would appreciate help with this. Thanks!

Checking Working Directory ... PASS: /Genespace/workingdir
Checking user-defined parameters ...
        Genome IDs & ploidy ...
                Sitalica_v2.2: 1
                Sviridis_v4.1: 1
                Taestivum    : 1
                Zmays        : 1
        Outgroup ... NONE
        n. parallel processes ... 6
        collinear block size ... 5
        collinear block search radius ... 25
        n gaps in collinear block ... 5
        synteny buffer size... 100
        only orthogroups hits as anchors ... TRUE
        n secondary hits ... 0
Checking annotation files (.bed and peptide .fa):
Error: subscript contains out-of-bounds indices

samseaver commented 5 months ago

Hello, I heard from Dr. Lovell independently of this issue. The problem was that the identifiers in the bed and fasta files didn't match. In my case, I had to make sure I used the "translated_cds" file from NCBI.
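
For anyone who wants to check for this kind of mismatch before running init_genespace(), here is a minimal sketch in base R. It makes some assumptions that are not confirmed in this thread: the bed file is tab-separated with no header and the gene ID in the 4th column, and the ID in each fasta header is the text after `>` up to the first whitespace. `compare_ids` is a hypothetical helper, not a GENESPACE function, and the demo files are made up.

```r
# Hypothetical helper (not part of GENESPACE): compare gene IDs between
# one bed file and one peptide fasta file.
compare_ids <- function(bed_file, pep_file) {
  # Assumption: tab-separated bed, no header, gene ID in column 4
  bed <- read.delim(bed_file, header = FALSE, quote = "",
                    stringsAsFactors = FALSE)
  bed_ids <- as.character(bed[[4]])

  # Assumption: the fasta ID is the header text up to the first whitespace
  fa_lines <- readLines(pep_file)
  headers <- fa_lines[startsWith(fa_lines, ">")]
  pep_ids <- sub("^>(\\S+).*$", "\\1", headers)

  list(
    n_bed       = length(bed_ids),
    n_pep       = length(pep_ids),
    n_shared    = length(intersect(bed_ids, pep_ids)),
    only_in_bed = head(setdiff(bed_ids, pep_ids)),  # first few unmatched IDs
    only_in_pep = head(setdiff(pep_ids, bed_ids))
  )
}

# Made-up demo files: geneB is missing from the fasta, geneC from the bed
bed_file <- tempfile(fileext = ".bed")
pep_file <- tempfile(fileext = ".fa")
writeLines(c("Chr1\t100\t900\tgeneA", "Chr1\t1500\t2400\tgeneB"), bed_file)
writeLines(c(">geneA", "MKT", ">geneC some description", "MLS"), pep_file)

res <- compare_ids(bed_file, pep_file)
str(res)
```

If `n_shared` is much smaller than `n_bed` or `n_pep`, the two files probably use different identifier schemes, which is the situation described above.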

LovellHAGSC commented 5 months ago

Yeah. This is what happens when you give GENESPACE empty input files. Check the output of parse_annotations ... I bet something went wrong.
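
A quick way to look for empty inputs, sketched in base R. The traceback above ends in `fa[1:nfa]`, which would be consistent with a peptide file that has no sequences in it. This sketch assumes the parsed files live in `bed/` and `peptide/` subdirectories of the working directory with `.bed` and `.fa` extensions; `find_empty_inputs` is a hypothetical helper, not a GENESPACE function, and the demo directory is made up.

```r
# Hypothetical helper (not part of GENESPACE): list zero-byte annotation
# files under a working directory.
find_empty_inputs <- function(wd) {
  # Assumption: parsed files are in <wd>/bed/*.bed and <wd>/peptide/*.fa
  files <- c(
    list.files(file.path(wd, "bed"), pattern = "\\.bed$", full.names = TRUE),
    list.files(file.path(wd, "peptide"), pattern = "\\.fa$", full.names = TRUE)
  )
  sizes <- file.size(files)
  # Report files that are empty or whose size could not be read
  files[is.na(sizes) | sizes == 0]
}

# Made-up demo directory: genome A is fine, genome B has an empty fasta
wd <- tempfile("genespace_demo_")
dir.create(file.path(wd, "bed"), recursive = TRUE)
dir.create(file.path(wd, "peptide"))
writeLines("Chr1\t100\t900\tgeneA", file.path(wd, "bed", "A.bed"))
writeLines(c(">geneA", "MKT"), file.path(wd, "peptide", "A.fa"))
writeLines("Chr1\t100\t900\tgeneB", file.path(wd, "bed", "B.bed"))
file.create(file.path(wd, "peptide", "B.fa"))  # zero-byte file

empty <- find_empty_inputs(wd)
print(basename(empty))
```

A file can also be non-empty but still contain no usable records, so this only catches the simplest failure of parse_annotations().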

Fyasmin05 commented 5 months ago

Hello again, thank you for your help with this. I used the gene.gff3 and protein.fa files. After parsing, the gene IDs in the bed and peptide files match. For example, I have attached a bed and peptide file to this comment. However, I am still getting the error shown below. Could you please let me know how to fix this?

![Screen Shot 2024-03-14 at 9 43 25 AM](https://github.com/jtlovell/GENESPACE/assets/59696208/d1f5485e-261a-4cb9-b427-f3c20ce7d2c8)
![Screen Shot 2024-03-14 at 9 43 46 AM](https://github.com/jtlovell/GENESPACE/assets/59696208/ca792d0f-54fa-4163-8661-c6d37dee6cbc)

Checking user-defined parameters ...
        Genome IDs & ploidy ...
                Sitalica : 1
                Sviridis : 1
                Taestivum: 1
                Zmays    : 1
        Outgroup ... NONE
        n. parallel processes ... 6
        collinear block size ... 5
        collinear block search radius ... 25
        n gaps in collinear block ... 5
        synteny buffer size... 100
        only orthogroups hits as anchors ... TRUE
        n secondary hits ... 0
Checking annotation files (.bed and peptide .fa):
        Sitalica : 34584 / 34584 geneIDs exactly match (PASS)
        Sviridis : 29807 / 29807 geneIDs exactly match (PASS)
        Taestivum: 99386 / 99386 geneIDs exactly match (PASS)
        Zmays    : 39756 / 39756 geneIDs exactly match (PASS)
Checking dependencies ...
        Found valid path to OrthoFinder v2.55: orthofinder
        Found valid path to DIAMOND2 v2.19: diamond
        Found valid MCScanX_h executable: /Users/Downloads/MCScanX-master/MCScanX_h

out <- run_genespace(gpar, overwrite = T)

############################

Running orthofinder (or parsing existing results)
Checking for existing orthofinder results ...
[1] FALSE
Error in run_genespace(gpar, overwrite = T) : genomes in the existing orthofinder run do not exactly match specified genomeIDs

jtlovell commented 5 months ago

Try removing the /orthofinder subdirectory (in your genespace working directory) and re-running.
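
From within R, that could look like the sketch below. `reset_orthofinder` is a hypothetical helper, not a GENESPACE function, and the demo uses a throwaway temporary directory in place of a real working directory. Note that `unlink(..., recursive = TRUE)` deletes the directory and everything in it without asking, so check the path first.

```r
# Hypothetical helper (not part of GENESPACE): delete the orthofinder
# subdirectory of a GENESPACE working directory so it can be regenerated.
reset_orthofinder <- function(wd) {
  of_dir <- file.path(wd, "orthofinder")
  if (!dir.exists(of_dir)) {
    return(invisible(FALSE))  # nothing to remove
  }
  status <- unlink(of_dir, recursive = TRUE)  # returns 0 on success
  invisible(status == 0)
}

# Demo on a throwaway directory standing in for the real working directory
wd <- tempfile("genespace_demo_")
dir.create(file.path(wd, "orthofinder"), recursive = TRUE)
file.create(file.path(wd, "orthofinder", "stale_result.txt"))

removed <- reset_orthofinder(wd)
print(dir.exists(file.path(wd, "orthofinder")))

# Then re-run as before, for example:
# out <- run_genespace(gpar, overwrite = TRUE)
```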

jtlovell / GENESPACE

Error when running init_genespace() on phytozome genomes #138