Error when using outgroup during pangenome step

niederhuth commented 2 years ago

Hi John,

Hope all is well.

I'm trying to run GENESPACE on two Mimulus genomes, using tomato as an outgroup. However, during the pangenome step I get this error as it tries to pull the non-syntenic orthologs:

GENESPACE run initialized:
Initial orthofinder database generation method: default inside R
Orthology graph method: inBlock
Parsing annotation files ...
L1 ... parsed annotations for L1 exist and !overwrite, skipping
S1 ... parsed annotations for S1 exist and !overwrite, skipping
Slycopersicum ... parsed annotations for Slycopersicum exist and !overwrite, skipping
Synteny Parameters have not been set! Setting to defaults
Warning message:
In run_orthofinder(gsParam = gpar) :
orthofinder run exists & !overwrite, so not running
Building reference-anchored scaffold against L1
n. ref positions = 22744
Reading in hits against L1 ... found 39720
Interpolating positions ... n. genes mapped: 1x = 21293, 2+x = 238, 0x = 23078
Forming ref.-anchored db ... found 38506 genes for 22707 placements
Completing the pan-genome annotation ...
Adding non-anchor entries ... found 1163 genes and 1170 placements
Checking missing direct ref. syn. OGs ... found 1 genes and 1 placements
Adding indirect syn. OGs ... found 0 genes and 0 placements
Adding syn. OGs without ref. anchor ... found 4560 genes and 4559 placements
Adding missing genes by synOG identity ... found 379 genes and 1 placements
Annotating and formatting pan-genome
Adding non-anchor entries ... found 4817 genes and 2223 placements
Adding non-syn. orthologs ...
Error in [.data.table(x, , :=(n, 1:.N), by = c("ofID1", "genome")) :
Supplied 2 items to be assigned to group 1 of size 0 in column 'n'. The RHS length must either be 1 (single values are ok) or match the LHS length exactly. If you wish to 'recycle' the RHS please use rep() explicitly to make this intent clear to readers of your code.
Calls: pangenome ... pull_nonSynOrthologs -> rbindlist -> lapply -> FUN -> [ -> [.data.table

I do not get this error if I set outgroup = NULL.
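
For readers decoding the error message: "Supplied 2 items to be assigned to group 1 of size 0" is what one would expect when `1:.N` is evaluated for a group with zero rows, because `1:0` in R counts down to `c(1, 0)` rather than giving an empty vector. The sketch below only illustrates that mechanism. The empty table `x` is a hypothetical stand-in, not the actual object inside `pull_nonSynOrthologs`, and whether the data.table call errors may depend on the installed data.table version, so it is wrapped in `tryCatch`.

```r
library(data.table)

# Base R behaviour behind the message: 1:0 has two elements, seq_len(0) has none.
bad  <- 1:0          # c(1L, 0L)
good <- seq_len(0)   # integer(0)
print(length(bad))
print(length(good))

# Hypothetical stand-in: an empty table with the two grouping columns
# named in the error message (ofID1 and genome).
x <- data.table(ofID1 = character(), genome = character())

# Same shape of call as in the traceback. With zero rows, 1:.N offers
# two values to a group that has no rows to receive them.
msg <- tryCatch(
  {
    x[, n := 1:.N, by = c("ofID1", "genome")]
    "no error on this data.table version"
  },
  error = function(e) conditionMessage(e)
)
print(msg)
```

Using `seq_len(.N)` in place of `1:.N` is the usual defensive idiom for this pattern, although the thread does not say how the fix was actually implemented in GENESPACE.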

jtlovell commented 2 years ago

👋 Chad! Did you run synteny() with tomato as an outgroup, then re-run init_genespace so that your gpar object doesn't have an outgroup, then run pangenome()?

niederhuth commented 2 years ago

I reran the entire pipeline (except the initial orthofinder step) both with and without tomato as the outgroup.

jtlovell commented 2 years ago

My guess is you didn't set synteny(..., overwrite = TRUE), so the second run in the same directory saw that a run was there and didn't go forward (so the outgroup was ignored). In the next release, GENESPACE will check to make sure that all the genomes that you want to analyze have synteny results before not overwriting.

niederhuth commented 2 years ago

I deleted the results directory between runs. Also I ran it with the outgroup first.

jtlovell commented 2 years ago

OK, this looks like a real bug. I'll address it and get back to you.

jtlovell commented 2 years ago

Yup, it's a bug. v0.9.3 does not respect an outgroup. If you are using v0.9.3 with an outgroup, you'll also need to manually specify the genomes to use, for example pg <- pangenome(gpar, genomeIDs = c("L1","S1")). This should be fixed in v0.9.4 and later, and I will take care to ensure no issues persist in future releases. Thanks for your work finding this, Chad.
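
A minimal sketch of that workaround, for anyone still on v0.9.3. The genome IDs come from the log earlier in this thread; `gpar` is assumed to be the parameter object from an earlier `init_genespace()` run, and the version guard and `exists()` check are illustrative additions, not part of GENESPACE.

```r
# Genome IDs as they appear in the log; tomato is the outgroup.
genomeIDs <- c("L1", "S1", "Slycopersicum")
outgroup  <- "Slycopersicum"

# Drop the outgroup so only the ingroup genomes are passed to pangenome().
ingroup <- setdiff(genomeIDs, outgroup)
print(ingroup)

# Assumption: gpar already exists in the session. The call is skipped
# if GENESPACE is not installed or gpar has not been created.
if (requireNamespace("GENESPACE", quietly = TRUE) && exists("gpar")) {
  if (utils::packageVersion("GENESPACE") < "0.9.4") {
    # Workaround from the comment above: name the ingroup genomes explicitly.
    pg <- GENESPACE::pangenome(gpar, genomeIDs = ingroup)
  }
}
```

Deriving the ingroup with `setdiff()` avoids retyping the IDs by hand, which matters more as the number of genomes grows.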

jtlovell / GENESPACE, issue #27