jtlovell / GENESPACE

Error in step 3. Combining and annotating the blast files with orthogroup info ... #148

Open htorrado opened 3 months ago

htorrado commented 3 months ago

Hello jtlovell,

Thanks for all your work on this package!

I'm using GENESPACE v1.3.1 in a conda environment with R 4.1.2 and have run it successfully on your test dataset, so it should all be working as intended. With my own dataset, everything starts off well (e.g. all gene IDs are recognized and match exactly), but in step 3 I receive the error message below. I was hoping you might have ideas or suggestions for how I could resolve this and proceed.

Combining and annotating the blast files with orthogroup info ...
        # Chunk 1 / 1 (10:47:03) ...
Error in rbindlist(mclapply(1:nrow(chnk), mc.cores = nCores, function(i) { :
  Item 1 of input is not a data.frame, data.table or list
In addition: Warning message:
In mclapply(1:nrow(chnk), mc.cores = nCores, function(i) { :
  all scheduled cores encountered errors in user code

When I try to run it with just one core the error becomes:

Combining and annotating the blast files with orthogroup info ...
Error in vecseq(f__, len__, if (allow.cartesian || notjoin || !anyDuplicated(f__,  :
  Join results in 4687192 rows; more than 907698 = nrow(x)+nrow(i). Check for duplicate key values in i each of which join to the same group in x over and over again. If that's ok, try by=.EACHI to run j for each group to avoid the large allocation. If you are sure you wish to proceed, rerun with allow.cartesian=TRUE. Otherwise, please search for this error message in the FAQ, Wiki, Stack Overflow and data.table issue tracker for advice.

The second error message seems related to the merge in step 2.3 of the annotate_blast function (the merge with the bed information). I tried adding "allow.cartesian" to that merge call, but with just 1 core the whole R session gets killed, and when parallelized (using 10 cores) I receive the following warning:

Warning message:
In mclapply(1:nrow(chnk), mc.cores = nCores, function(i) { :
  scheduled cores 1, 2 did not deliver results, all values of the jobs will be affected

Might this lead to a merged file that is not compatible with some of your code further down?

Thank you very much in advance!! Best, Héctor
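
For readers unfamiliar with the second error, here is a minimal sketch of how duplicated join keys make a data.table merge exceed nrow(x)+nrow(i). The `blast` and `bed` tables are toy stand-ins invented for illustration; they are not GENESPACE's internal objects, and this is not the package's actual merge code.

```r
library(data.table)

# Toy tables (hypothetical): gene "g1" has 4 blast hits and appears
# 3 times in the bed table, mimicking a duplicated gene ID.
blast <- data.table(id = rep(c("g1", "g2"), times = c(4, 1)), score = 1:5)
bed   <- data.table(id  = c("g1", "g1", "g1", "g2"),
                    chr = c("c1", "c1", "s9", "c2"))

# Default merge: 4 * 3 + 1 = 13 result rows, more than 5 + 4 = 9,
# so data.table refuses and raises the cartesian-join error.
err <- tryCatch(merge(blast, bed, by = "id"), error = function(e) e)
print(inherits(err, "error"))

# allow.cartesian = TRUE permits the blow-up; on real data this can
# multiply rows enough to exhaust memory.
ok <- merge(blast, bed, by = "id", allow.cartesian = TRUE)
print(nrow(ok))

# The keys responsible for the blow-up:
print(bed[, .N, by = id][N > 1])
```

Setting `allow.cartesian = TRUE` only silences the check; the duplicated keys still multiply the rows, which may be why the single-core session was killed.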

jkfo002 commented 3 months ago

I've run into the same problem as @htorrado. Do you have any suggestions?

jtlovell commented 3 months ago

What version are y'all using? This is an error that popped up every now and then with duplicated gene IDs in <v1.1, but I had hoped I'd resolved it. The difference in errors between 1 and >1 cores is a function of how R handles parallelization. You are right, it's an issue with the merge.
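
A small sketch of why the two runs report different errors, assuming a Unix-like system where `mclapply()` can fork. The `worker` function is a hypothetical stand-in that always fails; it is not GENESPACE code.

```r
library(parallel)
library(data.table)

# Hypothetical worker that always fails, standing in for the failing merge
worker <- function(i) stop("merge failed for chunk ", i)

# >1 core: each forked child catches its own error and hands back a
# "try-error" object, so mclapply() itself only emits a warning.
res <- suppressWarnings(mclapply(1:2, worker, mc.cores = 2))
print(sapply(res, inherits, what = "try-error"))

# rbindlist() then fails on those objects, which hides the real cause.
multi <- tryCatch(rbindlist(res), error = function(e) conditionMessage(e))
print(multi)

# 1 core: mclapply() falls back to lapply(), so the worker's own error
# reaches the user directly.
single <- tryCatch(rbindlist(mclapply(1:2, worker, mc.cores = 1)),
                   error = function(e) conditionMessage(e))
print(single)
```

So the single-core message is generally the more informative one when debugging.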

htorrado commented 3 months ago

I'm using GENESPACE v1.3.1 in R 4.1.2

Thanks for your help!

jkfo002 commented 3 months ago

I'm using GENESPACE v1.3.1 and R 4.2.0.

I tried filtering out the fragmented scaffolds, and now it seems to run successfully. Could that have been the cause?

jtlovell commented 3 months ago

@jkfo002 thanks for troubleshooting that. GENESPACE should deal with these without an issue. This is clearly a bug and needs to be fixed. Would you mind sharing your input /bed and /peptide directories from the run that caused the error? If you're willing, please send me an email and we'll set up a private data transfer. email: jlovell[at]hudsonalpha[dot]org
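
Since duplicated gene IDs were mentioned above as a past trigger, a quick check of the bed files may help narrow things down. This is a rough sketch, not part of GENESPACE: `find_dup_ids` is a hypothetical helper, and it assumes headerless, tab-separated bed files whose first four columns are chr, start, end and gene ID.

```r
library(data.table)

# Hypothetical helper: list gene IDs that occur more than once in a bed file.
# Assumption: no header, tab-separated, columns 1-4 = chr, start, end, id.
find_dup_ids <- function(bed_file) {
  bed <- fread(bed_file, header = FALSE, sep = "\t", select = 1:4,
               col.names = c("chr", "start", "end", "id"))
  bed[, .N, by = id][N > 1]
}

# Toy bed file so the sketch runs on its own: geneA appears on chr1
# and again on a small scaffold.
tmp <- tempfile(fileext = ".bed")
writeLines(c("chr1\t100\t500\tgeneA",
             "chr1\t900\t1500\tgeneB",
             "scaffold_12\t10\t400\tgeneA"), tmp)

dups <- find_dup_ids(tmp)
print(dups)
```

To scan a whole directory, the helper could be applied with `lapply()` over `list.files()` on your own bed folder (the path depends on your working directory). An empty result for every genome would suggest the cause lies elsewhere.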

jkfo002 commented 3 months ago

@jtlovell Sorry for the late reply, I have sent the data to you. By the way, can GENESPACE construct gene synteny for a local region across multiple genomes?

jtlovell commented 3 months ago

np. I'll try to get to it next week. Re: local synteny ... do you mean something like this (Fig. 5.2 here)?

jkfo002 commented 3 months ago

Actually, no. I think I need to zoom in on a small region of a chromosome and see the synteny of a gene cluster (maybe). @jtlovell

goshng commented 2 months ago

I have the same issue. Any luck?

goshng commented 2 months ago

For me, I have the error in the test example as well. Here is a tail of the output. Thank you!

        ...human  : 468 genes in 15 OGs hit > 8 unique places
        ##############
        Annotation summaries (after exclusions):
        ...chicken: 17433 genes in 15257 OGs || 2158 genes in 423 arrays
        ...human  : 20205 genes in 15979 OGs || 3460 genes in 853 arrays

############################
3. Combining and annotating the blast files with orthogroup info ...
        # Chunk 1 / 1 ...
Error in rbindlist(mclapply(1:nrow(chnk), mc.cores = nCores, function(i) { :
  Item 1 of input is not a data.frame, data.table or list
In addition: Warning message:
In mclapply(1:nrow(chnk), mc.cores = nCores, function(i) { :
  all scheduled cores encountered errors in user code
> ls()
[1] "genomeRepo"   "gpar"         "gsParam"      "parsedPaths"  "path2mcscanx"
[6] "rawFiles"     "wd"
> sessionInfo()
R version 4.3.3 (2024-02-29)
Platform: x86_64-conda-linux-gnu (64-bit)
Running under: Ubuntu 22.04.3 LTS

...

other attached packages:
[1] GENESPACE_1.3.1

jtlovell commented 1 month ago

What OrthoFinder version are you using? I've seen this issue pop up from other users but have been unable to recreate it myself.