jtlovell / GENESPACE

Error in rbindlist(mclapply(1:nrow(chnk), mc.cores = nCores, function(i) #96

Closed theo-allnutt-bioinformatics closed 1 year ago

theo-allnutt-bioinformatics commented 1 year ago

If I run GENESPACE in an interactive R session, I get:

  1. Combining and annotating the blast files with orthogroup info ...

    Chunk 1 / 1 (12:05:02) ...

    Error in rbindlist(mclapply(1:nrow(chnk), mc.cores = nCores, function(i) { :
      Item 10 of input is not a data.frame, data.table or list
    In addition: Warning message:
    In mclapply(1:nrow(chnk), mc.cores = nCores, function(i) { :
      scheduled core 10 encountered error in user code, all values of the job will be affected

I am not sure whether this is because the run is being killed by my system (too many resources for a login node). But in a PBS job I get this error:

    GENESPACE v1.2.3: synteny and orthology constrained comparative genomics
    Error in FUN(X[[i]], ...) :
      found multiple fasta files with faString = fa$|fasta$|faa$|fa.gz$|fasta.gz|faa.gz:
      /g/data/nm31/d/r17_nepenthes/r17.2_synteny/gs2/rawGenomes//nm
    Calls: parse_annotations -> rbindlist -> lapply -> FUN
    In addition: Warning message:
    In sprintf("found multiple fasta files with faString = %s:\n\t%s", :
      one argument not used by format 'found multiple fasta files with faString = %s: %s'
    Execution halted

theo-allnutt-bioinformatics commented 1 year ago

My script:

    #!/usr/bin/env Rscript

    library(GENESPACE)

    # Working directory, raw annotation repository and MCScanX install
    wd <- file.path("/g/data/nm31/d/r17_nepenthes/r17.2_synteny/gs2/")
    genomeRepo <- "/g/data/nm31/d/r17_nepenthes/r17.2_synteny/gs2/rawGenomes/"
    path2mcscanx <- "/g/data/nm31/bin/MCScanX/"

    # One sub-directory of genomeRepo per genome ID
    genomes2run <- c("nm", "ft", "bv", "so")

    # Convert the raw annotations into GENESPACE input files
    parsedPaths <- parse_annotations(
      rawGenomeRepo = genomeRepo,
      genomeDirs = genomes2run,
      genomeIDs = genomes2run,
      presets = "ncbi",
      genespaceWd = wd)

    # Set up and run the pipeline
    gpar <- init_genespace(
      wd = wd,
      nCores = 48,
      blkSize = 3,
      onlyOgAnchors = FALSE,
      nSecondHits = 10,
      path2mcscanx = path2mcscanx)

    out <- run_genespace(gpar, overwrite = TRUE)

    # Riparian plot with "nm" as the reference genome
    pdf("gs2.pdf", width = 20, height = 10)
    plot_riparian(gsParam = out, refGenome = "nm", useRegions = FALSE)
    dev.off()

theo-allnutt-bioinformatics commented 1 year ago

I do not understand this error:

    found multiple fasta files with faString = fa$|fasta$|faa$|fa.gz$|fasta.gz|faa.gz:
      /g/data/nm31/d/r17_nepenthes/r17.2_synteny/gs2/rawGenomes//nm
    In addition: Warning message:
    In sprintf("found multiple fasta files with faString = %s:\n\t%s", :
      one argument not used by format 'found multiple fasta files with faString = %s: %s'

There is only one fasta file each for the genome and the CDS:

    /g/data/nm31/d/r17_nepenthes/r17.2_synteny/gs2/rawGenomes/nm/nm/nepenthes_nuclear.fasta
    /g/data/nm31/d/r17_nepenthes/r17.2_synteny/gs2/rawGenomes/nm/nm/annotation/nm_cds.faa
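One possible reading of this message, which nobody in the thread confirms: the pattern printed as `faString` matches both of the files listed, since one ends in `.fasta` and the other in `.faa`. The sketch below only checks the regular expression against the two file names from this comment. Whether `parse_annotations` actually searches the genome directory recursively is an assumption, not something stated here.

```r
# Pattern copied from the error message, file names copied from the comment
fa_string <- "fa$|fasta$|faa$|fa.gz$|fasta.gz|faa.gz"
files <- c("nm/nm/nepenthes_nuclear.fasta",
           "nm/nm/annotation/nm_cds.faa")

# TRUE for every file name that the pattern matches
matches <- grepl(fa_string, files)
print(setNames(matches, basename(files)))

# More than one match would be consistent with a "multiple fasta files" error
n_matches <- sum(matches)
print(n_matches)
```

If that is the cause, keeping only the peptide fasta and the gff inside each genome directory would be one thing to try.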

theo-allnutt-bioinformatics commented 1 year ago

If I omit the parse_annotations call, I get:

    GENESPACE v1.2.3: synteny and orthology constrained comparative genomics
    Error in rbindlist(mclapply(1:nrow(chnk), mc.cores = nCores, function(i) { :
      Item 10 of input is not a data.frame, data.table or list
    Calls: run_genespace ... annotate_blast -> rbindlist -> lapply -> FUN -> rbindlist
    In addition: Warning message:
    In mclapply(1:nrow(chnk), mc.cores = nCores, function(i) { :
      scheduled core 10 encountered error in user code, all values of the job will be affected
    Execution halted
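For readers puzzling over the shape of this error: when a function run under `mclapply` throws, the affected slots of the result hold `try-error` objects instead of data.frames, and that is the kind of item `rbindlist` rejects. The sketch below reproduces the pattern with a toy worker. `worker` is a hypothetical stand-in rather than GENESPACE code, and base `rbind` is used in place of `rbindlist` so that no extra package is needed. It shows the mechanism only and says nothing about what failed inside GENESPACE.

```r
library(parallel)

# Hypothetical worker standing in for the per-row function: fails on item 3
worker <- function(i) {
  if (i == 3) stop("simulated failure in worker ", i)
  data.frame(item = i)
}

# mc.cores > 1 relies on forking, so this runs on Linux/macOS but not Windows
res <- mclapply(1:4, worker, mc.cores = 2)

# Failed jobs come back as "try-error" objects rather than data.frames
failed <- vapply(res, inherits, logical(1), what = "try-error")
print(which(failed))

# The underlying error message is stored in the try-error object itself
print(res[[3]])

# Combine only the successful pieces (base R stand-in for rbindlist)
ok <- do.call(rbind, res[!failed])
print(ok)
```

Printing the failed element is often the quickest way to see the real error that the `rbindlist` message hides.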

theo-allnutt-bioinformatics commented 1 year ago

I started again and deleted all the result files, so now I just have:

    genomeRepo/
    |-- bv
    |   |-- GCF_000511025.2_RefBeet-1.2.2_genomic.gff
    |   `-- bv_cds.faa
    |-- ds
    |   |-- Drosera_spatulata.gff
    |   `-- Ds_proteins.faa
    |-- ft
    |   |-- FtChromosomeV2.IGDBv2.Coreset.gff
    |   `-- ft_cds.faa
    |-- nm
    |   |-- nm.gff
    |   `-- nm_cds.faa
    `-- so
        |-- GCF_002007265.1_ASM200726v1_genomic.gff
        `-- so_cds.faa

and get this error:

parsedPaths <- parse_annotations( rawGenomeRepo = genomeRepo, genomeDirs = genomes2run, genomeIDs = genomes2run, presets = "ncbi", genespaceWd = wd)

Error: subscript contains invalid names

Is this a format error? I can't find any information in the tutorial on 'presets', so I am using the example value 'ncbi'. N.B. these files worked OK with v0.09xx.

Thanks.

theo-allnutt-bioinformatics commented 1 year ago

My CDS looks like:

    >jg23760
    MELPSRHPWTQTSGRLHLKLGALYSKFVRQLLEFDSLSFQAPQLLLQLSQLRPEDSSLLTISLGLSDSST
    QLARGISGSTPARPVDAGEIKFSAALLLAAKGILDAASMEVLVEEARVAEKMDSIDGVIVEGWNHPGLPQ
    LEPKGVLRAAPCPEPVGLSHQALANYLNQASLEFDAKLQEVSGAYLRKYQESNHNIIYECTKLYDILFLL
    SRLTILS

and the GFF:

    ptg000003l_1 AUGUSTUS gene 17886204 17896955 . - . ID=jg24487;
    ptg000003l_1 AUGUSTUS mRNA 17886204 17896955 . - . ID=jg24487.t1;Parent=jg24487;
    ptg000003l_1 AUGUSTUS stop_codon 17886204 17886206 . - 0 ID=jg24487.t1.stop1;Parent=jg24487.t1;
    ptg000003l_1 AUGUSTUS CDS 17886204 17886483 1 - 1 ID=jg24487.t1.CDS1;Parent=jg24487.t1;
    ptg000003l_1 AUGUSTUS exon 17886204 17886483 . - . ID=jg24487.t1.exon1;Parent=jg24487.t1;
    ptg000003l_1 AUGUSTUS intron 17886484 17887058 . - . ID=jg24487.t1.intron1;Parent=jg24487.t1;
    ptg000003l_1 AUGUSTUS CDS 17887059 17887287 1 - 2 ID=jg24487.t1.CDS2;Parent=jg24487.t1;
    ptg000003l_1 AUGUSTUS exon 17887059 17887287 . - . ID=jg24487.t1.exon2;Parent=jg24487.t1;
    ptg000003l_1 AUGUSTUS intron 17887288 17892578 . - . ID=jg24487.t1.intron2;Parent=jg24487.t1;
    ptg000003l_1 AUGUSTUS CDS 17892579 17892780 1 - 0 ID=jg24487.t1.C

etc.

jtlovell commented 1 year ago

I'm not sure. parse_annotations is a utility function for getting raw annotations into GENESPACE format. But that format is very general and should be easy for you to produce without it if you can't figure out the parse_annotations parameters. I'd just follow along with the tutorial, but make your input files manually.
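As a rough illustration of "make your input files manually", the sketch below pulls the gene rows out of a GFF like the one posted above and builds a four-column table (chromosome, start, end, gene ID). `gff_genes_to_bed` is a hypothetical helper and not part of GENESPACE. The four-column layout, and the idea that the ID column should match the peptide fasta headers, are assumptions here. Coordinates are copied from the GFF unchanged, so check the format section of the tutorial before relying on any of this.

```r
# Hypothetical helper, not part of GENESPACE: keep only the "gene" rows of a
# tab-separated GFF3 and return the columns chr, start, end, id.
gff_genes_to_bed <- function(gff_lines) {
  # Drop blank lines and comment lines
  gff_lines <- gff_lines[nzchar(gff_lines) & !grepl("^#", gff_lines)]
  # Parse the nine tab-separated GFF columns
  gff <- read.delim(text = gff_lines, header = FALSE, sep = "\t",
                    quote = "", stringsAsFactors = FALSE,
                    col.names = c("chr", "source", "type", "start", "end",
                                  "score", "strand", "phase", "attributes"))
  genes <- gff[gff$type == "gene", ]
  # Pull the value of the ID= attribute, e.g. "ID=jg24487;" -> "jg24487"
  id <- sub("^(?:.*;)?ID=([^;]+).*$", "\\1", genes$attributes, perl = TRUE)
  data.frame(chr = genes$chr, start = genes$start, end = genes$end, id = id,
             stringsAsFactors = FALSE)
}

# Two lines taken from the GFF excerpt in this thread, with tabs restored
example_gff <- c(
  "ptg000003l_1\tAUGUSTUS\tgene\t17886204\t17896955\t.\t-\t.\tID=jg24487;",
  "ptg000003l_1\tAUGUSTUS\tmRNA\t17886204\t17896955\t.\t-\t.\tID=jg24487.t1;Parent=jg24487;"
)
bed <- gff_genes_to_bed(example_gff)
print(bed)

# Writing the table out would look something like this (file name is an assumption):
# write.table(bed, "nm.bed", sep = "\t", quote = FALSE,
#             row.names = FALSE, col.names = FALSE)
```

On a real file, `readLines("nm.gff")` would supply `gff_lines`.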

theo-allnutt-bioinformatics commented 1 year ago

Thanks, it was good advice to not bother with parse_annotations. I followed the formats at the end of the tutorial and it works fine.