jtlovell / GENESPACE

Error while parsing #126

Closed gubrins closed 11 months ago

gubrins commented 11 months ago

Hi all, I am really excited about using GENESPACE. Thanks for developing this software!

I think I am having some issues with the parsing of my data. I tried to parse it manually, but that did not work, so now I am trying to use the parse_annotations function. This is my code:

genomeRepo = getwd()
wd = getwd()
path2mcscanx = "/mnt/DiscoA/cerastes/new_order/analyses/synteny/prueba/MCScanX/"
genomes2run <- c("cerastes_gasperettii", "naja_naja", "crotalus_tigris")

parsedPaths <- parse_annotations(
  rawGenomeRepo = genomeRepo,
  genomeDirs = genomes2run,
  genomeIDs = genomes2run,
  presets = "none",
  genespaceWd = wd)

And that's the output I get:

Error in setDT(ans, key = key) : 
  All elements in argument 'x' to 'setDT' must be of same length, but the profile of input lengths (length:frequency) is: [9:1, 0:8]
The first entry with fewer than 9 entries is 2
In addition: Warning message:
In setDT(ans, key = key) :
  Some columns are a multi-column type (such as a matrix column): [1]. setDT will retain these columns as-is but subsequent operations like grouping and joining may fail. Please consider as.data.table() instead which will create a new column for each embedded column.

I am not sure what is going on. I have a folder for each genome, containing its annotation in GFF format and its predicted proteins. Any help would be more than welcome!

Thanks in advance!!

jtlovell commented 11 months ago

This is not an informative error ... I'm not sure what's up, though. Can you share your genomeRepo directory? Or just post its structure and the heads of the fasta and gff3 files.
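One way to collect that information from within R is sketched below. `describe_repo` is a hypothetical helper, not part of GENESPACE, and the file-extension pattern is only an assumption about how the annotation and protein files are named; adjust it to match your own files.

```r
# Hypothetical helper (NOT a GENESPACE function): print the layout of a
# genome repository and the first few lines of each annotation/protein file.
describe_repo <- function(repo, n = 3, width = 80) {
  # All files below `repo`, as paths relative to it
  files <- list.files(repo, recursive = TRUE)
  cat("Directory structure:\n")
  cat(paste0("  ", files), sep = "\n")

  # Assumed extensions for gff/fasta files, optionally gzipped
  pattern <- "\\.(gff3?|fa|faa|fasta|pep)(\\.gz)?$"
  targets <- files[grepl(pattern, files, ignore.case = TRUE)]

  # Read the first `n` lines of each file and truncate them for display.
  # Note: each line is still read in full, so this can be slow on
  # whole-genome fasta files that store a chromosome on a single line.
  heads <- lapply(file.path(repo, targets), function(path) {
    substr(readLines(path, n = n, warn = FALSE), 1, width)
  })
  names(heads) <- targets

  for (f in targets) {
    cat("\n==> ", f, " <==\n", sep = "")
    cat(heads[[f]], sep = "\n")
  }
  invisible(heads)
}

# Example: describe the directory used as genomeRepo in the post above
describe_repo(getwd())
```

The printed output can be pasted directly into a reply, which avoids the need for a screenshot.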

gubrins commented 11 months ago

Hi @jtlovell, thanks for your reply! My directory contains the three reference genomes, together with a folder for each species that holds both the annotation and the proteins. Here is a screenshot:

[Screenshot 2023-10-11 at 13 16 50]

I am not sure whether this is the proper layout for the genomeRepo directory; sorry if it is not.

Thanks, Gabriel

jtlovell commented 11 months ago

Ok - yeah, this is the issue. See instructions for this here.