Open CEPHAS-01 opened 1 month ago
Huh, funny you should mention this ... I just broke the dev version of DEEPSPACE with this same error. It happens when trying to generate an integer larger than 2^31-1, for example a position coordinate on a sequence > ~2.1 Gb. I can't imagine how this would happen with GENESPACE though. Can you print the exact error and say which step it came at?
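For anyone following along, here is a minimal sketch of the limit being described. It uses only base R; the variable names are made up for illustration and nothing here is taken from GENESPACE or DEEPSPACE.

```r
# R's native integer type is 32-bit signed, so its maximum is 2^31 - 1.
int_max <- .Machine$integer.max
print(int_max)

# Going past the maximum yields NA (with a warning) instead of a larger integer.
overflowed <- suppressWarnings(int_max + 1L)
print(is.na(overflowed))

# Doubles hold whole numbers exactly up to 2^53, so a coordinate beyond
# ~2.1 Gb survives if it is stored as numeric rather than integer.
pos <- as.numeric(int_max) + 1
print(format(pos, scientific = FALSE))
```

This is why a 64-bit machine does not help by itself: the limit comes from R's integer type, not from the hardware.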
Oh I see
The parse annotation step produced the error.
    parsedPaths <- parse_annotations(
      rawGenomeRepo = "genespace/source",
      genomeDirs = c("human", "sheep"),
      genomeIDs = c("human", "sheep"),
      gffString = "gff",
      faString = "fasta",
      genespaceWd = "genespace/workspace")
    Error in paste(fa[1:100], collapse = "") : result would exceed 2^31-1 bytes
The genomes I am working with are quite large: human ~3 GB and sheep ~2.8 GB.
Perhaps some of the data types need to be changed to increase the storage range.
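A rough sketch of what the error message itself refers to, assuming base R only: a single R character string cannot exceed 2^31-1 bytes, even on a 64-bit build, so `paste(..., collapse = "")` fails once the joined pieces pass that size. The helper `fits_in_one_string` and the lengths below are hypothetical, not GENESPACE code.

```r
# A single R string is capped at 2^31 - 1 bytes, regardless of architecture.
max_string_bytes <- 2^31 - 1

# Hypothetical helper: would paste(x, collapse = "") stay under the cap?
# Lengths are summed as doubles so the total itself cannot overflow.
fits_in_one_string <- function(x) {
  total <- sum(as.numeric(nchar(x, type = "bytes")))
  total <= max_string_bytes
}

small <- c("ACGT", "TTGA")
print(fits_in_one_string(small))

# Simulated lengths only; no multi-gigabyte strings are allocated here.
chrom_lengths <- c(1.5e9, 1.2e9)  # made-up sizes for illustration
print(sum(chrom_lengths) <= max_string_bytes)
```

So the error points at one very large string being built, not at an integer column that could be switched to a wider type.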
I don't think that's it ... unless all the chromosomes got concatenated. Pine broke it, and it has several chromosomes that are as large as the entire Hg38 human genome.
The chromosomes were not concatenated. I used the genome as downloaded from NCBI.
Can you post the URLs to the files you downloaded from NCBI?
Sure. Human genome and protein sequence from here: https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/001/405/GCF_000001405.40_GRCh38.p14/ [GCF_000001405.40_GRCh38.p14_genomic.fna.gz] [GCF_000001405.40_GRCh38.p14_protein.faa.gz]
Sheep genome and protein sequence from here: https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/016/772/045/GCF_016772045.2_ARS-UI_Ramb_v3.0/ [GCF_016772045.2_ARS-UI_Ramb_v3.0_genomic.fna.gz] [GCF_016772045.2_ARS-UI_Ramb_v3.0_protein.faa.gz]
Did you try to pass parse_annotations these files?
You want the translated_cds.faa.gz and genomic.gff.gz. See: https://htmlpreview.github.io/?https://github.com/jtlovell/tutorials/blob/main/genespaceGuide.html
Yes, the parse_annotations stage produced the error. I was using the protein.faa.gz and not the translated_cds.faa.gz. Perhaps this is the reason. Stepping away from my desk shortly, I will test it with translated_cds.faa.gz and give you feedback. Thanks!
It should give a more informative error than that if you gave it the protein fa ... that one just doesn't parse right. I was wondering if you fed the genomic.fna.gz as a gff.
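One quick way to rule out a mix-up like that before calling parse_annotations is to peek at the first line of each input. This is only a sketch in base R; `guess_file_type` is a hypothetical helper, not a GENESPACE function, and it only tells FASTA from GFF (it cannot tell protein.faa from translated_cds.faa).

```r
# Hypothetical helper (not part of GENESPACE): guess a file's type from line 1.
guess_file_type <- function(path) {
  con <- gzfile(path, open = "rt")  # gzfile also reads uncompressed files
  on.exit(close(con))
  first <- readLines(con, n = 1L, warn = FALSE)
  if (length(first) == 0L) return("empty")
  if (startsWith(first, ">")) return("fasta")          # FASTA header line
  if (startsWith(first, "##gff-version")) return("gff") # GFF3 pragma line
  "unknown"
}

# Tiny stand-in files so the sketch runs without downloading anything.
tmp_fa <- tempfile(fileext = ".fa")
writeLines(c(">seq1", "MKT"), tmp_fa)
tmp_gff <- tempfile(fileext = ".gff")
writeLines("##gff-version 3", tmp_gff)

print(guess_file_type(tmp_fa))
print(guess_file_type(tmp_gff))
```

If the file matched by gffString comes back as "fasta", the wrong file is being picked up as the annotation.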
Hi and thanks for this beautiful comparative genomics tool.
I was trying out GENESPACE on our HPC system using the human and sheep assemblies from NCBI, but ran into the following error when running parse_annotations: "result would exceed 2^31-1 bytes".
I have checked and I am sure that the machine is a 64-bit architecture. Any suggestions on how to resolve this?
Temitayo