jtlovell / GENESPACE

GENESPACE to infer genome annotation de novo #73

Closed by rodrisenovilla 1 year ago

rodrisenovilla commented 1 year ago

Dear John T. Lovell,

First of all, thank you for your really nice tool; it seems really useful!

I am working on a de novo genome annotation (the lizard Paroedura picta), so my files are not as shiny as your default raw genomes. My peptide file contains several transcripts per gene, and the GFF includes not only genes but also many transcripts. Although this hasn't prevented me from running the OrthoFinder part of your pipeline, it fails when trying to run the synteny part:

Synteny Parameters have not been set! Setting to defaults
Indexing location of orthofinder results ... Done!
Parsing the gff files ...
Reading the gffs and adding orthofinder IDs ... Done!
Found 64384 global OGs for 64532 genes
QC-ing genome to ensure chromosomes/scaffolds are big enough...
Genome: n. chrs PASS/FAIL, n. genes PASS/FAIL, n. OGs PASS/FAIL
gecko: 196/2112, 59338/3407, 59338/3407
human: 2/0, 1787/0, 1643/0
All look good!
Defining collinear orthogroup arrays ...
Found the following counts of arrays / genome:
human: 194 genes in 60 collinear arrays
Pulling synteny for 2 unique pairwise combinations of genomes
Running 1 chunks of up to 18 combinations each:
Chunk 1 / 1 (16:57:05) ...
Warning in mclapply(1:nrow(splSynp[[i]]), mc.cores = nCores, function(j) { :
  all scheduled cores encountered errors in user code
Done!
Error in x$gen1 : $ operator is invalid for atomic vectors

I assume it could be due to the transcript identifiers in my species versus the gene identifiers in the human annotation (I am using your raw_genome).

My objective is to assign a common name to my unpredicted genes, based on the OrthoFinder results and the synteny-based pan-genome. However, I suspect this goal may be unrealistic, so please tell me if I am mistaken.
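If the several-transcripts-per-gene input does turn out to matter, one common way to prepare such data is to keep a single representative peptide per gene. The following is only a minimal Python sketch of that idea, not part of GENESPACE: the function names are hypothetical, and the transcript naming convention (`gene0001.t2` belongs to `gene0001`) is an assumption that would need to be adapted to the real annotation.

```python
from typing import Callable, Dict, Iterable, Tuple


def gene_from_transcript_id(transcript_id: str) -> str:
    """Hypothetical convention: 'gene0001.t2' -> 'gene0001'.

    Adapt this to however the annotation actually links transcripts to genes.
    """
    return transcript_id.rsplit(".t", 1)[0]


def longest_isoform_per_gene(
    records: Iterable[Tuple[str, str]],
    gene_of: Callable[[str], str] = gene_from_transcript_id,
) -> Dict[str, Tuple[str, str]]:
    """Keep the longest peptide for each gene.

    `records` yields (transcript_id, peptide_sequence) pairs.
    On ties, the first transcript seen is kept.
    """
    best: Dict[str, Tuple[str, str]] = {}
    for transcript_id, seq in records:
        gene = gene_of(transcript_id)
        if gene not in best or len(seq) > len(best[gene][1]):
            best[gene] = (transcript_id, seq)
    return best


# Toy input, invented for illustration only.
example_records = [
    ("gene0001.t1", "MKV"),
    ("gene0001.t2", "MKVLA"),
    ("gene0002.t1", "MA"),
]
primary = longest_isoform_per_gene(example_records)
```

The GFF would need the matching treatment (one entry per gene, with IDs that agree with the peptide headers), which this sketch does not cover.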

Thank you very much in advance, Rodrigo Senovilla Ganzo

jtlovell commented 1 year ago

What GENESPACE version are you using? This doesn't look like v1.1.x

rodrisenovilla commented 1 year ago

GENESPACE v0.9.4. I had trouble upgrading to the new version, but I will give it a try. Thanks!

rodrisenovilla commented 1 year ago

I managed to run the whole pipeline successfully!

I had issues with the example data; this error arose:

Error in match_fasta2gff(path2fasta = fa, path2gff = gf, genespaceWd = genespaceWd, :
  some of the peptides have '.' or '-' in the sequence. Orthofinder can't handle this.

Once I used Biostrings to filter out the sequences starting with "-...", it ran smoothly. Thank you very much! Best, Rodrigo Senovilla Ganzo
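For readers hitting the same error, the filtering step can be sketched roughly as follows. This is not the Biostrings code used above (which was not posted); it is a stdlib-only Python illustration with hypothetical function names, and it drops any peptide containing '.' or '-' anywhere in the sequence, which is the condition named in the error message.

```python
from typing import Iterable, Iterator, List, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def drop_invalid_peptides(
    records: Iterable[Tuple[str, str]], bad_chars: str = ".-"
) -> Tuple[List[Tuple[str, str]], List[str]]:
    """Split records into kept (header, seq) pairs and dropped headers.

    A record is dropped if its sequence contains any character in bad_chars.
    """
    kept, dropped = [], []
    for header, seq in records:
        if any(ch in seq for ch in bad_chars):
            dropped.append(header)
        else:
            kept.append((header, seq))
    return kept, dropped


# Toy FASTA, invented for illustration only.
example_fasta = """>pep1
MKVLA
>pep2
-MKV
>pep3
MA.LK
>pep4
MSTQ
""".splitlines()

kept, dropped = drop_invalid_peptides(read_fasta(example_fasta))
```

Reporting the dropped headers alongside the kept records makes it easy to check how many genes are lost before rerunning the pipeline.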