harvardinformatics / degenotate

**Error SEQ2: Some exons have differing strands #30

lisagrigoreva closed this issue 1 year ago

lisagrigoreva commented 1 year ago

Hi,

I'm trying to process Arabidopsis data using your tool. However, I get the following error when I provide a GFF file and the full genome FASTA file:

-----------------------------------------------
**Error SEQ2: Some exons have differing strands
-----------------------------------------------

Should I remove these exons?

lisagrigoreva commented 1 year ago

Solved. You need to subset the GTF and FASTA files to only the genomic sequences (this is probably a problem with the annotation).
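
For anyone wanting to script that subsetting step, here is a minimal sketch. The thread does not say which sequences were kept, so the `keep_ids` set and the names `Chr1` and `ChrC` are placeholders, and `filter_fasta` and `filter_gff` are hypothetical helpers, not part of degenotate.

```python
def filter_fasta(lines, keep_ids):
    """Yield FASTA lines belonging to records whose ID is in keep_ids."""
    keep = False
    for line in lines:
        if line.startswith(">"):
            # The record ID is the header text up to the first whitespace.
            header = line[1:].split()
            keep = bool(header) and header[0] in keep_ids
        if keep:
            yield line


def filter_gff(lines, keep_ids):
    """Yield comment/directive lines and features located on sequences in keep_ids."""
    for line in lines:
        # Column 1 of a GFF/GTF feature line is the sequence ID.
        if line.startswith("#") or line.split("\t", 1)[0] in keep_ids:
            yield line


# Hypothetical example data: keep only Chr1 and drop ChrC.
keep_ids = {"Chr1"}
fasta = [">Chr1 nuclear\n", "ACGT\n", ">ChrC\n", "GGGG\n"]
gff = [
    "##gff-version 3\n",
    "Chr1\tsrc\tgene\t1\t100\t.\t+\t.\tID=g1\n",
    "ChrC\tsrc\tgene\t1\t100\t.\t-\t.\tID=g2\n",
]
fasta_subset = list(filter_fasta(fasta, keep_ids))
gff_subset = list(filter_gff(gff, keep_ids))
```

With real files, the same functions can be fed open file handles and the output written line by line to new files.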

gwct commented 1 year ago

Great! Glad it seems to be working now. We did code it so that all exons within a transcript must be on the same strand in the annotation file, and I think this is expected for most annotation pipelines. But GTF/GFF files are notoriously unstandardized, so we're always on the lookout for these discrepancies.

milesroberts-123 commented 1 year ago

I've run into this same problem with annotations for a few different plant species. Would it be possible to turn this error into a warning, so that degenotate automatically drops the problematic transcript(s) before continuing with the analysis?

Of course, just manually removing the problematic transcripts from the GFF file is a pretty easy fix too.
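
In case it helps anyone doing the manual fix, here is a rough sketch of how the problematic transcripts could be found and removed. It assumes a GFF3 file in which exons point to their transcript through a `Parent` attribute; the function names are made up for illustration, and this is not how degenotate itself performs the check.

```python
from collections import defaultdict


def parse_attributes(field):
    """Parse a GFF3 attribute column ('key=value;key=value') into a dict."""
    attrs = {}
    for item in field.strip().split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs


def mixed_strand_transcripts(gff_lines):
    """Return the IDs of transcripts whose exons are not all on one strand."""
    strands = defaultdict(set)
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "exon":
            continue
        # An exon may list several parents separated by commas.
        for parent in parse_attributes(cols[8]).get("Parent", "").split(","):
            if parent:
                strands[parent].add(cols[6])
    return {tid for tid, seen in strands.items() if len(seen) > 1}


def drop_transcripts(gff_lines, bad_ids):
    """Yield every line except features whose ID or Parent is in bad_ids.

    Note: a feature shared between a bad and a good parent is dropped too.
    """
    for line in gff_lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) == 9 and not line.startswith("#"):
            attrs = parse_attributes(cols[8])
            parents = set(attrs.get("Parent", "").split(","))
            if attrs.get("ID") in bad_ids or parents & bad_ids:
                continue
        yield line


# Hypothetical toy annotation: t1 has exons on both strands, t2 does not.
gff = [
    "##gff-version 3\n",
    "chr1\tsrc\tmRNA\t1\t100\t.\t+\t.\tID=t1;Parent=g1\n",
    "chr1\tsrc\texon\t1\t50\t.\t+\t.\tID=e1;Parent=t1\n",
    "chr1\tsrc\texon\t60\t100\t.\t-\t.\tID=e2;Parent=t1\n",
    "chr1\tsrc\tmRNA\t200\t300\t.\t+\t.\tID=t2;Parent=g1\n",
    "chr1\tsrc\texon\t200\t300\t.\t+\t.\tID=e3;Parent=t2\n",
]
bad = mixed_strand_transcripts(gff)
cleaned = list(drop_transcripts(gff, bad))
```

For a GTF file the attribute column uses a different syntax (`transcript_id "..."`), so `parse_attributes` would need to be adapted.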

gwct commented 1 year ago

Will work on converting these to warnings and update when I push the code.

gwct commented 1 year ago

I've changed these from errors to warnings and pushed the code (https://github.com/harvardinformatics/degenotate/pull/34), so if you've downloaded from GitHub you can update via a git pull or by downloading v1.1.3. It should be up on bioconda soon as well.

milesroberts-123 commented 1 year ago

Very cool! Thank you so much!

gwct commented 1 year ago

Updated version now up on bioconda.