weberlab-hhu / Helixer

Using Deep Learning to predict gene annotations
GNU General Public License v3.0

Test (and probably bugfix) phase encoding where we don't start with phase 0 #53

Closed: alisandra closed this issue 3 years ago

alisandra commented 3 years ago

i.e.

(and sorry if I'm missing something, if it's already there and I missed it)

soi commented 3 years ago

ok, so the problem is that we should not assume phase 0 at the start of the first CDS, because the gene model may be partial, and should instead take the phase from the GFF?

alisandra commented 3 years ago

yes, that, and for the second point (super chunking) we can either 1) avoid ever splitting within a CDS, or 2) handle a CDS that started outside of the chunk, which might be more of a pain than it's worth.
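
For illustration only, here is a minimal sketch of the first point: reading the starting phase from the GFF instead of assuming 0. It is not Helixer's actual code; `phase_from_gff_line` and `per_base_phase` are hypothetical names, and the per-base labelling simply follows the GFF3 definition of phase (the number of bases to skip to reach the first base of the next codon).

```python
def phase_from_gff_line(line):
    """Return the phase (0, 1 or 2) from column 8 of a GFF3 line, or None if it is '.'.

    Hypothetical helper for illustration; not part of Helixer.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError("expected 9 tab-separated GFF3 columns")
    phase = fields[7]
    return None if phase == "." else int(phase)


def per_base_phase(cds_length, start_phase=0):
    """Label every base of a CDS segment with its GFF-style phase.

    The phase at a base is the number of bases to skip from there to reach
    the first base of the next codon, so it counts down modulo 3.
    """
    if start_phase not in (0, 1, 2):
        raise ValueError("phase must be 0, 1 or 2")
    return [(start_phase - i) % 3 for i in range(cds_length)]


# A partial gene model whose first annotated CDS starts mid-codon (phase 2).
gff_line = "chr1\tsrc\tCDS\t101\t105\t.\t+\t2\tID=cds1"
start_phase = phase_from_gff_line(gff_line)
labels = per_base_phase(5, start_phase)
```

With a starting phase of 2 the labels begin `2, 1, 0, ...` rather than `0, 2, 1, ...`, which is the difference that matters for a partial model.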

soi commented 3 years ago

I'm going to try to fix the phase encoding then. Just not splitting inside a CDS seems reasonable to me.

alisandra commented 3 years ago

Ok, great, thanks!

Was looking at the code for this last night (to ID what was going on), so if it gives any trouble, just ping me.

alisandra commented 3 years ago

The "during super chunking / splitting into write_by" bit is also now patched (avoided splitting in CDS entirely)
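
As a rough sketch of what "avoid splitting in CDS" can look like (not the actual patch): propose split points at regular intervals and push any point that lands inside a CDS to the end of that CDS. The function name `safe_split_points`, the half-open `(start, end)` intervals and the fixed `chunk_size` are all assumptions made for the example.

```python
def safe_split_points(seq_length, chunk_size, cds_intervals):
    """Return split coordinates that never fall strictly inside a CDS.

    cds_intervals: list of half-open (start, end) tuples (assumed format).
    A split at `point` separates [.., point) from [point, ..), so it is
    inside a CDS only when start < point < end.
    """
    points = []
    target = chunk_size
    while target < seq_length:
        point = target
        moved = True
        # Repeat until stable so that overlapping CDS intervals are handled.
        while moved:
            moved = False
            for start, end in cds_intervals:
                if start < point < end:
                    point = end  # push the split to the end of the CDS
                    moved = True
        if point >= seq_length:
            break
        points.append(point)
        target = point + chunk_size
    return points


# The split proposed at 30 falls inside the CDS (25, 40) and is moved to 40.
points = safe_split_points(100, 30, [(25, 40)])
```

The trade-off is that chunks can end up longer than `chunk_size`, which is the price of never needing to carry a phase across a chunk boundary.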