The current melody model uses time-based encoder/decoders; each RNN step is one 16th note by default. We should also try note-based encoder/decoders, where each RNN step is an event of the form "play note with pitch P for duration D".
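To make the contrast concrete, here is a minimal sketch of converting a time-based representation (one value per 16th-note step) into note-based "(pitch P, duration D)" events. The `NO_EVENT` sentinel and function names are assumptions for illustration, not the project's actual data format:

```python
# Hypothetical sketch: collapse a per-step melody into note events.
# -1 marks "no event" (the previous note is sustained); this sentinel
# and the encoding are assumptions, not the model's real format.
NO_EVENT = -1

def to_note_events(steps):
    """Collapse per-step pitches into (pitch, duration_in_steps) events."""
    events = []
    for pitch in steps:
        if pitch == NO_EVENT and events:
            # Sustain: extend the duration of the current note.
            p, d = events[-1]
            events[-1] = (p, d + 1)
        else:
            events.append((pitch, 1))
    return events

# Example: C4 held for four 16th-note steps, then E4 for two.
print(to_note_events([60, -1, -1, -1, 64, -1]))
# → [(60, 4), (64, 2)]
```

With this encoding a single RNN step covers a whole note, so long notes no longer consume many steps, at the cost of losing the fixed rhythmic grid.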
Perhaps it could be handled with sigmoid outputs, encoding the pitches as multi-label targets. That way we could keep the 16th-note resolution, which helps the RNN pick up rhythmic patterns.
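A rough sketch of that multi-label idea, assuming a 128-pitch range: each 16th-note step becomes a binary vector over pitches, and a per-pitch sigmoid output is thresholded back to a set of active pitches. Names and the 0.5 threshold are illustrative assumptions:

```python
# Hypothetical sketch: multi-hot step encoding for sigmoid outputs.
# NUM_PITCHES and the threshold are assumptions for illustration.
NUM_PITCHES = 128

def to_multi_hot(active_pitches):
    """Encode the set of pitches sounding at one step as a 0/1 vector."""
    vec = [0.0] * NUM_PITCHES
    for p in active_pitches:
        vec[p] = 1.0
    return vec

def decode(sigmoid_outputs, threshold=0.5):
    """Threshold per-pitch sigmoid activations back to a pitch set."""
    return [p for p, a in enumerate(sigmoid_outputs) if a >= threshold]

step = to_multi_hot([60, 64])  # C4 and E4 sounding together
print(decode(step))
# → [60, 64]
```

Unlike a softmax over single pitches, this lets one step represent several simultaneous notes while every step still lands on the 16th-note grid.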