Open stg1205 opened 11 months ago

Nice work! I would say it's the only model that can generate a fluent melody from scratch so far. I have a question:

During training, why do you set the batch size to 1? I saw the squeeze(0) call in the forward pass; is that due to some particular cases?

Hi, thank you for your interest! The batch size is set to 1 due to complexities with the dual-decoder architecture that make larger batch sizes difficult to implement.

Is this problem related to the input dimension of GPT-2? Could we just flatten the first two dimensions, (batch_size, seq_len), of the input to the char decoder?
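The flattening idea could look something like this. A minimal sketch with NumPy (the shapes and names are illustrative, not taken from the repo; the same reshape works on a torch.Tensor):

```python
import numpy as np

# Illustrative shapes, not from the actual model config.
batch_size, seq_len, hidden_dim = 4, 16, 768
x = np.random.randn(batch_size, seq_len, hidden_dim)

# Merge the first two dimensions so that each time step becomes one
# independent item in the char decoder's batch.
flat = x.reshape(batch_size * seq_len, hidden_dim)   # (64, 768)

# After decoding, restore the original (batch, seq) layout.
restored = flat.reshape(batch_size, seq_len, hidden_dim)
```

Whether this works in practice depends on whether the char decoder treats each time step independently; if there is cross-step state, merging the dimensions would break it.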