Hi @marcpaga ,
Thanks for opening this issue. There are several things to consider here:
CTC loss considers all possible alignments and sums their scores based on the predictions. When calculating the distance between two genome sequences that are very similar, this is not ideal, because suboptimal alignments can contribute heavily to the calculated loss. As described in our manuscript, we handle insertions and deletions explicitly (with a gap penalty) and maximize over the best alignment score. A parameter controls how much the suboptimal alignments can contribute, which makes the loss smoother.
CTC loss does an exhaustive search over all possible alignments and is a bit slower. The alignment loss implemented for DeepConsensus is written to be efficient, for faster training on GPUs and TPUs.
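To make the difference between summing over alignments and maximizing over the best one concrete, here is a minimal sketch. It is not the DeepConsensus implementation: it works on plain strings rather than predicted probabilities, and the names `soft_min`, `soft_alignment_cost`, `gap_cost`, `mismatch_cost` and `temperature` are hypothetical. It assumes the smoothing parameter acts like a temperature in a soft minimum, so that a temperature of 0 keeps only the best alignment and larger values let suboptimal alignments contribute more.

```python
import math


def soft_min(values, temperature):
    """Smoothed minimum: -t * log(sum(exp(-v / t))).

    Equals min(values) when temperature is 0 and moves further below it
    as the temperature grows (hypothetical helper for illustration).
    """
    if temperature == 0.0:
        return min(values)
    m = min(values)
    # Subtract the minimum before exponentiating for numerical stability.
    total = sum(math.exp(-(v - m) / temperature) for v in values)
    return m - temperature * math.log(total)


def soft_alignment_cost(pred, target, gap_cost=1.0, mismatch_cost=1.0,
                        temperature=0.1):
    """Needleman-Wunsch style alignment cost with a smoothed minimum.

    Assumption: costs are per-character constants. A real training loss
    would derive the substitution cost from predicted probabilities.
    """
    n, m = len(pred), len(target)
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    # Aligning a prefix against the empty string costs one gap per character.
    for i in range(1, n + 1):
        dp[i][0] = i * gap_cost
    for j in range(1, m + 1):
        dp[0][j] = j * gap_cost
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0.0 if pred[i - 1] == target[j - 1] else mismatch_cost
            dp[i][j] = soft_min(
                [
                    dp[i - 1][j - 1] + sub,   # match or mismatch
                    dp[i - 1][j] + gap_cost,  # extra character in pred
                    dp[i][j - 1] + gap_cost,  # character missing from pred
                ],
                temperature,
            )
    return dp[n][m]


hard = soft_alignment_cost("ACGT", "AGT", temperature=0.0)
smooth = soft_alignment_cost("ACGT", "AGT", temperature=0.1)
very_smooth = soft_alignment_cost("ACGT", "AGT", temperature=1.0)
print(hard, smooth, very_smooth)
```

With `temperature=0.0` the result is the ordinary edit-distance style cost of the single best alignment. Raising the temperature lowers the value because more alignments are mixed in, which is the behaviour a sum over all alignments pushes to the extreme.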
Finally, we have not exhaustively looked at all the loss functions that are available. So, future experiments with other alignment-based loss functions including CTC may give us observations to help answer your question more accurately.
Hi @kishwarshafin,
Thanks for the quick and clear response! I think that a performance comparison with CTC and other losses would be very interesting for scenarios where both could be used, not only in terms of speed or compute requirements, but also accuracy. Furthermore, your alignment loss can be used in models where CTC cannot be used, such as seq2seq models, so thanks for your contribution!
Thanks for the very interesting work.
I was wondering about the alignment loss used to train the model. It is clear that indels can shift the whole predicted sequence, so a position-wise loss like cross-entropy blows up after a single small mistake. I thought a CTC loss would work in this scenario, but you developed a new alignment loss for this task. Could you elaborate on why this alignment loss is needed, or why CTC is not viable here?
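A small illustration of the shifting problem described above, as a sketch only: it counts position-wise mismatches (a stand-in for what a per-position loss such as cross-entropy penalizes) and compares that with the edit distance for a prediction that has a single deletion. The function names and the example sequences are made up for illustration.

```python
def positional_mismatches(pred, target):
    """Count positions that disagree when sequences are compared index by index."""
    mismatches = sum(1 for p, t in zip(pred, target) if p != t)
    # Positions present in only one of the sequences also count as errors.
    return mismatches + abs(len(pred) - len(target))


def edit_distance(a, b):
    """Classic Levenshtein distance with unit costs, computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(
                prev[j - 1] + (ca != cb),  # substitution or match
                prev[j] + 1,               # delete from a
                curr[j - 1] + 1,           # insert into a
            ))
        prev = curr
    return prev[-1]


target = "ACGTACGTAC"
pred = "ACTACGTAC"  # same as target with the G at index 2 deleted

print(positional_mismatches(pred, target))  # 8
print(edit_distance(pred, target))          # 1
```

One deleted base makes almost every later position disagree, while an alignment-aware measure charges for the single deletion only.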