If you parallelize an unaligned load, the app can produce wrong results. The Linebuffer assumes a constant enqueue stride and therefore has only one write-enable (wren) signal. When the load is unaligned, some elements in a burst are not supposed to be enqueued into the Linebuffer. In that case the Linebuffer currently refuses to enqueue any elements from that burst, and it does so silently.
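The failure mode can be illustrated with a small software model. This is only a sketch in Python, not the actual hardware or compiler output: the function names, the burst size of 4, and the exact all-or-nothing rule are assumptions made for illustration, based on the description above.

```python
def burst_masks(start, length, burst_size):
    """Split the range [start, start + length) into aligned bursts.

    Returns the addresses in each burst and a per-lane flag saying whether
    that address is actually part of the requested load.
    """
    first = (start // burst_size) * burst_size  # round down to a burst boundary
    end = start + length
    bursts, masks = [], []
    for base in range(first, end, burst_size):
        addrs = list(range(base, base + burst_size))
        bursts.append(addrs)
        masks.append([start <= a < end for a in addrs])
    return bursts, masks


def enqueue_shared_wren(bursts, masks):
    """Hypothetical model of a single wren: a burst is enqueued only if
    every lane in it is wanted; otherwise the whole burst is dropped."""
    buf = []
    for burst, mask in zip(bursts, masks):
        if all(mask):
            buf.extend(burst)
    return buf


def enqueue_per_lane_wren(bursts, masks):
    """Reference behavior: each lane has its own enable."""
    buf = []
    for burst, mask in zip(bursts, masks):
        buf.extend(a for a, wanted in zip(burst, mask) if wanted)
    return buf


# Unaligned load: 8 elements starting at address 2, bursts of 4.
bursts, masks = burst_masks(start=2, length=8, burst_size=4)
print(enqueue_per_lane_wren(bursts, masks))  # [2, 3, 4, 5, 6, 7, 8, 9]
print(enqueue_shared_wren(bursts, masks))    # [4, 5, 6, 7]
```

In this model the first and last bursts each contain lanes outside the requested range, so the shared-wren version drops them entirely and only the fully covered middle burst is enqueued. An aligned load (for example `start=0`) gives identical results in both versions, which is why the problem only shows up with unaligned loads.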