Conv kernels handle a non-tile-multiple input shard height by leaving garbage rows in the tilized CB and the output CB. If the conv output is in tiled layout, the output shard can be allocated with a padded, tile-aligned size. If the conv output is row-major (RM), the output tensor's shard is allocated with a padded, tile-multiple size, but its shard shape is set to the actual non-tile-multiple size (ignoring the garbage rows in the shard).
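A minimal sketch of the padding arithmetic described above (hypothetical helper, not a tt-metal API): the allocated shard height is rounded up to a tile multiple, while the logical shard shape keeps the actual row count, and the difference is the garbage rows.

```python
TILE_HEIGHT = 32  # rows per tile

def padded_shard_height(num_rows: int) -> int:
    """Round a shard height up to the nearest tile multiple (hypothetical helper)."""
    return ((num_rows + TILE_HEIGHT - 1) // TILE_HEIGHT) * TILE_HEIGHT

logical_rows = 50                               # actual non-tile-multiple shard height
alloc_rows = padded_shard_height(logical_rows)  # padded allocation height: 64
garbage_rows = alloc_rows - logical_rows        # 14 trailing garbage rows in the shard
```

For a tiled-layout output only the padded size matters; for an RM output the shard shape reported to the consumer stays at `logical_rows` even though `alloc_rows` is allocated.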
Two separate tasks: