Try to tune the flow control on the Rx queue. There seemed to be a minor throughput improvement from sending the pause frame when the queue is about half full.
Make the flow control tuning more generic so it works with buffer sizes other than those in the RK3588.
Try to tune the AXI parameters. Almost nothing I did seemed to have a consistent effect. I think there was a small improvement from enabling address-aligned beats, though it might just be my imagination.
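
For the two flow control items above, a rough sketch of what "generic" could look like: derive the thresholds from the FIFO size instead of hard-coding them per chip. Everything named here is hypothetical (fc_thresholds_for_fifo, fc_field_from_free_bytes, FC_FIELD_MAX); none of it is existing driver code. It assumes a threshold field where value v means "Full minus (1 KiB + v * 512 bytes)" and that the field is 6 bits wide; check both against the databook for the actual MAC. The "deactivate at about a quarter full" point is only an illustrative choice, I only tested the half-full activation point.

```c
#include <stdint.h>

/* ASSUMPTION: threshold fields are 6 bits wide. */
#define FC_FIELD_MAX 0x3fu

struct fc_thresholds {
	uint32_t rfa; /* field value for activating flow control (send pause) */
	uint32_t rfd; /* field value for deactivating flow control */
};

/*
 * Hypothetical helper. ASSUMPTION: a field value v means the threshold
 * is hit when the free space in the FIFO is 1024 + v * 512 bytes,
 * i.e. "Full - (1K + v * 0.5K)". Rounds down and clamps to the field.
 */
static uint32_t fc_field_from_free_bytes(uint32_t free_bytes)
{
	uint32_t v;

	if (free_bytes <= 1024u)
		return 0; /* smallest encodable margin is Full - 1K */

	v = (free_bytes - 1024u) / 512u;
	return v > FC_FIELD_MAX ? FC_FIELD_MAX : v;
}

/* Hypothetical helper: pick thresholds from the per-queue FIFO size. */
static struct fc_thresholds fc_thresholds_for_fifo(uint32_t fifo_bytes)
{
	struct fc_thresholds t;

	/* Send pause at about half full: half of the FIFO is still free. */
	t.rfa = fc_field_from_free_bytes(fifo_bytes / 2u);

	/* Illustrative: release at about a quarter full (3/4 free). */
	t.rfd = fc_field_from_free_bytes(fifo_bytes - fifo_bytes / 4u);

	return t;
}
```

With a 16 KiB queue this gives rfa = 14 (Full - 8K) and rfd = 22 (Full - 12K). For very large FIFOs the fields saturate at FC_FIELD_MAX, so the half-full point is no longer reachable and the margin would need to be chosen some other way.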