I think we should rely only on CoalescedLoRAExpertContainer, so we can just remove the other one.
I think we can port the MoE logic from LoRAExpertContainer into CoalescedLoRAExpertContainer. E.g. if we route with topk=2, we only need to extract the unique indices of the experts that actually appear in the batch. If we have a lot of experts (~1000), this should be much smaller than the total number, since it is at most batch_size * topk. This logic is already in LoRAExpertContainer, but not in CoalescedLoRAExpertContainer.
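
To make the idea concrete, here is a rough sketch of the unique-index step. It is not the actual code from either container: the function name `compact_expert_indices` and the list-of-lists layout for the routing indices are assumptions for illustration, and it is plain Python so it runs without dependencies. With tensors, `torch.unique(indices, return_inverse=True)` should give the same two outputs.

```python
def compact_expert_indices(topk_indices):
    """Collapse routed expert ids to the unique ones used in this batch.

    topk_indices: one list of expert ids per example (hypothetical layout).
    Returns (unique_ids, remapped), where remapped[i][j] is the position of
    topk_indices[i][j] inside unique_ids.
    """
    # Unique expert ids across the whole batch, in sorted order.
    unique_ids = sorted({e for row in topk_indices for e in row})
    # Global expert id -> position in the compact list.
    position = {e: i for i, e in enumerate(unique_ids)}
    # Rewrite each routing entry to point into the compact list.
    remapped = [[position[e] for e in row] for row in topk_indices]
    return unique_ids, remapped


# Batch of 4 examples, topk=2, routed over a pool of ~1000 experts.
topk_indices = [[3, 871], [3, 42], [871, 42], [3, 871]]
unique_ids, remapped = compact_expert_indices(topk_indices)
```

With this, only the `len(unique_ids)` experts in `unique_ids` would need their LoRA weights gathered for the batch, and `remapped` would index into that smaller stack instead of the full expert pool.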