You use maximum likelihood (MLE) in the first stage of your estimation strategy, the logistic regression of the conditional probabilities. Is there any advantage to using the method of moments (MOM) in the first stage as well as the second? It would be more consistent with the second-stage approach. And if you are worried about distributional assumptions in the second stage, why not also worry about them in the first stage?
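One point worth checking here. For a standard logit with a linear index, the MLE score equations are exactly the sample moment conditions (1/n) Σ x_i (y_i − Λ(x_i'β)) = 0, so a just-identified MOM estimator built on those moments reproduces the MLE and imposes the same logistic-link assumption. A MOM first stage would only relax distributional assumptions if it used different moment conditions. The minimal sketch below illustrates this under stated assumptions: the simulated data, the coefficient values, and the helper names are hypothetical, and the two solvers (gradient ascent for MLE, Newton root-finding for MOM) are illustrative choices, not your code.

```python
import numpy as np

def sigmoid(z):
    # Overflow-free logistic function: 1 / (1 + exp(-z)) == 0.5 * (1 + tanh(z / 2)).
    return 0.5 * (1.0 + np.tanh(0.5 * z))

def simulate_logit(n=2000, seed=0):
    # Hypothetical data-generating process, for illustration only.
    rng = np.random.default_rng(seed)
    X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
    beta_true = np.array([-0.5, 1.0, -0.75])
    y = (rng.uniform(size=n) < sigmoid(X @ beta_true)).astype(float)
    return X, y

def mle_gradient_ascent(X, y, lr=1.0, n_iter=5000):
    # MLE: climb the average log-likelihood (1/n) sum[y z - log(1 + e^z)],
    # whose gradient is X'(y - p) / n.
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        beta += lr * X.T @ (y - sigmoid(X @ beta)) / len(y)
    return beta

def mom_newton(X, y, n_iter=50):
    # MOM: solve the sample moment conditions g(beta) = X'(y - p) / n = 0
    # by Newton root-finding; the Jacobian of g is -X' diag(p(1-p)) X / n.
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = sigmoid(X @ beta)
        g = X.T @ (y - p) / len(y)
        jac = -(X * (p * (1.0 - p))[:, None]).T @ X / len(y)
        beta -= np.linalg.solve(jac, g)
    return beta

X, y = simulate_logit()
b_mle, b_mom = mle_gradient_ascent(X, y), mom_newton(X, y)
print("MLE:", b_mle.round(4))
print("MOM:", b_mom.round(4))
print("max |difference|:", np.abs(b_mle - b_mom).max())
```

The two estimates agree up to numerical tolerance, which is the sense in which "MOM versus MLE" is not the real choice for this first stage; the choice of moment conditions is.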
Further, how confident are you that the first-stage MLE is not getting stuck at a local maximum? Likelihood surfaces of nonlinear models can be notoriously lumpy.
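A simple diagnostic would be to re-run the first-stage optimizer from several dispersed starting values and compare the maximized log-likelihoods. The sketch below assumes a standard logit and hypothetical simulated data; the damped-Newton optimizer, the helper names, and the choice of ten starts drawn with scale 3 are illustrative, not your procedure. For a standard logit with a linear index the log-likelihood is globally concave, so all starts should reach the same maximum; disagreement across starts would point to a richer specification or a numerical problem.

```python
import numpy as np

def sigmoid(z):
    # Overflow-free logistic function.
    return 0.5 * (1.0 + np.tanh(0.5 * z))

def simulate_logit(n=2000, seed=0):
    # Hypothetical data-generating process, for illustration only.
    rng = np.random.default_rng(seed)
    X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
    beta_true = np.array([-0.5, 1.0, -0.75])
    y = (rng.uniform(size=n) < sigmoid(X @ beta_true)).astype(float)
    return X, y

def log_likelihood(beta, X, y):
    # sum[y z - log(1 + e^z)], with log(1 + e^z) computed stably.
    z = X @ beta
    return float(np.sum(y * z - np.logaddexp(0.0, z)))

def fit_from(beta0, X, y, tol=1e-10, max_iter=500):
    # Damped Newton ascent: halve the step until the log-likelihood does not fall.
    beta = np.array(beta0, dtype=float)
    for _ in range(max_iter):
        p = sigmoid(X @ beta)
        grad = X.T @ (y - p)
        neg_hess = (X * (p * (1.0 - p))[:, None]).T @ X
        direction = np.linalg.solve(neg_hess, grad)
        if 0.5 * grad @ direction < tol:  # Newton decrement small: converged
            break
        ll, step = log_likelihood(beta, X, y), 1.0
        while log_likelihood(beta + step * direction, X, y) < ll and step > 1e-12:
            step /= 2.0
        beta = beta + step * direction
    return beta, log_likelihood(beta, X, y)

def multi_start(X, y, n_starts=10, scale=3.0, seed=1):
    # Refit from random starting values and collect the maxima reached.
    rng = np.random.default_rng(seed)
    fits = [fit_from(rng.normal(scale=scale, size=X.shape[1]), X, y)
            for _ in range(n_starts)]
    betas = np.array([b for b, _ in fits])
    lls = np.array([ll for _, ll in fits])
    return betas, lls

X, y = simulate_logit()
betas, lls = multi_start(X, y)
best = betas[lls.argmax()]
print("log-likelihood spread across starts:", lls.max() - lls.min())
print("largest coefficient gap from best fit:", np.abs(betas - best).max())
```

Reporting this spread (or the equivalent from whatever optimizer you actually use) would directly answer the local-maximum concern.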