Closed rowedenny closed 7 months ago
Thank you for your question.
Yes, you can unify the representation space for rationalization and prediction. While we haven't tested this idea yet, it could be a promising direction if you observe performance improvements and can find theoretical motivation for it!
Alternative training is more efficient because some parts of the model parameters are disabled for updating. The training process should be flexible, so you can also try the joint training approach. In our experience, we did not observe a significant difference in model prediction accuracy between these two methods.
Thanks for release the code for this amazing work. I have two questions about the model design and its optimization:
Thank you.