Kuan et al. repo: https://github.com/srewai/explicharr
They have two working baselines:
Let's use the Transformer, because features are easier to extract at the word level: it is basically multiplication by a matrix! That holds at least for the RS features. The POS features still need work, but as a first approximation we can use the distribution of POS tags for a given word.
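
A rough sketch of the POS first approximation, just to make the idea concrete. Everything here is an assumption for illustration: the toy tagged corpus, the tag set, and the helper names (`pos_distribution_matrix`, `one_hot`, `word_features`) are hypothetical and not taken from the explicharr repo. Each word gets a row holding its empirical POS distribution, and feature extraction for a sentence is then a one-hot matrix times that feature matrix.

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus of (word, POS) pairs; not from the explicharr repo.
TAGGED = [("the", "DET"), ("run", "VERB"), ("run", "NOUN"),
          ("run", "VERB"), ("fast", "ADV")]
POS_TAGS = ["ADV", "DET", "NOUN", "VERB"]  # assumed tag set


def pos_distribution_matrix(tagged, tags):
    """Build a (vocab x tags) matrix of per-word POS relative frequencies."""
    counts = defaultdict(Counter)
    for word, tag in tagged:
        counts[word][tag] += 1
    vocab = sorted(counts)
    matrix = []
    for word in vocab:
        total = sum(counts[word].values())
        matrix.append([counts[word][t] / total for t in tags])
    return vocab, matrix


def one_hot(words, vocab):
    """Encode each word as a one-hot row over the vocabulary."""
    index = {w: i for i, w in enumerate(vocab)}
    return [[1.0 if index[w] == i else 0.0 for i in range(len(vocab))]
            for w in words]


def word_features(one_hot_rows, matrix):
    """Plain matrix product: (tokens x vocab) times (vocab x features)."""
    n_feats = len(matrix[0])
    return [[sum(row[i] * matrix[i][j] for i in range(len(matrix)))
             for j in range(n_feats)]
            for row in one_hot_rows]


vocab, M = pos_distribution_matrix(TAGGED, POS_TAGS)
features = word_features(one_hot(["run", "the"], vocab), M)
```

With one-hot inputs the product is just a row lookup, so in practice an embedding-style index would do the same job. The matrix form is kept here because it matches how the word-level features would plug into the model.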