There's another issue that gives the command for stage 2. I wonder if this gives a clue: https://github.com/grammarly/gector/issues/42
For stage 3, see https://github.com/grammarly/gector/issues/11
You can convert the M2 format to the parallel one, and then convert that to ours. In general, M2 doesn't suit here, as there may be multiple annotators.
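A minimal sketch of that first step, assuming you apply one annotator's edits from the .m2 file to get source/target parallel files (the function name, file paths, and the annotator index are placeholders, not anything from the repo):

```python
# Sketch only: apply annotator-0 edits from an .m2 file to produce parallel data.
def m2_to_parallel(m2_path, src_path, tgt_path, annotator_id=0):
    with open(m2_path, encoding="utf-8") as f_in, \
         open(src_path, "w", encoding="utf-8") as f_src, \
         open(tgt_path, "w", encoding="utf-8") as f_tgt:
        tokens, edits = [], []
        for line in list(f_in) + [""]:  # trailing sentinel flushes the last block
            line = line.strip()
            if line.startswith("S "):
                tokens, edits = line[2:].split(), []
            elif line.startswith("A "):
                span, etype, repl = line[2:].split("|||")[:3]
                ann = int(line.rsplit("|||", 1)[1])
                start, end = map(int, span.split())
                if ann == annotator_id and etype != "noop" and start >= 0:
                    edits.append((start, end, [] if repl in ("", "-NONE-") else repl.split()))
            elif not line and tokens:
                corrected = list(tokens)
                for start, end, repl in sorted(edits, reverse=True):
                    corrected[start:end] = repl  # apply right-to-left so offsets stay valid
                f_src.write(" ".join(tokens) + "\n")
                f_tgt.write(" ".join(corrected) + "\n")
                tokens, edits = [], []

m2_to_parallel("train.m2", "train.src", "train.tgt")
```

The resulting parallel files can then go through the repo's own converter (per the README, something like `python utils/preprocess_data.py -s train.src -t train.tgt -o train.gector`) to get the token/tag format that train.py expects.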
It would be nice to know more detail about the fine-tuning stages. For example, I would like to fine-tune this model on a specific domain. Is fine-tuning the same as training, i.e. do we just call train.py? Training requires the data to be in the form of input and output sentences, from which a preprocessor produces delimited tokens and tags; these are then tokenized, turned into instances, and fed to the embedder. The codebase does not seem to have a way to take in M2 files and bring them into a trainable format.
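If fine-tuning really is just another train.py run on domain data, the flow might look roughly like the following. Only `--train_set`, `--dev_set`, and `--model_dir` are taken from the README; the checkpoint-loading flags and all paths are my assumptions, so check issue #42 for the exact stage-2/3 commands:

```bash
# Convert the parallel domain data to the tagged format (paths are placeholders)
python utils/preprocess_data.py -s domain.src -t domain.tgt -o domain.gector
python utils/preprocess_data.py -s domain_dev.src -t domain_dev.tgt -o domain_dev.gector

# Fine-tune starting from an existing checkpoint; the pretrain flags below are
# an assumption on my part — see issues #42/#11 for the exact flag names and
# hyperparameters used in the later training stages.
python train.py --train_set domain.gector --dev_set domain_dev.gector \
                --model_dir finetuned_model \
                --pretrain_folder pretrained_model --pretrain best
```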