This should be straightforward for both API wrappers:
For MLM: create multiple example objects, collate them into a batched examples object, and finally apply the mask tokens to turn that batch into a masked batched examples object.
For NER: this is easy. Pass a list of sequences to the NER dataset object and change the forward pass slightly, copying the approach the NER evaluation uses to keep track of documents.
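A minimal sketch of the MLM path described above (examples, then a collated batch, then a masked batch). The notes name no language, so this uses Python. The classes `Example`, `BatchedExamples`, `MaskedBatchedExamples`, the functions `collate` and `mask_batch`, and all token ids are hypothetical stand-ins for the wrapper's real objects. The 80/10/10 replacement rule is the standard BERT masking scheme, assumed here rather than taken from the notes.

```python
import random
from dataclasses import dataclass
from typing import List

# Hypothetical special-token ids; real values depend on the tokenizer.
PAD_ID = 0
MASK_ID = 103
VOCAB_SIZE = 30522   # hypothetical; only used for random-token replacement
IGNORE_INDEX = -100  # label for positions the loss should skip


@dataclass
class Example:
    """Hypothetical single tokenized example."""
    input_ids: List[int]


@dataclass
class BatchedExamples:
    """Hypothetical right-padded batch of examples."""
    input_ids: List[List[int]]
    attention_mask: List[List[int]]


@dataclass
class MaskedBatchedExamples(BatchedExamples):
    """Hypothetical batch with masked inputs plus MLM labels."""
    labels: List[List[int]]


def collate(examples: List[Example]) -> BatchedExamples:
    """Pad every example to the longest sequence in the batch."""
    max_len = max(len(ex.input_ids) for ex in examples)
    ids, mask = [], []
    for ex in examples:
        pad = max_len - len(ex.input_ids)
        ids.append(ex.input_ids + [PAD_ID] * pad)
        mask.append([1] * len(ex.input_ids) + [0] * pad)
    return BatchedExamples(ids, mask)


def mask_batch(batch: BatchedExamples, mask_prob: float = 0.15,
               seed: int = 0) -> MaskedBatchedExamples:
    """Select non-padding positions with probability mask_prob; of those,
    80% become MASK_ID, 10% a random token and 10% stay unchanged."""
    rng = random.Random(seed)
    new_ids, labels = [], []
    for row, att in zip(batch.input_ids, batch.attention_mask):
        row_ids, row_labels = [], []
        for tok, keep in zip(row, att):
            if keep and rng.random() < mask_prob:
                row_labels.append(tok)  # model must predict the original token
                r = rng.random()
                if r < 0.8:
                    row_ids.append(MASK_ID)
                elif r < 0.9:
                    row_ids.append(rng.randrange(VOCAB_SIZE))
                else:
                    row_ids.append(tok)
            else:
                row_ids.append(tok)
                row_labels.append(IGNORE_INDEX)
        new_ids.append(row_ids)
        labels.append(row_labels)
    return MaskedBatchedExamples(new_ids, batch.attention_mask, labels)


if __name__ == "__main__":
    examples = [Example([101, 2023, 2003, 102]), Example([101, 7592, 102])]
    masked = mask_batch(collate(examples), mask_prob=0.5, seed=1)
    print(masked.input_ids)
    print(masked.labels)
```

Keeping collation and masking as separate steps mirrors the order in the notes: the batch is padded first, so masking can use the attention mask to avoid ever selecting padding positions.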
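For the NER item, a rough sketch of "pass a list of sequences to the dataset and keep track of documents" could look like the following. `NERDataset`, `Chunk`, `predict_documents` and the `max_len` windowing are hypothetical; `tag_chunk` stands in for the model's forward pass, and the real evaluation code's document bookkeeping may differ.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Chunk:
    """One model-sized window of a document (hypothetical structure)."""
    doc_id: int          # index of the source sequence
    start: int           # offset of the first token within that sequence
    tokens: List[str]


class NERDataset:
    """Hypothetical dataset built directly from a list of token sequences.

    Each sequence is treated as one document and split into windows of at
    most max_len tokens; every window remembers which document it came from.
    """

    def __init__(self, sequences: List[List[str]], max_len: int = 128):
        if max_len < 1:
            raise ValueError("max_len must be positive")
        self.doc_lengths = [len(seq) for seq in sequences]
        self.chunks: List[Chunk] = []
        for doc_id, seq in enumerate(sequences):
            for start in range(0, len(seq), max_len):
                self.chunks.append(Chunk(doc_id, start, seq[start:start + max_len]))

    def __len__(self) -> int:
        return len(self.chunks)

    def __getitem__(self, idx: int) -> Chunk:
        return self.chunks[idx]


def predict_documents(dataset: NERDataset,
                      tag_chunk: Callable[[List[str]], List[str]]) -> List[List[str]]:
    """Tag each chunk, then write the tags back into per-document lists."""
    docs: List[List[str]] = [[""] * n for n in dataset.doc_lengths]
    for chunk in dataset.chunks:
        tags = tag_chunk(chunk.tokens)  # placeholder for the model forward pass
        docs[chunk.doc_id][chunk.start:chunk.start + len(tags)] = tags
    return docs


if __name__ == "__main__":
    ds = NERDataset([["Acme", "hired", "Bob"], ["no", "entities"]], max_len=2)
    # Toy tagger for illustration only: capitalised tokens are entities.
    toy = lambda toks: ["ENT" if t[:1].isupper() else "O" for t in toks]
    print(predict_documents(ds, toy))
```

Storing `doc_id` and `start` on each chunk is what lets the forward pass stay chunk-based while predictions are still reported per document.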