peleiden / daluke

A Danish-speaking language model with entity-aware self-attention
MIT License

Batched API (multiexamples) #111

Closed sorenmulli closed 3 years ago

sorenmulli commented 3 years ago

Should be straightforward for both API wrappers:

For MLM: Create multiple example objects, collate them into a batchedexamples object, and finally use the mask tokens to turn the batchedexamples into a maskedbatchedexamples object.
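
The MLM steps could look roughly like the sketch below. It is only an illustration in plain Python: the class names `Example`, `BatchedExamples` and `MaskedBatchedExamples`, the helpers `collate` and `mask_batch`, and the `PAD_ID`/`MASK_ID` values are all assumptions made for this example and are not taken from the daluke code base.

```python
from dataclasses import dataclass
from typing import List

PAD_ID = 0   # hypothetical padding token id
MASK_ID = 1  # hypothetical [MASK] token id


@dataclass
class Example:
    token_ids: List[int]


@dataclass
class BatchedExamples:
    token_ids: List[List[int]]       # padded to a common length
    attention_mask: List[List[int]]  # 1 for real tokens, 0 for padding


@dataclass
class MaskedBatchedExamples:
    token_ids: List[List[int]]
    attention_mask: List[List[int]]
    mask_positions: List[List[int]]  # per example: indices of [MASK] tokens


def collate(examples: List[Example]) -> BatchedExamples:
    """Pad every example to the length of the longest one."""
    max_len = max(len(ex.token_ids) for ex in examples)
    token_ids, attention_mask = [], []
    for ex in examples:
        n_pad = max_len - len(ex.token_ids)
        token_ids.append(ex.token_ids + [PAD_ID] * n_pad)
        attention_mask.append([1] * len(ex.token_ids) + [0] * n_pad)
    return BatchedExamples(token_ids, attention_mask)


def mask_batch(batch: BatchedExamples, mask_id: int = MASK_ID) -> MaskedBatchedExamples:
    """Record where the mask tokens sit in each example of the batch."""
    mask_positions = [
        [i for i, tok in enumerate(row) if tok == mask_id]
        for row in batch.token_ids
    ]
    return MaskedBatchedExamples(batch.token_ids, batch.attention_mask, mask_positions)
```

With this shape, a wrapper would call `mask_batch(collate(examples))` once per request instead of building a single masked example at a time.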

For NER: Very easy. Simply pass a list of sequences to the NER dataset object and change the forward pass slightly, copying the NER evaluation approach to keep track of documents.
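
A minimal sketch of the document bookkeeping for NER, assuming that long sequences are split into fixed-size pieces before the forward pass. The function names, the `max_len` and `batch_size` parameters, and the `predict_batch` callable are hypothetical stand-ins for the real dataset and model, not the daluke API.

```python
from typing import Callable, List

Tokens = List[str]
Labels = List[str]


def chunk(tokens: Tokens, max_len: int) -> List[Tokens]:
    """Split one token sequence into consecutive pieces of at most max_len tokens."""
    return [tokens[i:i + max_len] for i in range(0, len(tokens), max_len)]


def predict_documents(
    documents: List[Tokens],
    predict_batch: Callable[[List[Tokens]], List[Labels]],  # stand-in for the model forward pass
    max_len: int = 4,
    batch_size: int = 2,
) -> List[Labels]:
    """Run batched prediction and regroup the labels by source document."""
    # Tag every piece with the index of the document it came from.
    pieces = [
        (doc_idx, piece)
        for doc_idx, tokens in enumerate(documents)
        for piece in chunk(tokens, max_len)
    ]
    results: List[Labels] = [[] for _ in documents]
    for start in range(0, len(pieces), batch_size):
        batch = pieces[start:start + batch_size]
        predictions = predict_batch([piece for _, piece in batch])
        # Pieces are visited in order, so extending keeps the token order per document.
        for (doc_idx, _), labels in zip(batch, predictions):
            results[doc_idx].extend(labels)
    return results


def dummy_predict(batch: List[Tokens]) -> List[Labels]:
    """Toy model: label capitalised tokens as entities."""
    return [["ENT" if tok[:1].isupper() else "O" for tok in piece] for piece in batch]
```

Because each piece carries its document index, a batch may mix pieces from different documents without the predictions getting mixed up.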