microsoft / unilm

Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
https://aka.ms/GeneralAI

LayoutXLM pretraining data #426

Closed rpowalski closed 3 years ago

rpowalski commented 3 years ago

Hello, thanks for the great work on LayoutXLM.

I wonder if you could share some of your work on the dataset used for pretraining. I know it is not possible to share the dataset itself due to the Common Crawl policy, but could you share the code that was used to obtain the data?
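For context, the LayoutXLM paper describes collecting digitally-born PDF documents from Common Crawl and extracting text together with layout information. Below is a minimal, hypothetical sketch (not the authors' code) of one way to do something similar: query the public Common Crawl CDX index API for PDF captures, range-request the matching WARC records, and extract words with bounding boxes via PyMuPDF. The snapshot ID `CC-MAIN-2021-04`, the URL pattern, and the `mime:application/pdf` filter syntax are assumptions to adapt to your needs.

```python
import gzip
import json

import requests
import fitz  # PyMuPDF, for word-level text + bounding boxes

# Assumed snapshot; pick whichever crawl you want to mine.
CDX_API = "https://index.commoncrawl.org/CC-MAIN-2021-04-index"
DATA_HOST = "https://data.commoncrawl.org"


def find_pdf_records(url_pattern, limit=5):
    """Query the CDX index for captures whose MIME type is application/pdf."""
    params = {
        "url": url_pattern,  # e.g. "*.example.com/*"
        "output": "json",
        "filter": "mime:application/pdf",  # assumed filter syntax
        "limit": str(limit),
    }
    resp = requests.get(CDX_API, params=params, timeout=60)
    resp.raise_for_status()
    # The API returns one JSON object per line, each locating a WARC record.
    return [json.loads(line) for line in resp.text.splitlines()]


def fetch_pdf_bytes(record):
    """Range-request one gzipped WARC record and strip WARC/HTTP headers."""
    offset = int(record["offset"])
    length = int(record["length"])
    headers = {"Range": f"bytes={offset}-{offset + length - 1}"}
    resp = requests.get(f"{DATA_HOST}/{record['filename']}",
                        headers=headers, timeout=60)
    resp.raise_for_status()
    raw = gzip.decompress(resp.content)  # records are gzipped individually
    # WARC headers and HTTP headers each end with a blank line (\r\n\r\n);
    # what remains after the second blank line is the PDF body.
    return raw.split(b"\r\n\r\n", 2)[2]


def words_with_boxes(pdf_bytes):
    """Yield per-page lists of (x0, y0, x1, y1, word) tuples via PyMuPDF."""
    doc = fitz.open(stream=pdf_bytes, filetype="pdf")
    for page in doc:
        # get_text("words") returns (x0, y0, x1, y1, word, block, line, word_no)
        yield [w[:5] for w in page.get_text("words")]


if __name__ == "__main__":
    for rec in find_pdf_records("*.gov/*", limit=2):
        try:
            pages = list(words_with_boxes(fetch_pdf_bytes(rec)))
            print(rec["url"], "->", sum(len(p) for p in pages), "words")
        except Exception as exc:  # scanned, encrypted, or malformed PDFs
            print("skipped", rec.get("url"), exc)
```

A real pipeline would also need the steps the paper mentions, such as language identification (e.g. with an off-the-shelf detector over the extracted text) and filtering out scanned PDFs that carry no text layer.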

wolfshow commented 3 years ago

@rpowalski Thanks for asking! For now, we do not have plans to share the dataset and the code to obtain the data.