microsoft / unilm

Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
https://aka.ms/GeneralAI
MIT License
20.29k stars, 2.56k forks

how to get charseg.npy from ocr.txt and image.png #1532

Open simajiucai opened 7 months ago

simajiucai commented 7 months ago

If I want to train on my own dataset, one unavoidable problem is how to obtain charseg.npy. How can I generate it from image.png and ocr.txt?