microsoft / unilm

Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
https://aka.ms/GeneralAI
MIT License
19.09k stars 2.44k forks

How to get charseg.npy from ocr.txt and image.png #1532

Open simajiucai opened 2 months ago

simajiucai commented 2 months ago

To train on my own dataset, I need a charseg.npy file for each sample. How can I generate charseg.npy from image.png and ocr.txt?