kha-white / manga-ocr

Optical character recognition for Japanese text, with the main focus being Japanese manga
Apache License 2.0
1.61k stars 83 forks

Synthetic data size used for training #78

Open wonjun-dev opened 1 month ago

wonjun-dev commented 1 month ago

What was the size of the synthetic dataset used to train the Hugging Face model? (https://huggingface.co/kha-white/manga-ocr-base)

Thank you for your awesome project.

kha-white commented 1 month ago

5 million synthetic and 100k real images

wonjun-dev commented 1 month ago

> 5 million synthetic and 100k real images

Thank you :)