oh-my-ocr / text_renderer

https://oh-my-ocr.github.io/text_renderer/README.html
MIT License

lmdb2img Compatible with PaddleOCR #25

Closed chccc1994 closed 3 years ago

chccc1994 commented 3 years ago

Steps:

  1. Generated the LMDB files (data.mdb, lock.mdb):
    python main.py --config example_data\example.py --dataset lmdb --num_processes 2 --log_period 50

How do I convert the output into a format compatible with PaddleOCR? Is this right?

python tools/lmdb2img.py inputfiles1 outputfiles2

Sanster commented 3 years ago

PaddleOCR supports LMDB datasets: https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.1/doc/doc_en/recognition_en.md#data-preparation

chccc1994 commented 3 years ago

Why does generating 5000 images take up 1 TB? Can the LMDB format be used in PaddleOCR?


At 2021-07-02 08:56:18, "Qing" @.***> wrote:

paddleocr supports lmdb dataset https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.1/doc/doc_en/recognition_en.md#data-preparation


Sanster commented 3 years ago

It's a feature of LMDB on Windows, see: https://stackoverflow.com/questions/33508305/lmdb-maximum-size-of-the-database-for-windows

You can change the maximum database size to a suitable value here: https://github.com/oh-my-ocr/text_renderer/blob/master/text_renderer/dataset.py#L137
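
To pick a suitable value, a rough sketch like the following may help. It only estimates a size from the number of images and an average encoded image size; the helper names `suggest_map_size` and `open_env`, the 20 KiB average, and the 4x headroom are illustrative assumptions, not part of text_renderer. `lmdb.open(path, map_size=...)` is the py-lmdb call that takes the limit in bytes.

```python
MIB = 1024 * 1024


def suggest_map_size(num_images, avg_image_bytes, headroom=4.0):
    """Estimate an LMDB map size in bytes (hypothetical helper).

    headroom is an assumed safety factor covering labels, keys and
    LMDB page overhead; tune it for your own data.
    """
    if num_images < 0 or avg_image_bytes < 0 or headroom <= 0:
        raise ValueError("arguments must be non-negative, headroom positive")
    wanted = int(num_images * avg_image_bytes * headroom)
    # Round up to a whole MiB and never go below 1 MiB.
    return max(MIB, -(-wanted // MIB) * MIB)


def open_env(path, num_images, avg_image_bytes):
    """Open an LMDB environment with the estimated size (needs py-lmdb)."""
    import lmdb  # third-party: pip install lmdb

    return lmdb.open(path, map_size=suggest_map_size(num_images, avg_image_bytes))


if __name__ == "__main__":
    # Assumption: 5000 images at about 20 KiB each.
    size = suggest_map_size(5000, 20 * 1024)
    print(f"suggested map_size: {size} bytes ({size // MIB} MiB)")
```

If writes later fail with a map-full error, the estimate was too small; raise the headroom or the average image size and regenerate.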