hezarai / hezar

The all-in-one AI library for Persian, supporting a wide variety of tasks and modalities!
https://hezarai.github.io/hezar/
Apache License 2.0
832 stars, 44 forks

Train #174

Closed (edvinbehdadi closed this 1 month ago)

edvinbehdadi commented 1 month ago

Can we train hezarai/crnn-fa-printed-96-long on a handwriting dataset?

arxyzan commented 1 month ago

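Hi @edvinbehdadi, yes, of course you can. It's recommended to have at least 2000 samples for training. For reference, see this training example: https://github.com/hezarai/hezar/blob/main/examples/train/train_ocr.py

For custom datasets you must also make some other tweaks, as described in https://github.com/hezarai/hezar/issues/134#issuecomment-1868881700 and in https://hezarai.github.io/hezar/tutorial/training/license_plate_recognition.html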
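
Before starting a training run, it can help to confirm that a custom dataset actually meets the 2000-sample recommendation and to hold out a test split. Here is a minimal sketch using only the Python standard library. The CSV layout (columns named `image_path` and `text`) and the helper names are assumptions made for illustration, not a format or API required by hezar; adapt them to whatever the linked training example expects.

```python
import csv
import random

MIN_SAMPLES = 2000  # rule of thumb quoted in this thread, not a hard limit


def load_manifest(lines):
    """Read (image_path, text) pairs from CSV lines.

    Assumes a header row with `image_path` and `text` columns
    (hypothetical layout). Rows with an empty transcription are skipped.
    """
    reader = csv.DictReader(lines)
    return [(row["image_path"], row["text"]) for row in reader if row["text"].strip()]


def check_size(samples, minimum=MIN_SAMPLES):
    """Return True when the dataset has at least `minimum` samples."""
    return len(samples) >= minimum


def split_samples(samples, test_ratio=0.1, seed=42):
    """Shuffle deterministically, then split into (train, test) lists."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_ratio))
    return shuffled[n_test:], shuffled[:n_test]
```

To use it, open the manifest with `open(path, newline="", encoding="utf-8")`, pass the file object to `load_manifest`, and only proceed to training if `check_size` returns `True`.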

edvinbehdadi commented 1 month ago

thank you bro

edvinbehdadi commented 1 month ago

Can you give me a rough estimate of how many samples in hezarai/parsynth-ocr-200k are handwritten?

arxyzan commented 1 month ago

@edvinbehdadi Unfortunately, the whole dataset is digital and synthetic. You must create your own handwritten dataset and feed it to the trainer.
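
When assembling a handwritten dataset, one thing worth checking early is which characters the transcriptions contain and how long the longest label is, since CRNN-style recognizers are typically trained against a fixed character set. The sketch below is a generic check, not part of hezar, and the function name is hypothetical; whether and how hezar needs this information is covered by the links earlier in the thread, not by this snippet.

```python
from collections import Counter


def label_stats(labels):
    """Summarize transcriptions of a custom OCR dataset.

    Returns a tuple of:
      - the sorted list of distinct characters,
      - a Counter of how often each character occurs,
      - the length of the longest label (0 for an empty dataset).
    """
    counts = Counter(ch for text in labels for ch in text)
    max_len = max((len(text) for text in labels), default=0)
    return sorted(counts), counts, max_len
```

Rare characters in the counts are a hint that more samples covering them may be needed, and visually similar Arabic and Persian letter forms showing up side by side usually mean the labels should be normalized first.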

edvinbehdadi commented 1 month ago

👍♥️
