Closed: edvinbehdadi closed this issue 1 month ago
Hi @edvinbehdadi, yes, of course you can. It's recommended to have at least 2000 samples for training. For reference, see this training example (https://github.com/hezarai/hezar/blob/main/examples/train/train_ocr.py). For custom datasets you must also make a few other tweaks, as described here (https://github.com/hezarai/hezar/issues/134#issuecomment-1868881700) and here (https://hezarai.github.io/hezar/tutorial/training/license_plate_recognition.html).
thank you bro
Can you give me a rough estimate of how many of the samples in hezarai/parsynth-ocr-200k are handwritten?
@edvinbehdadi Unfortunately, the whole dataset is digital and synthetic. You must create your own handwritten dataset and feed it to the trainer.
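As a rough starting point for collecting your own data, here is a minimal sketch that writes train/test manifest files from a list of (image path, transcription) pairs and warns if you are below the 2000-sample guideline mentioned above. It uses only the Python standard library. The function name `build_manifests`, the CSV column names `image_path` and `text`, and the 10% test split are assumptions made for illustration, not the format hezar requires; check the linked training example and tutorial for the exact layout the trainer expects.

```python
import csv
import random
from pathlib import Path

MIN_SAMPLES = 2000  # guideline quoted earlier in this thread


def build_manifests(samples, out_dir, test_ratio=0.1, seed=42):
    """Write train.csv and test.csv from (image_path, text) pairs.

    Column names and split ratio are illustrative assumptions,
    not a format mandated by hezar.
    """
    # Drop samples with empty transcriptions and normalize whitespace.
    cleaned = [(str(path), text.strip()) for path, text in samples if text and text.strip()]

    if len(cleaned) < MIN_SAMPLES:
        print(f"warning: only {len(cleaned)} labeled samples; at least {MIN_SAMPLES} is recommended")

    # Shuffle deterministically so the split is reproducible.
    rng = random.Random(seed)
    rng.shuffle(cleaned)

    n_test = max(1, int(len(cleaned) * test_ratio))
    splits = {"test": cleaned[:n_test], "train": cleaned[n_test:]}

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    for name, rows in splits.items():
        # utf-8 keeps Persian transcriptions intact.
        with open(out_dir / f"{name}.csv", "w", newline="", encoding="utf-8") as f:
            writer = csv.writer(f)
            writer.writerow(["image_path", "text"])
            writer.writerows(rows)

    return {name: len(rows) for name, rows in splits.items()}
```

You would call it with your own pairs, for example `build_manifests([("images/0001.jpg", "سلام"), ...], "handwritten_dataset")`, and then point your dataset loading code at the resulting CSV files.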
👍♥️
Can we train hezarai/crnn-fa-printed-96-long on a handwriting dataset?