raphael-baena / DTLR

Handwritten Text Recognition and Character Detection
Apache License 2.0

mismatch dimension between HWDB checkpoint and charset #3

Closed zjl123001 closed 1 month ago

zjl123001 commented 1 month ago

I am confused: the dimension in the finetuned HWDB checkpoint is 2704, but the charset in /data/HWDB_v1/charset.pkl has 7356 entries. Could you please help me with this problem? Thanks.

raphael-baena commented 1 month ago

Hello,

Thanks for bringing this up! Here are a few clarifications regarding the pretrained and finetuned checkpoints:

I noticed that I initially forgot to include the charset from the HWDB dataset (CASIA v2). I've now added the missing charset in the corresponding folder. HWDB.py (CASIA v2) was already designed to look for this charset, but the file was missing:

self.data = pickle.load(open(os.path.join(datasets_path, 'HWDB', 'data.pkl'), "rb"))
self.charset = self.data['charset']
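
For anyone who wants to confirm which charset a checkpoint corresponds to, here is a minimal sketch in Python. It assumes, as in the snippet above, that data.pkl is a pickled dict with a 'charset' key stored under a dataset folder. The helper names `charset_size` and `check_against_checkpoint` are hypothetical and not part of the DTLR code, and the checkpoint dimension is passed in by hand because the thread does not show how to read it from the checkpoint. Whether the model's output dimension equals the charset length exactly, or includes extra classes, is also not stated here, so treat a small difference with caution.

```python
import os
import pickle
import tempfile


def charset_size(datasets_path, dataset="HWDB"):
    """Return the number of entries in the charset stored in
    <datasets_path>/<dataset>/data.pkl (layout assumed from the snippet above)."""
    path = os.path.join(datasets_path, dataset, "data.pkl")
    with open(path, "rb") as f:
        data = pickle.load(f)
    return len(data["charset"])


def check_against_checkpoint(n_charset, n_checkpoint):
    """Hypothetical helper: report whether the two sizes agree."""
    if n_charset == n_checkpoint:
        return "match"
    return (
        f"mismatch: charset has {n_charset} entries, "
        f"checkpoint expects {n_checkpoint}"
    )


# Demo on a tiny stand-in file, so the sketch runs without the real dataset.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "HWDB"))
    with open(os.path.join(root, "HWDB", "data.pkl"), "wb") as f:
        pickle.dump({"charset": ["a", "b", "c"]}, f)
    demo_size = charset_size(root)
    demo_report = check_against_checkpoint(demo_size, 2704)
```

With the real dataset, you would call `charset_size` on your own datasets path and compare the result against the dimension reported for the checkpoint (2704 in the question above).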

Please let me know if you need further assistance.