Process CASIA MPF Files

brucegarro commented 5 years ago

Hello @lucaskjaero, I have a project similar to yours in which I've implemented some Chinese character recognition models using the CASIA data sets. Like you, I've used the CASIA competition GNT files, but I believe it should be easier to build performant models on the HWDB1.X and OLHWDB1.X data sets because they are five times larger. Unfortunately, those data sets use a different file format, MPF. Do you have any idea how to process these files using Python?

Datasets: http://www.nlpr.ia.ac.cn/databases/handwriting/Download.html

My Project: https://github.com/brucegarro/chinese-character-recognition

lucaskjaero commented 5 years ago

Hi @brucegarro, I see there's a file specification here. You can read these files in Python as binary data and unpack the fields with the struct library. In this project, I do this here, which is hopefully a decent example. Let me know if that helps. If it doesn't, I can see about implementing it here. Best, Lucas
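
For readers following along, here is a minimal sketch of the struct-based approach described above. It is not code from PyCasia. The header layout it assumes (a 4-byte header size, an 8-byte format code, a free-text illustration, a 20-byte code type, a 2-byte code length, a 20-byte data type, then 4-byte sample count and dimensionality, all little-endian) is my reading of the MPF description and should be checked against the file specification. The names `parse_mpf`, `DTYPE_FORMATS`, and `FIXED_HEADER_BYTES` are hypothetical.

```python
import struct

# Assumption: the header's data-type string is one of these three values.
DTYPE_FORMATS = {"unsigned char": "B", "short": "h", "float": "f"}

# Assumption: every header field except the free-text illustration
# adds up to 62 bytes (4 + 8 + 20 + 2 + 20 + 4 + 4).
FIXED_HEADER_BYTES = 62


def _text(raw):
    """Decode a NUL-padded ASCII field, dropping everything after the first NUL."""
    return raw.split(b"\0", 1)[0].decode("ascii", errors="replace").strip()


def parse_mpf(data):
    """Parse the raw bytes of one MPF file into (header, samples).

    samples is a list of (label_bytes, feature_tuple) pairs.
    """
    (header_size,) = struct.unpack_from("<i", data, 0)
    offset = 4
    format_code = _text(data[offset:offset + 8])
    offset += 8
    illustration_length = header_size - FIXED_HEADER_BYTES
    illustration = _text(data[offset:offset + illustration_length])
    offset += illustration_length
    code_type = _text(data[offset:offset + 20])
    offset += 20
    (code_length,) = struct.unpack_from("<h", data, offset)
    offset += 2
    data_type = _text(data[offset:offset + 20])
    offset += 20
    sample_count, dimensionality = struct.unpack_from("<ii", data, offset)
    offset += 8

    header = {
        "format_code": format_code,
        "illustration": illustration,
        "code_type": code_type,
        "code_length": code_length,
        "data_type": data_type,
        "sample_count": sample_count,
        "dimensionality": dimensionality,
    }

    # One record = label (code_length bytes) followed by the feature vector.
    vector_format = "<%d%s" % (dimensionality, DTYPE_FORMATS[data_type])
    vector_size = struct.calcsize(vector_format)
    samples = []
    for _ in range(sample_count):
        label = data[offset:offset + code_length]
        offset += code_length
        vector = struct.unpack_from(vector_format, data, offset)
        offset += vector_size
        samples.append((label, vector))
    return header, samples
```

To use it on a downloaded file, read the bytes with `open(path, "rb").read()` and pass them to `parse_mpf`. Labels are returned as raw bytes. If the header's code type is GB, something like `label.decode("gbk")` is a plausible way to recover the character, but that is also an assumption to verify.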

brucegarro commented 5 years ago

Thank you for your response @lucaskjaero :pray:
