google / myanmar-tools

Detect and convert the Zawgyi-One font encoding in C++, Java, JavaScript, PHP, and Ruby

Want to build a train to detect for Shan Language. #33

Open saitawngpha opened 5 years ago

saitawngpha commented 5 years ago

Hello, I am interested in Myanmar Tools, which can detect Myanmar font encodings with ML. I would like to build the same kind of detection for the Shan language.

Can you tell me where I should start? Best, STP

sffc commented 5 years ago

Dear STP,

Yes, this would be a good feature to add. I would suggest the following two classifiers:

  1. Zawgyi versus Unicode (any language) -- what already exists.
  2. Unicode Burmese versus Unicode Shan

To add the Unicode Burmese versus Unicode Shan classifier to Myanmar Tools, you can:

  1. Download the training data as explained in the README
  2. Add methods to BurmeseData.java to read the my.txt (Burmese Unicode) and shn.txt (Shan Unicode) separately
  3. Remove the Category enum from ZawgyiUnicodeMarkovModelBuilder.java and replace it with a boolean at the call sites of trainOnString
  4. Make a copy of GenerateZawgyiUnicodeModelDAT.java named something like GenerateUnicodeBurmeseShanModelDAT.java, and have it load and train on the new data sets
  5. Add a target to Makefile that invokes your new Java function and saves the output to a new dat file named something like unicodeBurmeseShanModel.dat, and add logic to the copy-resources target to copy that file to the client implementations
  6. Add an API to read from the new file in the various client implementations. You can start with just one client implementation, like Java. For example, copy ZawgyiDetector.java into a new file named something like ShanDetector.java, pointing it to your new unicodeBurmeseShanModel.dat file
  7. Add tests
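
To make steps 3 and 4 more concrete, here is a minimal, self-contained sketch of a two-class character-bigram Markov model that is trained through a `trainOnString(String, boolean)` call. It is an illustration only, not the code in ZawgyiUnicodeMarkovModelBuilder.java: the class name `BurmeseShanBigramSketch`, the method `shanLogOdds`, the add-one smoothing, and the exact `trainOnString` signature are all assumptions made for this example.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical two-class bigram model; NOT part of Myanmar Tools. */
final class BurmeseShanBigramSketch {
    /** Transition counts collected from one class of training text. */
    private static final class Counts {
        final Map<Long, Integer> pairs = new HashMap<>();     // (prev, cur) -> count
        final Map<Integer, Integer> totals = new HashMap<>(); // prev -> outgoing transitions
    }

    private final Counts burmese = new Counts();
    private final Counts shan = new Counts();
    private final Set<Integer> alphabet = new HashSet<>();

    /** Packs two code points into one map key. */
    private static long key(int prev, int cur) {
        return ((long) prev << 32) | cur;
    }

    /** Counts every adjacent code point pair in one training line (assumed signature). */
    void trainOnString(String line, boolean isShan) {
        Counts c = isShan ? shan : burmese;
        int[] cps = line.codePoints().toArray();
        for (int cp : cps) {
            alphabet.add(cp);
        }
        for (int i = 1; i < cps.length; i++) {
            c.pairs.merge(key(cps[i - 1], cps[i]), 1, Integer::sum);
            c.totals.merge(cps[i - 1], 1, Integer::sum);
        }
    }

    /** Add-one smoothed log probability of the transition prev -> cur. */
    private double logProb(Counts c, int prev, int cur) {
        int vocab = alphabet.size() + 1; // +1 reserves mass for unseen code points
        int pair = c.pairs.getOrDefault(key(prev, cur), 0);
        int total = c.totals.getOrDefault(prev, 0);
        return Math.log((pair + 1.0) / (total + vocab));
    }

    /** Log-likelihood ratio: positive favors Shan, negative favors Burmese. */
    double shanLogOdds(String input) {
        int[] cps = input.codePoints().toArray();
        double sum = 0.0;
        for (int i = 1; i < cps.length; i++) {
            sum += logProb(shan, cps[i - 1], cps[i])
                 - logProb(burmese, cps[i - 1], cps[i]);
        }
        return sum;
    }
}
```

In this sketch, a generator class would call `trainOnString(line, false)` for each line of my.txt and `trainOnString(line, true)` for each line of shn.txt, and a detector would then compare `shanLogOdds(input)` against zero. Writing the model to a dat file and reading it back in the clients is left out here.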

Hope that helps!

saitawngpha commented 5 years ago

Dear Shane F,

Thanks for your help. I will try it, and if I run into any problems, I will ask for your help.
