wenwenyu / PICK-pytorch

Code for the paper "PICK: Processing Key Information Extraction from Documents using Improved Graph Learning-Convolutional Networks" (ICPR 2020)
https://arxiv.org/abs/2004.07464
MIT License
559 stars, 193 forks

Byte Pair Encoding #52

Closed: kbrajwani closed this issue 3 years ago

kbrajwani commented 3 years ago

Hey, I saw that you are using a keys.txt file to encode the data. Please correct me if I am wrong.

  1. If you are using keys.txt, how do you create it, and can a word that is not in the training data be handled?
  2. I need to train on another language. How can I do that?

cuongngm commented 3 years ago

I have the same question. Can you explain the purpose and effect of that file?

cuongngm commented 3 years ago

It looks like you need to build a vocabulary for your language in a new keys.txt file and retrain the model. In my case, the vocab is: aAàÀảẢãÃáÁạẠăĂằẰẳẲẵẴắẮặẶâÂầẦẩẨẫẪấẤậẬbBcCdDđĐeEèÈẻẺẽẼéÉẹẸêÊềỀểỂễỄếẾệỆfFgGhHiIìÌỉỈĩĨíÍịỊjJkKlLmMnNoOòÒỏỎõÕóÓọỌôÔồỒổỔỗỖốỐộỘơƠờỜởỞỡỠớỚợỢpPqQrRsStTuUùÙủỦũŨúÚụỤưƯừỪửỬữỮứỨựỰvVwWxXyYỳỲỷỶỹỸýÝỵỴzZ0123456789!"#$%&''()*+,-./:;<=>?@[]^_`{|}~
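For anyone who would rather derive such a vocabulary from their data than type it by hand, here is a minimal sketch. It is not code from the PICK repository: the function names, the sample lines, and the choice to drop whitespace are all assumptions made for illustration, and the exact layout PICK expects inside keys.txt should be checked against the repo.

```python
from collections import Counter
from typing import Iterable


def build_char_vocab(lines: Iterable[str], min_count: int = 1) -> str:
    """Collect every non-whitespace character seen at least min_count times.

    Characters are returned as one sorted string with no duplicates.
    Dropping whitespace is an assumption; keep it if your transcripts need it.
    """
    counts = Counter(ch for line in lines for ch in line if not ch.isspace())
    return "".join(sorted(ch for ch, n in counts.items() if n >= min_count))


def save_vocab(vocab: str, path: str) -> None:
    """Write the vocabulary as a single UTF-8 line (hypothetical file layout)."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(vocab)


# Hypothetical sample transcripts; in practice, iterate over your own training text.
sample_lines = ["Hóa đơn 123", "Tổng cộng: 45.000"]
vocab = build_char_vocab(sample_lines)
```

Calling `save_vocab(vocab, "keys.txt")` would then write the result to disk. Building the set from the training transcripts keeps the file limited to characters the model will actually see, at the cost of missing characters that only show up at inference time.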

kbrajwani commented 3 years ago

I see your point: we have to build a vocabulary that includes every character. I was thinking in terms of words and forgot that PICK works at the character level. Thanks.
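On the earlier question of whether unseen input can be handled: with a character-level vocabulary, a new word is fine as long as each of its characters is in the vocabulary. A common way to deal with a character that is missing is to map it to a reserved "unknown" index. The sketch below illustrates that general idea only. The `<unk>` token and both function names are hypothetical, and this thread does not say whether PICK does it this way.

```python
from typing import Dict, List

UNK = "<unk>"  # hypothetical placeholder token, not taken from PICK


def make_index(vocab: str) -> Dict[str, int]:
    """Map each vocabulary character to an integer, reserving 0 for unknowns."""
    index = {UNK: 0}
    for ch in vocab:
        # setdefault keeps the first index if a character is repeated.
        index.setdefault(ch, len(index))
    return index


def encode(text: str, index: Dict[str, int]) -> List[int]:
    """Encode text character by character, falling back to the unknown index."""
    return [index.get(ch, index[UNK]) for ch in text]


index = make_index("abc")
ids = encode("cabz", index)  # 'z' is not in the vocabulary
```

Here `ids` is `[3, 1, 2, 0]`: the three known characters get their own indices and `z` falls back to 0.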