ibm-aur-nlp / PubLayNet

Any non-English document images in the dataset? #23

Open jocelynguo opened 4 years ago

jocelynguo commented 4 years ago

This is a nice dataset for research on NLP and CV. Thank you for making it publicly available. I am wondering whether any foreign-language document images are included in the PubLayNet dataset?

zhxgj commented 4 years ago

@jocelynguo Thanks. It is a good question. I do not have statistics, but I think nearly all the documents are in English. There may be a few documents that contain some foreign characters in medicine names.