hassan-mahmood / TIES_DataGeneration

Dataset Generation Code for: S.R. Qasim, H. Mahmood, and F. Shafait, Rethinking Table Parsing using Graph Neural Networks (2019)
MIT License

Custom data #20

Open mxnthng opened 2 years ago

mxnthng commented 2 years ago

Can I create a custom distribution and use my own image, OCR, and tb files? I'd like to generate table data in Japanese rather than English, but I have no idea how to do this.

hassan-mahmood commented 2 years ago

There are various ways to compute a distribution over textual data. One simple way (also used in this repo) is to use word frequencies: group the words into categories (alphabetic, numeric, symbolic, and alphanumeric), build a distribution over those categories from how often each one occurs, sample a category from that distribution, and then pick a word uniformly at random from the sampled category.
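As a rough illustration of that idea, here is a minimal Python sketch. It is not the code or the distribution file format used in this repo; the function names (`categorize`, `build_distribution`, `sample_word`) and the categorization rules are assumptions made for the example. It also assumes the text has already been split into words, which for Japanese would require a tokenizer, since the language does not separate words with spaces.

```python
import random
from collections import defaultdict


def categorize(word):
    """Assign a word to one of four categories (hypothetical rules)."""
    if word.isalpha():       # letters only; also true for kana and kanji
        return "alphabetic"
    if word.isdigit():       # digits only
        return "numeric"
    if not any(ch.isalnum() for ch in word):
        return "symbolic"    # no letters or digits at all
    return "alphanumeric"    # any mix of letters, digits, and symbols


def build_distribution(words):
    """Return (category probabilities, words per category) for a word list."""
    counts = defaultdict(int)
    vocab = defaultdict(set)
    for word in words:
        if not word:         # skip empty strings
            continue
        category = categorize(word)
        counts[category] += 1
        vocab[category].add(word)
    total = sum(counts.values())
    probs = {cat: n / total for cat, n in counts.items()}
    return probs, {cat: sorted(ws) for cat, ws in vocab.items()}


def sample_word(probs, vocab, rng=random):
    """Sample a category by frequency, then a word uniformly within it."""
    categories = list(probs)
    weights = [probs[cat] for cat in categories]
    category = rng.choices(categories, weights=weights, k=1)[0]
    return rng.choice(vocab[category])


# Example with a tiny, made-up list of pre-tokenized Japanese words.
words = ["表", "データ", "123", "％", "A4", "東京"]
probs, vocab = build_distribution(words)
print(sample_word(probs, vocab))
```

With a real corpus, `words` would be the tokenized text you want your generated tables to resemble, and `sample_word` would be called once per cell entry.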