synthetic dataset

pj771 commented 4 years ago

Hi, I have a question about the dataset, specifically about generating the synthetic dataset. What parameters were used to create the synthetic dataset with the corresponding repository (i.e., kikones34/handwritten-document-synthesizer)? I am using the following command:

./synthesize -num-pages=270 -words -distort-bboxes

This creates about 60k synthetic handwritten word images, whereas the paper mentions that 5.6 million word images were created.

Also, was the default corpus provided in kikones34/handwritten-document-synthesizer used to create the synthetic word images, or was it changed from the default setting?

leitro commented 4 years ago

Hi! You are right, more synthetic words can be obtained by setting -num-pages to a larger number.
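As a rough back-of-the-envelope sketch, assume the number of word images grows linearly with -num-pages (this thread does not confirm that). Under that assumption, the ~60k words from 270 pages works out to about 222 words per page, so reaching 5.6 million word images would take on the order of 25,200 pages. The helper `estimate_pages` below is hypothetical and not part of the synthesizer; it only does the arithmetic and prints the command from this thread with the estimated page count.

```shell
#!/bin/sh
# Rough estimate of the -num-pages value needed for a target word count.
# Assumption: word count scales linearly with -num-pages, using the
# ~60k words from 270 pages reported in this thread as the only data point.

observed_pages=270
observed_words=60000   # "about 60k", so the ratio is only approximate

# estimate_pages TARGET_WORDS: prints the page count, rounded up
estimate_pages() {
  target=$1
  echo $(( (target * observed_pages + observed_words - 1) / observed_words ))
}

pages=$(estimate_pages 5600000)
echo "estimated -num-pages: $pages"   # prints 25200
echo "./synthesize -num-pages=$pages -words -distort-bboxes"
```

Because the 60k figure is approximate and the page layout may vary, treat the result as an order-of-magnitude starting point and check the actual word count after generation.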

About the corpus, I downloaded ebooks from here.

Cheers :-)

pj771 commented 4 years ago

Thanks. Just to confirm: how many synthetic word images were generated for the synthetic word dataset? Was it 60k or 5.6 million?

Edit: Also, did you download all the books, or a specific list?

omni-us / research-WriterAdaptation-HTR

synthetic dataset #2