Closed arxyzan closed 7 months ago
I generated a 4-million-sample dataset for training CRNN. The dataset is very large, and unfortunately I haven't managed to upload it to the Hub yet.
I generated another 4-million-sample dataset to train our new CRNN model at https://huggingface.co/hezarai/crnn-fa-printed-96-long , but the zipped dataset is 12 GB. I have no idea how we can upload such a dataset to the Hub given that our network speed is 2 MB/s at most! I'm labeling this issue as "community help required".
I pushed a 200k-sample version of the dataset to https://huggingface.co/datasets/hezarai/parsynth-ocr-200k . Releasing the full 4M version is not feasible right now, so I'm closing this.
The code for generating the datasets is in this repo: https://github.com/hezarai/trdg-persian