Open Thomas-MMJ opened 1 year ago
Hey @Thomas-MMJ 👋 ,
Thanks for the request. Do you want to add it, maybe? If so, I'm happy to guide you if any help is needed :)
Hi @felixdittrich92 , has anybody worked on this? I'd love to hop into the project and contribute to this issue. :)
Hey @dvando 👋,
No, it's still open. Sure, feel free to work on it. If you have any questions or need some help, contact me :)
Hi @felixdittrich92 , my apologies, it took me a while to actually start working on it; I've been dealing with some issues from work.
I've got some questions about the download URLs. COCO-Text has 2 separate URLs: the first one is for the images and the second is for the labels, but the VisionDataset
only accepts 1 URL, which I believe leads to a compressed archive containing the images and their labels.
I also checked the other datasets (funsd, cord, synthtext, etc.), and all of them initialize the VisionDataset
using 1 URL only. I was thinking about merging the files myself, but then I wondered whether that's the right thing to do. (Changing the base class should not be an option, I believe.)
Sorry, and thanks in advance. :)
Hi @dvando :smile: No stress ^^
Option 1: You could take a look at https://github.com/mindee/doctr/blob/main/doctr/datasets/imgur5k.py (here the user needs to provide the paths to the data and we provide only the loader).
Option 2: What's the dataset size in MB / GB? What's the license? If neither is troublesome, we could combine the dataset and upload it :)
So with option 1, the user should download the images and the labels by themselves? That sounds okay. The dataset is ~13 GB in size and has a CC BY 4.0 license.
Both sound fine to me. Which one do you prefer, @felixdittrich92 ? :)
Option 1 :+1:
As reference PR: https://github.com/mindee/doctr/pull/1359 :)
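For anyone following along, here is a rough sketch of what Option 1 means in practice: the user downloads the images and labels themselves and passes both paths to the loader, so no download URL is needed at all. This is not the docTR implementation. The class name `UserProvidedTextDataset` and the label layout (a JSON object mapping each image file name to a list of `{"box": ..., "text": ...}` entries) are assumptions made up for illustration, and the real COCO-Text annotation format would need its own parsing.

```python
import json
import os
from typing import Any, Dict, List, Tuple


class UserProvidedTextDataset:
    """Hypothetical loader: the user supplies the paths, we only read the data."""

    def __init__(self, img_folder: str, label_path: str) -> None:
        # Fail early if the user-provided paths are wrong.
        if not os.path.isdir(img_folder):
            raise FileNotFoundError(f"unable to locate image folder: {img_folder}")
        if not os.path.isfile(label_path):
            raise FileNotFoundError(f"unable to locate label file: {label_path}")

        # Assumed layout: {"image_name.jpg": [{"box": [...], "text": "..."}, ...]}
        with open(label_path, encoding="utf-8") as f:
            labels: Dict[str, List[Dict[str, Any]]] = json.load(f)

        self.root = img_folder
        self.data: List[Tuple[str, List[Dict[str, Any]]]] = []
        for img_name, annotations in labels.items():
            # Every labelled image must exist in the user-provided folder.
            if not os.path.isfile(os.path.join(img_folder, img_name)):
                raise FileNotFoundError(f"unable to locate image: {img_name}")
            self.data.append((img_name, annotations))

    def __len__(self) -> int:
        return len(self.data)

    def __getitem__(self, idx: int) -> Tuple[str, List[Dict[str, Any]]]:
        # Return the full image path together with its annotations.
        img_name, annotations = self.data[idx]
        return os.path.join(self.root, img_name), annotations
```

Usage would then be something like `UserProvidedTextDataset("path/to/images", "path/to/labels.json")`, with both paths pointing at files the user has already downloaded.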
🚀 The feature
You might consider adding COCO-Text as one of the supported datasets:
https://vision.cornell.edu/se3/coco-text-2/#download
Motivation, pitch
It is another high-quality dataset, with text on objects at various angles (sides of vehicles, signs, etc.).
Alternatives
No response
Additional context
No response