Open gchhablani opened 3 years ago
I am not sure, but would datasets.Sequence(datasets.Sequence(datasets.Sequence(datasets.Value("int"))))
work?
Also, I found the information for loading only subsets of the data here.
Hi @lhoestq,
Request you to check this once.
Thanks, Gunjan
Hi @gchhablani, since Array2D doesn't support images of different sizes, I would suggest storing the paths to the image files in the dataset instead of the image data. This has the advantage of not decompressing the data (images are often compressed using JPEG, PNG, etc.). Users can still apply .map
to load the images if they want to, though the loaded images would end up being Sequence features.
In the future we'll add support for ragged tensors for this case and update the relevant dataset with this feature.
Add Hateful Memes Dataset
I will be adding this dataset. It requires the user to sign an agreement on DrivenData. So, it will be used with a manual download.
The issue with this dataset is that the images are of different sizes. The image datasets added so far (CIFAR-10 and MNIST) have a uniform shape throughout. So something like
won't work for the images. How would I add image features then? I checked
datasets/features.py
but couldn't figure out the appropriate class for this. I'm assuming I would want to avoid resizing altogether, since we want the user to be able to access the original images. Also, since the actual data is around 8.8 GB, how would I load only a subset of it?
Thanks, Gunjan