iterative / google-kaggle-competition


Image naming in `manifest.json` and `labels.json` #13

Open ankxyz opened 1 year ago

ankxyz commented 1 year ago

In `manifest.json` and `labels.json` (https://github.com/iterative/google-kaggle-competition/tree/custom_pytorch_dataloader/data/voxel51/test), some image names start with the same base name, e.g.:

['image0000-3', 'image0000-4', 'image0000-8']

They all start with `image0000`. Why is this naming used?

I can see from the paths in the manifest that the files really do have these names. But how are those names generated? The items in the original `kaggle_130k` dataset are named differently:

├── apparel
│   ├── image0000.png
│   ├── image0001.png
│   ├── image0002.png
..................
├── artwork
│   ├── image0000.png
│   ├── image0001.png
│   ├── image0002.png
..................
Meleagos commented 1 year ago

Hi @ankxyz. Sorry, I did not get a notification about this issue and missed it.

The renaming is done automatically by the Voxel51 library that we use. Specifically, we use the `FiftyOneImageClassificationDataset` export format. This format moves all the images into a single folder (`data`). If several images would end up with the same file name there, Voxel51 automatically renames them.
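
To illustrate why the suffixes appear, here is a minimal sketch of collision-free renaming when several class folders are flattened into one. It is not Voxel51's actual implementation: the function name `flatten_names` and the exact suffix scheme (`-2`, `-3`, ...) are assumptions made for illustration only.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def flatten_names(paths):
    """Map each source path to a unique file name for one flat folder.

    Hypothetical helper: the suffix scheme is an assumption, not the
    library's documented behavior.
    """
    seen = defaultdict(int)  # how many times each base file name has occurred
    mapping = {}
    for path in paths:
        p = PurePosixPath(path)
        seen[p.name] += 1
        count = seen[p.name]
        # The first occurrence keeps its name; later ones get a numeric suffix.
        mapping[path] = p.name if count == 1 else f"{p.stem}-{count}{p.suffix}"
    return mapping


paths = [
    "apparel/image0000.png",
    "artwork/image0000.png",
    "apparel/image0001.png",
]
mapping = flatten_names(paths)
for src, dst in mapping.items():
    print(f"{src} -> data/{dst}")
```

Because every class folder in `kaggle_130k` has its own `image0000.png`, flattening yields one unsuffixed name plus one suffixed name per additional class, which would explain several entries sharing the `image0000` prefix.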