object detection dataset

firsakov commented 2 years ago

Hello!

I'm trying to use FFCV with an object detection dataset in which each label has a different number of bounding boxes. I use an NDArrayField of shape (15, 5) (15 is the maximum number of bboxes per image; 5 is the class label plus 4 bbox coordinates) and get the following error:

ValueError: could not broadcast input array from shape (40,) into shape (300,)

Could you help with that please?

andrewilyas commented 2 years ago

Hi @firsakov ! Can you provide a larger code sample that shows what your dataset is returning?

firsakov commented 2 years ago

@andrewilyas this is the error

File "/opt/conda/envs/ffcv/lib/python3.9/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/opt/conda/envs/ffcv/lib/python3.9/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/opt/conda/envs/ffcv/lib/python3.9/site-packages/ffcv/writer.py", line 112, in worker_job_indexed_dataset
    handle_sample(sample, dest_ix, field_names, metadata, allocator, fields)
  File "/opt/conda/envs/ffcv/lib/python3.9/site-packages/ffcv/writer.py", line 50, in handle_sample
    field.encode(destination, field_value, allocator.malloc)
  File "/opt/conda/envs/ffcv/lib/python3.9/site-packages/ffcv/fields/ndarray.py", line 93, in encode
    data_region[:] = field.reshape(-1).view('<u1')
ValueError: could not broadcast input array from shape (60,) into shape (400,)

and this is the code

    writer = DatasetWriter(write_path, {
        'image': RGBImageField(),
        'label': NDArrayField(shape=(15, 5), dtype=np.dtype('float32')),
    })
    writer.from_indexed_dataset(dataset)

I tried to pass the number of bboxes for a particular sample as an IntField, but it didn't help, or I just didn't figure out what to do next.
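The failing line in the traceback copies the sample's raw bytes into a fixed-size region, so every sample has to occupy exactly as many bytes as the declared field: 15 * 5 * 4 = 300 bytes for a (15, 5) float32 array. A sample with only 2 boxes is 40 bytes, which would explain the first error message. One possible workaround, sketched below with NumPy only, is to pad each label to the full shape inside the dataset's `__getitem__` and keep the real box count alongside it. The helper name `pad_boxes`, the fill value of -1, and the truncation of extra boxes are illustrative assumptions, not part of FFCV.

```python
import numpy as np

MAX_BOXES = 15  # assumed cap, matching the (15, 5) field shape in the post
BOX_DIM = 5     # class label + 4 bbox coordinates


def pad_boxes(boxes, max_boxes=MAX_BOXES, fill=-1.0):
    """Pad (or truncate) an (n, 5) box array to (max_boxes, 5) float32.

    Hypothetical helper: returns the padded array and the number of
    real boxes it contains.
    """
    boxes = np.asarray(boxes, dtype=np.float32).reshape(-1, BOX_DIM)
    n = min(len(boxes), max_boxes)
    padded = np.full((max_boxes, BOX_DIM), fill, dtype=np.float32)
    padded[:n] = boxes[:n]
    return padded, n


two_boxes = [[0, 0.1, 0.2, 0.3, 0.4], [1, 0.5, 0.5, 0.9, 0.9]]

# Unpadded: 2 boxes * 5 values * 4 bytes = 40 bytes, too small for the field.
raw = np.asarray(two_boxes, dtype=np.float32)
print(raw.reshape(-1).view('<u1').shape)     # (40,)

# Padded: always 15 * 5 * 4 = 300 bytes, matching the declared field.
padded, count = pad_boxes(two_boxes)
print(padded.reshape(-1).view('<u1').shape)  # (300,)
```

The count could then be stored in a separate IntField and used at load time to slice off the padding rows (`label[:count]`); whether that fits the rest of the pipeline is untested here.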

andrewilyas commented 2 years ago

Hi @firsakov , thanks for the info---can you also provide the definition of dataset in your code above?

gaceladri commented 2 years ago

Hi, I'm having a similar error.

My code:

import numpy as np

from ffcv.writer import DatasetWriter
from ffcv.fields import NDArrayField, FloatField, RGBImageField

writer = DatasetWriter(
    "data.beton",
    {
        "input_ids": NDArrayField(shape=(20,), dtype=np.dtype('uint8')),
        "attention_mask": NDArrayField(shape=(20,), dtype=np.dtype('uint8')),
        # Roughly 25% of the images will be stored as JPEG and the rest as raw
        "pixel_values": RGBImageField(
            write_mode="proportion",  
            compress_probability=0.25,  
            max_resolution=(224, 224), 
            jpeg_quality=50,  
        ),
    },
    num_workers=20,
)

class CaptionDataset:
    def __init__(self):
        self.captions = p  # <- a pandas dataframe with 3 columns

    def __getitem__(self, idx):
        inpt = self.captions['input_ids'][idx]   # example: [0, 1832, 27172, 161, 142555, 148, 184359, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
        at_mk = self.captions['attention_mask'][idx]   # example: [1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
        img = self.captions['pixel_values'][idx]       # an image of (3, 224, 224)
        return inpt, at_mk, img

    def __len__(self):
        return len(self.captions)

ds = CaptionDataset()
writer.from_indexed_dataset(ds)

The error:


Process Process-381:
Traceback (most recent call last):
  File "/anaconda3/envs/embedoor/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/anaconda3/envs/embedoor/lib/python3.8/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/anaconda3/envs/embedoor/lib/python3.8/site-packages/ffcv/writer.py", line 112, in worker_job_indexed_dataset
    handle_sample(sample, dest_ix, field_names, metadata, allocator, fields)
  File "/anaconda3/envs/embedoor/lib/python3.8/site-packages/ffcv/writer.py", line 50, in handle_sample
    field.encode(destination, field_value, allocator.malloc)
  File "/anaconda3/envs/embedoor/lib/python3.8/site-packages/ffcv/fields/ndarray.py", line 93, in encode
    data_region[:] = field.reshape(-1).view('<u1')
ValueError: could not broadcast input array from shape (80,) into shape (20,)

I have confirmed that all my examples are len = 20. Did I do something wrong? Thanks!

gaceladri commented 2 years ago

What I did wrong was converting [0, 1832, 27172, 161, 142555, 148, 184359, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1] from int to NumPy '<u1', so the vector was being transformed to this dtype. I changed the length of the tensor to 80 and it seems to work. I need to double-check that the conversion is done properly. Thanks!
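For anyone hitting the same message, the byte counts can be checked with NumPy alone. The sketch below assumes the dataset returned the token ids as 4-byte integers (int32); the thread does not confirm this, but it is consistent with the 80 bytes in the traceback (20 values * 4 bytes) against the 20 bytes of a (20,) uint8 field. It also shows why casting to uint8 is not a safe way to make the sizes agree: ids above 255 do not fit in one byte.

```python
import numpy as np

# The 20 token ids from the example above.
ids = [0, 1832, 27172, 161, 142555, 148, 184359, 2] + [1] * 12

# Assumption: the ids arrive as int32, i.e. 4 bytes each.
as_int32 = np.array(ids, dtype=np.int32)
print(as_int32.reshape(-1).view('<u1').shape)  # (80,)

# Casting to uint8 gives 20 bytes but silently wraps values modulo 256.
as_uint8 = as_int32.astype(np.uint8)
print(as_uint8.nbytes)                         # 20
print(np.array_equal(as_uint8, as_int32))      # False

# Reinterpreting the 80 raw bytes as int32 recovers the original ids.
round_trip = as_int32.view('<u1').view(np.int32)
print(np.array_equal(round_trip, as_int32))    # True
```

Under that assumption, declaring the fields with a dtype that matches the data, for example `NDArrayField(shape=(20,), dtype=np.dtype('int32'))`, would keep the shape at (20,) and the values intact. A shape of (80,) with uint8 stores the same bytes, but they would then need to be reinterpreted as int32 after loading.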

andrewilyas commented 2 years ago

Assuming this is resolved (wrong dtype) and closing -- feel free to re-open a new issue if it's not resolved!

libffcv / ffcv

object detection dataset #217