Closed firsakov closed 2 years ago
Hi @firsakov! Can you provide a larger code sample that shows what your dataset is returning?
@andrewilyas this is the error:
  File "/opt/conda/envs/ffcv/lib/python3.9/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/opt/conda/envs/ffcv/lib/python3.9/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/opt/conda/envs/ffcv/lib/python3.9/site-packages/ffcv/writer.py", line 112, in worker_job_indexed_dataset
    handle_sample(sample, dest_ix, field_names, metadata, allocator, fields)
  File "/opt/conda/envs/ffcv/lib/python3.9/site-packages/ffcv/writer.py", line 50, in handle_sample
    field.encode(destination, field_value, allocator.malloc)
  File "/opt/conda/envs/ffcv/lib/python3.9/site-packages/ffcv/fields/ndarray.py", line 93, in encode
    data_region[:] = field.reshape(-1).view('<u1')
ValueError: could not broadcast input array from shape (60,) into shape (400,)
and this is the code:
writer = DatasetWriter(write_path, {
    'image': RGBImageField(),
    'label': NDArrayField(shape=(15, 5), dtype=np.dtype('float32')),
})
writer.from_indexed_dataset(dataset)
I tried passing the number of bboxes for a particular sample as an IntField, but it didn't help, or I just didn't figure out what to do next.
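For readers hitting the same problem: one common way to fit a variable number of boxes into a fixed-shape field is to pad every label up to the declared (15, 5) float32 shape and keep the true count separately (for example in the IntField mentioned above). The sketch below is plain NumPy and only an illustration. `pad_boxes`, `MAX_BOXES`, and `BOX_DIM` are hypothetical names, and zero-padding is an assumption, not something this thread confirms as the fix. For what it's worth, a sample with three boxes stored as float32 occupies 3 x 5 x 4 = 60 bytes, which would be consistent with the (60,) in the traceback.

```python
import numpy as np

MAX_BOXES = 15  # maximum number of bboxes per image (from the post)
BOX_DIM = 5     # class label + 4 bbox coordinates (from the post)


def pad_boxes(boxes, max_boxes=MAX_BOXES, box_dim=BOX_DIM):
    """Zero-pad a (n, box_dim) list of boxes to (max_boxes, box_dim) float32.

    Returns the padded array and the number of real boxes it contains.
    Hypothetical helper for illustration; not part of FFCV.
    """
    boxes = np.asarray(boxes, dtype=np.float32).reshape(-1, box_dim)
    count = boxes.shape[0]
    if count > max_boxes:
        raise ValueError(f"{count} boxes exceed the maximum of {max_boxes}")
    padded = np.zeros((max_boxes, box_dim), dtype=np.float32)
    padded[:count] = boxes  # real boxes first, zero rows after
    return padded, count


# Made-up sample with three boxes: [class, x1, y1, x2, y2]
sample = [
    [1, 0.10, 0.20, 0.50, 0.60],
    [3, 0.30, 0.30, 0.90, 0.80],
    [7, 0.05, 0.40, 0.25, 0.70],
]
padded, count = pad_boxes(sample)
print(padded.shape, padded.dtype, padded.nbytes, count)
```

With this approach the dataset's `__getitem__` would return the padded array for every sample, so each label has the same shape and dtype as the field declaration, and the count tells the reader which rows are real.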
Hi @firsakov, thanks for the info. Can you also provide the definition of dataset in your code above?
Hi, I'm having a similar error.
My code:
import numpy as np
from ffcv.writer import DatasetWriter
from ffcv.fields import NDArrayField, FloatField, RGBImageField

writer = DatasetWriter(
    "data.beton",
    {
        "input_ids": NDArrayField(shape=(20,), dtype=np.dtype('uint8')),
        "attention_mask": NDArrayField(shape=(20,), dtype=np.dtype('uint8')),
        # Roughly 25% of the images will be stored as JPEG and the rest as raw
        "pixel_values": RGBImageField(
            write_mode="proportion",
            compress_probability=0.25,
            max_resolution=(224, 224),
            jpeg_quality=50,
        ),
    },
    num_workers=20,
)
class CaptionDataset:
    def __init__(self):
        self.captions = p  # <- a pandas dataframe with 3 columns

    def __getitem__(self, idx):
        inpt = self.captions['input_ids'][idx]  # example: [0, 1832, 27172, 161, 142555, 148, 184359, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
        at_mk = self.captions['attention_mask'][idx]  # example: [1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
        img = self.captions['pixel_values'][idx]  # an image of (3, 224, 224)
        return inpt, at_mk, img

    def __len__(self):
        return len(self.captions)

ds = CaptionDataset()
writer.from_indexed_dataset(ds)
The error:
Process Process-381:
Traceback (most recent call last):
  File "/anaconda3/envs/embedoor/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/anaconda3/envs/embedoor/lib/python3.8/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/anaconda3/envs/embedoor/lib/python3.8/site-packages/ffcv/writer.py", line 112, in worker_job_indexed_dataset
    handle_sample(sample, dest_ix, field_names, metadata, allocator, fields)
  File "/anaconda3/envs/embedoor/lib/python3.8/site-packages/ffcv/writer.py", line 50, in handle_sample
    field.encode(destination, field_value, allocator.malloc)
  File "/anaconda3/envs/embedoor/lib/python3.8/site-packages/ffcv/fields/ndarray.py", line 93, in encode
    data_region[:] = field.reshape(-1).view('<u1')
ValueError: could not broadcast input array from shape (80,) into shape (20,)
I have confirmed that all my examples have length 20. Did I do something wrong? Thanks!
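A note for anyone debugging the same message: the element count alone does not determine how many bytes a sample produces, because the traceback shows the array being viewed as raw bytes ('<u1') before it is copied. A check like the one below, which compares both shape and dtype against the declared field, may catch the mismatch before writing. This is a minimal sketch in plain NumPy; `find_mismatches` is a hypothetical helper, not an FFCV function, and the sample arrays are made up.

```python
import numpy as np


def find_mismatches(samples, shape, dtype):
    """Return (index, shape, dtype) for every sample that differs from the spec.

    Hypothetical helper for illustration; not part of FFCV.
    """
    expected = np.dtype(dtype)
    bad = []
    for i, sample in enumerate(samples):
        arr = np.asarray(sample)
        if arr.shape != tuple(shape) or arr.dtype != expected:
            bad.append((i, arr.shape, arr.dtype))
    return bad


# Made-up samples: only the first one matches a (20,) uint8 declaration.
samples = [
    np.ones(20, dtype=np.uint8),
    np.ones(20, dtype=np.int32),  # right length, wrong dtype
    np.ones(19, dtype=np.uint8),  # wrong length
]
mismatches = find_mismatches(samples, (20,), 'uint8')
for index, shape, dtype in mismatches:
    print(f"sample {index}: shape={shape}, dtype={dtype}")
```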
What I did wrong was that I was converting [0, 1832, 27172, 161, 142555, 148, 184359, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1] from int to NumPy '<u1', so the vector was being reinterpreted as that dtype. I changed the length of the tensor to 80 and it seems to work. I need to double-check that the conversion is done properly. Thanks!
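To make the byte arithmetic concrete, here is a small NumPy-only sketch. It assumes the sample arrays were 32-bit integers, which is consistent with 80 bytes for 20 elements but is not stated in the thread. `encoded_nbytes` and `declared_nbytes` are hypothetical helpers that mirror the `reshape(-1).view('<u1')` step from the traceback and the size implied by a declared shape and dtype. The last part shows what the length-80 workaround implies: the stored bytes have to be viewed back as 32-bit integers to recover the token ids.

```python
import numpy as np


def encoded_nbytes(array):
    """Number of bytes produced by the reshape(-1).view('<u1') step."""
    return array.reshape(-1).view('<u1').shape[0]


def declared_nbytes(shape, dtype):
    """Number of bytes implied by a declared (shape, dtype) pair."""
    return int(np.prod(shape)) * np.dtype(dtype).itemsize


# Token ids from the post, assumed here to be little-endian int32.
tokens = [0, 1832, 27172, 161, 142555, 148, 184359, 2] + [1] * 12
as_int32 = np.array(tokens, dtype='<i4')

print(encoded_nbytes(as_int32))         # 80
print(declared_nbytes((20,), 'uint8'))  # 20
print(declared_nbytes((20,), 'int32'))  # 80

# Workaround from the post: 80 raw bytes, viewed back as int32 when read.
raw = as_int32.reshape(-1).view('<u1')
restored = raw.view('<i4')
print(np.array_equal(restored, as_int32))  # True
```

Under that assumption, declaring the field with shape (20,) and a 32-bit integer dtype would match the data directly, without the manual view on the reading side. Note also that ids such as 142555 cannot be represented in uint8 at all, so casting the values down is not an option.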
Assuming this is resolved (wrong dtype) and closing. Feel free to open a new issue if it's not resolved!
Hello!
I'm trying to use FFCV with a dataset for object detection, where each label has a different number of bounding boxes. I use an NDArrayField of shape (15, 5) (15 - maximum number of bboxes per image, 5 - class label + 4 bbox coordinates) and get the following error.
Could you help with that please?