[Open] royvelich opened this issue 2 months ago
Can I try to fix this? Or is it too complicated? @amyeroberts
@royvelich Of course! Please feel free to open a PR with the fix, ping me when it's ready for review, and feel free to ask any questions in the meantime.
@amyeroberts Sure, I'll work on it. Can I ask questions in this thread if needed?
@amyeroberts What should the output format be for this? The input_boxes variable returned by the processor is currently a single tensor, which makes it difficult to pack elements with different shapes into it. I see two options: returning a list of tensors instead of a single tensor, or returning a padded version of the tensor together with a corresponding mask. What do you think?
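For illustration only (this is not what the current processor returns, and the coordinates are arbitrary placeholders), the first option would make input_boxes a plain Python list with one (num_boxes, 4) tensor per image, so the 2-box / 1-box case from this issue fits in a single batch:

```python
import torch

# Hypothetical output of option 1: one (num_boxes, 4) tensor per image,
# so images with different numbers of boxes can coexist in one batch.
input_boxes = [
    torch.tensor([[10.0, 10.0, 100.0, 100.0],
                  [50.0, 50.0, 150.0, 150.0]]),   # image 1: two boxes
    torch.tensor([[20.0, 20.0, 120.0, 120.0]]),   # image 2: one box
]
```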
@RaphaelMeudec In most of our other models, we process bounding boxes as "labels", which are a list of length batch_size where each element of the list is a BatchFeature. The other alternative is creating a tensor of shape (batch_size, max_num_boxes, 4) and then correctly masking / filtering the empty annotations when they are passed to the library. SAM is quite unusual in its API, so I think we can choose either. In both cases, we'll have to account for backwards compatibility and make sure the model can correctly handle the newly formatted input.
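As a rough sketch of the second alternative, under the assumption that padding happens inside the processor (the helper below is illustrative, not an existing transformers API):

```python
import torch

def pad_boxes(boxes_per_image):
    """Pad per-image box tensors to (batch_size, max_num_boxes, 4) plus a validity mask.

    `boxes_per_image` is a list of (num_boxes_i, 4) tensors; padded rows are zeros
    and are marked False in the returned mask. Illustrative helper, not library code.
    """
    batch_size = len(boxes_per_image)
    max_num_boxes = max(b.shape[0] for b in boxes_per_image)
    padded = torch.zeros(batch_size, max_num_boxes, 4)
    mask = torch.zeros(batch_size, max_num_boxes, dtype=torch.bool)
    for i, boxes in enumerate(boxes_per_image):
        padded[i, : boxes.shape[0]] = boxes
        mask[i, : boxes.shape[0]] = True
    return padded, mask
```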
cc @yonigozlan
System Info
transformers version: 4.43.3

Who can help?
@amyeroberts
Information

Tasks
An officially supported task in the examples folder (such as GLUE/SQuAD, ...)

Reproduction
Run the following code:
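The original reproduction snippet was not preserved in this thread; below is a minimal sketch of the kind of call that triggers the error, assuming a SAM checkpoint and two images with two boxes and one box respectively (the checkpoint name, dummy images, and coordinates are placeholders):

```python
import numpy as np
from transformers import SamProcessor

processor = SamProcessor.from_pretrained("facebook/sam-vit-base")

# Two dummy RGB images with a different number of boxes per image:
# 2 boxes for the first image, 1 box for the second.
images = [np.zeros((256, 256, 3), dtype=np.uint8)] * 2
input_boxes = [
    [[10.0, 10.0, 100.0, 100.0], [50.0, 50.0, 150.0, 150.0]],  # image 1
    [[20.0, 20.0, 120.0, 120.0]],                               # image 2
]

# Converting the ragged `input_boxes` to a single array fails inside the processor.
inputs = processor(images=images, input_boxes=input_boxes, return_tensors="pt")
```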
You should get the following error:
ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (2,) + inhomogeneous part.
Originating from transformers\models\sam\processing_sam.py (line 142).

Expected behavior
As an end-user, I expect to get 2 masks/results for the first image and 1 mask/result for the second image.