Open wander1985 opened 5 years ago

Hi, I am using SL_train.ipynb to train on my own VOC-format dataset on Windows 10. I used LabelImg to label the ground-truth annotations and data_voc.py to generate the pickle file. I've only used 5 images (3 for training, 1 for validation, 1 for testing) and set the batch size to 1, but training keeps raising the InvalidArgumentError shown further down in this thread after it passes through the first image. Can you help? Thanks.
What does your `GTUtility` class look like?
You may also want to read the other dataset-related issues here...
The following is the code of my `GTUtility` class. I tried to modify the code based on your reply to another issue, https://github.com/mvoelk/ssd_detectors/issues/12#issuecomment-485686377.
```python
import os
import pickle
from xml.etree import ElementTree

import numpy as np

from ssd_data import BaseGTUtility


class GTUtility(BaseGTUtility):
    def __init__(self, data_path, polygon=True):
        self.data_path = data_path
        self.image_path = os.path.join(data_path, 'JPEGImages')
        self.gt_path = gt_path = os.path.join(self.data_path, 'Annotations')
        self.classes = ['Background', 'Text']
        classes_lower = [s.lower() for s in self.classes]

        self.image_names = []
        self.data = []
        for filename in os.listdir(gt_path):
            tree = ElementTree.parse(os.path.join(gt_path, filename))
            root = tree.getroot()
            boxes = []
            size_tree = root.find('size')
            img_width = float(size_tree.find('width').text)
            img_height = float(size_tree.find('height').text)
            image_name = root.find('filename').text
            for object_tree in root.findall('object'):
                class_name = object_tree.find('name').text
                class_idx = classes_lower.index(class_name)
                for box in object_tree.iter('bndbox'):
                    # normalize pixel coordinates to [0, 1]
                    xmin = float(box.find('xmin').text) / img_width
                    ymin = float(box.find('ymin').text) / img_height
                    xmax = float(box.find('xmax').text) / img_width
                    ymax = float(box.find('ymax').text) / img_height
                    if polygon:
                        # four corner points (x, y) plus class label
                        box = [xmin, ymin, xmin, ymax, xmax, ymax, xmax, ymin, 1]
                    else:
                        # xmin, ymin, xmax, ymax plus class label
                        box = [xmin, ymin, xmax, ymax, 1]
                    boxes.append(box)
            boxes = np.asarray(boxes)
            self.image_names.append(image_name)
            self.data.append(boxes)

        self.init()


if __name__ == '__main__':
    gt_util = GTUtility('data/VOC2007')
    print(gt_util.classes)
    gt = gt_util.data
    print(gt)

    file_name = 'gt_util_voc2007.pkl'
    print('save to %s...' % file_name)
    pickle.dump(gt_util, open(file_name, 'wb'))
    print('done')

    inputs, data = gt_util.sample_batch(1, 0)
```
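As a sanity check, the pickle can be loaded back before training; note that unpickling an instance requires the `GTUtility` class definition to be importable:

```python
import pickle

# pickle stores the instance by reference to its class, so GTUtility
# must be importable in the session that loads the file
with open('gt_util_voc2007.pkl', 'rb') as f:
    gt_util = pickle.load(f)

print(gt_util.classes)                            # ['Background', 'Text']
print(len(gt_util.image_names), len(gt_util.data))
```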
What are the shapes you get?
`inputs.shape` is `(1, 512, 512, 3)`. `data` is `[array([[0.2275, 0.44125, 0.2275, 0.55875, 0.77125, 0.55875, 0.77125, 0.44125, 1.]])]`. `data.shape` raises `AttributeError: 'list' object has no attribute 'shape'`.
Seems okay... Does it work with larger batch size and more samples?
What is the shape of the model input? Should be `(None, 512, 512, 3)`.
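By the way, the `AttributeError` on `data.shape` is expected: the ground truth comes back as a plain Python list with one box array per image, so only the elements have a shape. A quick way to inspect it:

```python
inputs, data = gt_util.sample_batch(2, 0)

print(inputs.shape)        # (2, 512, 512, 3)
for boxes in data:
    # one array per image, shape (num_boxes, 9) with polygon=True
    print(boxes.shape)
```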
I changed the batch size to 2 with 100 samples, but still got a similar invalid argument error:

```
InvalidArgumentError: Reshape cannot infer the missing input size for an empty tensor unless all specified input sizes are non-zero
	 [[Node: training_4/Adam/gradients/loss_4/predictions_loss/TopKV2_grad/Reshape = Reshape[T=DT_INT32, Tshape=DT_INT32, _class=["loc:@train...rseToDense"], _device="/job:localhost/replica:0/task:0/device:CPU:0"](loss_4/predictions_loss/TopKV2:1, training_4/Adam/gradients/loss_4/predictions_loss/TopKV2_grad/stack)]]
```
How do I get the shape of the model input? I'm sorry, I'm new to this field.
`model.input_shape`
A piece of code would also be helpful.
Thanks. I put

```python
print(model.input_shape)
```

below

```python
# SegLink + DenseNet
model = DSODSL512()
```

The shape is `(None, 512, 512, 3)`. It seems to be as it should.
It probably has something to do with the negative samples in the hard negative mining of the `SegLinkLoss`. You may not get any negative samples in the local ground truth at all. Did you try `SegLinkFocalLoss`?
With code I meant: what does your SL_train.ipynb look like?
I changed the `SegLinkLoss` to

```python
loss = SegLinkFocalLoss(lambda_segments=1.0, lambda_offsets=1.0, lambda_links=1.0)
```

and it works like magic. Thanks so much.
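For reference, the whole cell in my SL_train.ipynb now looks roughly like this (the import paths and the `compute` attribute follow the pattern of the other losses in this repo, so adjust them to your checkout):

```python
from keras.optimizers import Adam

# adjust import paths to your checkout of ssd_detectors
from sl_model import DSODSL512
from sl_training import SegLinkFocalLoss

model = DSODSL512()  # SegLink + DenseNet, input shape (None, 512, 512, 3)

# the focal loss sidesteps the hard negative mining of SegLinkLoss, which
# failed here because the local ground truth contained no negative samples
loss = SegLinkFocalLoss(lambda_segments=1.0, lambda_offsets=1.0, lambda_links=1.0)
model.compile(optimizer=Adam(lr=1e-3), loss=loss.compute)
```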
But why didn't I get negative samples in the local ground truth? Is it because I didn't label my ground truth correctly? The following is how I label my ground truth using LabelImg; am I doing something wrong? For example, for the image below, I give all the text ("Campus" and "Shop" in this example) the same class name "text" (as shown in the screenshot below). But I am wondering: where do I label the exact letters of the text (i.e. "Campus" and "Shop")? Or do I not need to label the exact letters at all?
The sequence label is placed in the `text` attribute of the `GTUtility` and is only required for the recognition stage. For more details, see data_svt.py.
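For the detection stage discussed here you can leave it out entirely. If you later move on to recognition, here is a minimal sketch of how transcriptions could be collected per image, assuming a hypothetical `<text>` tag in the XML (LabelImg does not write one, so you would have to add the strings to the annotations yourself):

```python
from xml.etree import ElementTree


def read_transcriptions(xml_file):
    """Collect one string per <object> from a VOC annotation file.

    The <text> tag is hypothetical -- LabelImg does not produce it,
    so the annotations would need to be extended by hand.
    """
    root = ElementTree.parse(xml_file).getroot()
    texts = []
    for object_tree in root.findall('object'):
        text_tag = object_tree.find('text')
        texts.append(text_tag.text if text_tag is not None else '')
    return texts

# in GTUtility.__init__, a parallel per-image list would then be kept:
#   self.text = []
#   ...
#   self.text.append(read_transcriptions(os.path.join(gt_path, filename)))
```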