tonghe90 / textspotter

324 stars 112 forks

"gt_label" in tool_layers/gen_gts_layer #30

Closed fsluckymao closed 5 years ago

fsluckymao commented 5 years ago

Hi @tonghe90, sorry to bother you again. I understand almost all of your code, but I am really confused about the custom layer "gen_gts_layer", especially the bottom[0] blob "gt_bbox", whose shape is N * 1 * H * W. I don't know what exactly gt_bbox is or what the values in gt_bbox mean.

https://github.com/tonghe90/textspotter/blob/0166abdbe68bfe0a416a4a1d35ab8d1e1fcfe262/pylayer/tool_layers.py#L304

    for n in range(batch_size):
        gt_label = bottom[0].data[n, 0]  # gt_label is a matrix, shape=H*W
        tmp = np.sum(gt_label, axis=1)
        gt_num = len(np.where(tmp != 0)[0])
        if gt_num == 0:
            continue
        roi_n = gt_label[:gt_num, :8] * 4  # this is the line I can't understand
        roi_n = np.hstack((np.ones((gt_num, 1)) * n, roi_n))
        gt_boxes = np.vstack((gt_boxes, roi_n))
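To make the loop above easier to reason about, here is a minimal standalone sketch that runs the same steps on a toy array in place of the Caffe blob. The function name `collect_gt_boxes`, the toy values, and the reading of the first 8 columns as four (x, y) corner points are assumptions for illustration. The thread does not say why the coordinates are multiplied by 4, so the sketch only exposes it as a `scale` parameter.

```python
import numpy as np

def collect_gt_boxes(gt_blob, scale=4):
    """Gather the non-padding rows of every image into one (K, 9) array.

    gt_blob: array of shape (N, 1, H, W). Each row of an H x W slice is
    assumed to hold one box, and all-zero rows are assumed to be trailing
    padding (hypothetical layout, not confirmed by the repo author).
    """
    gt_boxes = np.zeros((0, 9))
    for n in range(gt_blob.shape[0]):
        gt_label = gt_blob[n, 0]                  # H x W matrix for image n
        row_sums = np.sum(gt_label, axis=1)       # a zero sum marks a padding row
        gt_num = len(np.where(row_sums != 0)[0])  # number of real boxes
        if gt_num == 0:
            continue
        roi_n = gt_label[:gt_num, :8] * scale     # first 8 columns, rescaled
        # Prepend the batch index so each row reads [n, 8 coordinates].
        roi_n = np.hstack((np.ones((gt_num, 1)) * n, roi_n))
        gt_boxes = np.vstack((gt_boxes, roi_n))
    return gt_boxes

# Toy batch: 2 images, room for 3 boxes each, 8 values per box.
toy = np.zeros((2, 1, 3, 8))
toy[0, 0, 0] = [1, 1, 5, 1, 5, 3, 1, 3]
toy[1, 0, 0] = [2, 2, 6, 2, 6, 4, 2, 4]
toy[1, 0, 1] = [0, 1, 2, 1, 2, 2, 0, 2]

boxes = collect_gt_boxes(toy)
print(boxes.shape)
print(boxes[:, 0])
```

Because the code counts non-zero rows and then slices the first `gt_num` rows, it only works when the real boxes come before the padding rows.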

crazysal commented 5 years ago

@tonghe90, @fsluckymao Could you please provide an intuitive explanation of what the following variables are?

'sample_gt_cont': does it keep track of the number of boxes per image, in order to flush the hidden state?
'sample_gt_label_input': at test time it stores the previously predicted word (char), but what does it hold during training?
'sample_gt_label_output': ?

How does the behavior change between train and test? Specifically, I understood the sample_gt_cont part but not sample_gt_label_input and sample_gt_label_output. How exactly have you used them in the embed layer?
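One common convention for these three inputs comes from Caffe's recurrent layers, which take a continuation indicator alongside the data, and from Caffe's LRCN captioning example. Whether textspotter follows it exactly is an assumption, since the thread does not confirm it. Under that convention, the sketch below builds the three arrays for one word during training. The function name `make_sequence_blobs`, the start/end id of 0, and the ignore value of -1 are all hypothetical.

```python
import numpy as np

def make_sequence_blobs(char_ids, max_len, bos_id=0, eos_id=0, ignore_id=-1):
    """Build (cont, label_input, label_output) for one word.

    cont         : 0 at the first step (reset hidden state), 1 while the
                   sequence continues, 0 on padding steps.
    label_input  : start token followed by the ground-truth characters
                   (teacher forcing during training).
    label_output : ground-truth characters followed by an end token;
                   padding steps carry ignore_id so a loss can skip them.
    """
    n = len(char_ids)
    if n + 1 > max_len:
        raise ValueError("max_len must be at least len(char_ids) + 1")

    cont = np.zeros(max_len, dtype=np.int64)
    label_input = np.full(max_len, bos_id, dtype=np.int64)
    label_output = np.full(max_len, ignore_id, dtype=np.int64)

    cont[1:n + 1] = 1
    label_input[1:n + 1] = char_ids
    label_output[:n] = char_ids
    label_output[n] = eos_id
    return cont, label_input, label_output

cont, label_input, label_output = make_sequence_blobs([3, 1, 20], max_len=6)
print(cont)          # [0 1 1 1 0 0]
print(label_input)   # [ 0  3  1 20  0  0]
print(label_output)  # [ 3  1 20  0 -1 -1]
```

In this reading, the input at each step is the previous ground-truth character and the output is the character to predict at that step. At test time the ground truth is not available, so the input would instead be the character predicted at the previous step.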

wenston2006 commented 5 years ago

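@fsluckymao Do you know where the text labels for the recognition part (I mean the English letters) are fed in? Also, do the recognition labels need to be converted from English letters to numbers, for example in UTF-8 format? gt_bbox seems to contain only the coordinates of the text boxes?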
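On the letters-to-numbers question: recognition networks generally classify over a small fixed alphabet, so labels are usually stored as class indices into that alphabet rather than as UTF-8 byte values. The sketch below shows such a mapping. The alphabet, the choice to reserve index 0, and the names `encode_word` and `decode_ids` are assumptions for illustration and are not taken from the textspotter code.

```python
# Hypothetical alphabet: lowercase letters and digits, with index 0
# reserved (for example for a start/end or blank symbol).
ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789"
CHAR_TO_ID = {ch: i + 1 for i, ch in enumerate(ALPHABET)}
ID_TO_CHAR = {i: ch for ch, i in CHAR_TO_ID.items()}

def encode_word(word):
    """Map a word to class indices, dropping characters outside the alphabet."""
    return [CHAR_TO_ID[ch] for ch in word.lower() if ch in CHAR_TO_ID]

def decode_ids(ids):
    """Map class indices back to text, skipping reserved or unknown ids."""
    return "".join(ID_TO_CHAR[i] for i in ids if i in ID_TO_CHAR)

ids = encode_word("Cat")
print(ids)              # [3, 1, 20]
print(decode_ids(ids))  # cat
```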