sameerpande12 opened 4 years ago
The information is kind of scattered across the issues, so I will summarize it here for anyone looking in the future.
The features are extracted using the bottom-up attention model from https://github.com/peteanderson80/bottom-up-attention.
You need to slightly modify tools/generate_tsv.py to produce label.tsv and feature.tsv. Add the following code to that file to create the exact format of feature.tsv:
# boxes: (num_boxes, 4) as (x1, y1, x2, y2) in pixels; features: (num_boxes, 2048) region features
box_width = boxes[:, 2] - boxes[:, 0]
box_height = boxes[:, 3] - boxes[:, 1]
# normalize box size and top-left corner by the image size
scaled_width = box_width / image_width
scaled_height = box_height / image_height
scaled_x = boxes[:, 0] / image_width
scaled_y = boxes[:, 1] / image_height
# add a trailing axis so each becomes a (num_boxes, 1) column
scaled_width = scaled_width[..., np.newaxis]
scaled_height = scaled_height[..., np.newaxis]
scaled_x = scaled_x[..., np.newaxis]
scaled_y = scaled_y[..., np.newaxis]
# 6-d position vector per box: (x1, y1, x2, y2, w, h), all normalized
spatial_features = np.concatenate((scaled_x, scaled_y, scaled_x + scaled_width, scaled_y + scaled_height, scaled_width, scaled_height), axis=1)
# append the position vector to each region feature
full_features = np.concatenate((features, spatial_features), axis=1)
fea_base64 = base64.b64encode(full_features).decode('utf-8')
fea_info = {'num_boxes': boxes.shape[0], 'feature': fea_base64}
row = [image_key, json.dumps(fea_info)]
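For reference, here is the same packing step as a self-contained Python 3 sketch, plus a decoder to check the round trip. Assumptions: the consumer decodes the base64 payload with `np.float32` (hence the explicit cast, so a float64 input cannot silently corrupt the layout), and the JSON key is `'feature'` exactly as in the snippet above. `build_feature_row` and `decode_feature_row` are hypothetical helper names, not functions from either repository.

```python
import base64
import json

import numpy as np


def build_feature_row(image_key, features, boxes, image_width, image_height):
    """Pack region features plus 6-d normalized box geometry into one TSV row.

    features: (num_boxes, D) array; boxes: (num_boxes, 4) as (x1, y1, x2, y2) pixels.
    """
    boxes = np.asarray(boxes, dtype=np.float32)
    w = (boxes[:, 2] - boxes[:, 0]) / image_width
    h = (boxes[:, 3] - boxes[:, 1]) / image_height
    x1 = boxes[:, 0] / image_width
    y1 = boxes[:, 1] / image_height
    # Columns: x1, y1, x2, y2, w, h (same order as the snippet above)
    spatial = np.stack([x1, y1, x1 + w, y1 + h, w, h], axis=1)
    # Explicit float32 so the bytes match a reader that assumes np.float32
    full = np.concatenate([features, spatial], axis=1).astype(np.float32)
    fea_info = {
        "num_boxes": boxes.shape[0],
        "feature": base64.b64encode(full.tobytes()).decode("utf-8"),
    }
    return [image_key, json.dumps(fea_info)]


def decode_feature_row(row):
    """Inverse of build_feature_row, assuming float32 storage."""
    info = json.loads(row[1])
    flat = np.frombuffer(base64.b64decode(info["feature"]), dtype=np.float32)
    return flat.reshape(info["num_boxes"], -1)
```

With 2048-d region features, each decoded row should have 2054 columns.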
I am attaching the file I used for this, which also generates label.tsv: tsv_gen.py.zip. You may have to change the code depending on your data location and format.
I still had some issues with csv.DictWriter writing strings with single quotes, while json.loads in run_captioning.py requires double quotes. I modified run_captioning.py to make it work. If you guys have a better solution, let me know.
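The single-quote problem most likely comes from a dict being written as a field directly: the csv module stringifies non-string values with `str()`, which produces Python's repr with single quotes. Serializing with `json.dumps` before writing avoids that, and writing the tab-separated line by hand also avoids csv's default quoting, which wraps any field containing `"` in extra quotes and doubles the inner ones. A minimal sketch, assuming the reader just splits each line on tabs (`to_tsv_line` is a hypothetical helper):

```python
import json


def to_tsv_line(image_key, info):
    """Serialize one (key, dict) pair as a TSV line whose second column is valid JSON.

    json.dumps always emits double-quoted keys and strings; str(info) would not.
    """
    return image_key + "\t" + json.dumps(info) + "\n"


info = {"num_boxes": 2, "feature": "AAAA"}
line = to_tsv_line("img_0", info)
key, payload = line.rstrip("\n").split("\t")

# str(info) is "{'num_boxes': 2, 'feature': 'AAAA'}", which json.loads rejects
try:
    json.loads(str(info))
    repr_parses = True
except json.JSONDecodeError:
    repr_parses = False
```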
Finally, to generate label.lineidx and feature.lineidx, use the following function:
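A minimal sketch of such a helper (a hypothetical `generate_lineidx`, not necessarily the function the author used), assuming a .lineidx file simply lists the starting byte offset of each TSV row, one decimal number per line, so a reader can `seek()` straight to row i:

```python
import os


def generate_lineidx(tsv_path, lineidx_path):
    """Write the starting byte offset of every line in tsv_path, one offset per line."""
    offsets = []
    # Binary mode so len(line) counts bytes, not decoded characters
    with open(tsv_path, "rb") as tsv:
        pos = 0
        for line in tsv:
            offsets.append(pos)
            pos += len(line)
    # Write to a temporary file first so a crash never leaves a partial index behind
    tmp_path = lineidx_path + ".tmp"
    with open(tmp_path, "w") as out:
        out.writelines(f"{off}\n" for off in offsets)
    os.replace(tmp_path, lineidx_path)
    return offsets
```

The sketch writes to whatever path you pass; the file name and location should match what your data loader expects.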
Thanks!
@shravan1394, what command line did you use to generate the captions once you had the right features?
Also, could you share the modifications to run_captioning.py that fix the problem with json.loads?
The generated label.lineidx and feature.lineidx need to be in the same folder as custom.feature.tsv and custom.label.tsv, right?
After using this script to generate the feature and label TSV files, and after resolving the single-quote issue, I received the following error:
JSONDecodeError: Expecting value: line 1 column 14 (char 13)
I solved it by removing .decode('utf-8') from base64.b64encode(full_features).decode('utf-8') in the bottom-up-attention based extractor script.
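Whether dropping `.decode('utf-8')` is safe probably depends on the Python version. tools/generate_tsv.py appears to be written for Python 2, where `base64.b64encode` returns a `str`; under Python 3 it returns `bytes`, and `json.dumps` refuses `bytes` values. A small Python 3 check of both paths:

```python
import base64
import json

import numpy as np

features = np.arange(6, dtype=np.float32).reshape(2, 3)
encoded = base64.b64encode(features.tobytes())  # bytes under Python 3

# Without .decode('utf-8'), Python 3's json.dumps rejects the bytes value
try:
    json.dumps({"num_boxes": 2, "feature": encoded})
    bytes_accepted = True
except TypeError:
    bytes_accepted = False

# With .decode('utf-8') the value is a str, and the round trip works
payload = json.dumps({"num_boxes": 2, "feature": encoded.decode("utf-8")})
info = json.loads(payload)
restored = np.frombuffer(base64.b64decode(info["feature"]), dtype=np.float32).reshape(info["num_boxes"], -1)
```

So on Python 3, if json.loads fails, inspecting the raw line in the TSV (for example for single quotes or a stray b'...' prefix) is likely more useful than removing the decode.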
@EByrdS you can convert the single quotes to double quotes following https://github.com/microsoft/Oscar/issues/49#issuecomment-797675905 or https://github.com/microsoft/Oscar/issues/49#issuecomment-966316562
Thanks for the summary of information here!
To anyone who wants to extract features on a custom dataset, has stumbled on this thread, and is struggling with the Caffe environment: I'd recommend using the Docker environment provided by LXMERT.
Follow its instructions to set up the environment, then rewrite the import section at the top of the script following this:
Hi guys, I am trying to generate my own features.tsv and labels.tsv for my dataset, but I am stuck at the following:
I am slightly confused about what exactly these features are. From the Oscar paper, I understand that each bounding box has a feature vector of the form (v', z), where v' is P-dimensional (2048) and z is 6-dimensional (position). I have difficulty understanding where these 2048 features come from. Initially, I thought they came from the FC layer of Faster R-CNN, but when I checked, that FC layer has size 4096.
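To make the (v', z) layout concrete: with the packing code at the top of this thread, each row has 2048 + 6 = 2054 columns, the appearance feature followed by the six normalized position values. As far as I can tell, in the bottom-up-attention model the 2048-d part is the `pool5_flat` blob that tools/generate_tsv.py reads (ResNet-101 features pooled over each region), not a 4096-d fully connected layer; treat that as something to verify against the model's prototxt. The hypothetical helper below just separates the two parts and rejects other widths:

```python
import numpy as np

REGION_DIM = 2048  # P: appearance feature size for this extractor
POS_DIM = 6        # z: (x1, y1, x2, y2, w, h), normalized by image width/height


def split_region_features(full_features):
    """Split packed (num_boxes, P + 6) rows back into v' and z."""
    full_features = np.asarray(full_features)
    if full_features.ndim != 2 or full_features.shape[1] != REGION_DIM + POS_DIM:
        raise ValueError(
            f"expected shape (num_boxes, {REGION_DIM + POS_DIM}), got {full_features.shape}"
        )
    v = full_features[:, :REGION_DIM]
    z = full_features[:, REGION_DIM:]
    return v, z
```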
The Oscar paper says: "Specifically, v and q are generated as follows. Given an image with K regions of objects (normally over-sampled and noisy), Faster R-CNN [28] is used to extract the visual semantics of each region". I am also confused about how these K regions are determined. Are these K image regions the bounding boxes output by Faster R-CNN?
I am relatively new to this area. Any help would be appreciated.
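On how the K regions are chosen: as far as I can tell from tools/generate_tsv.py in the bottom-up-attention repository, they are the detector's output boxes, but K varies per image. After per-class NMS, each proposal keeps its best class score; proposals at or above a confidence threshold are kept, and the count is clamped to a minimum and a maximum. The defaults below (0.2, 10, 100) are from my reading of that script, so check them against your copy. A NumPy sketch of just the selection step (`select_regions` is a hypothetical name, and NMS is assumed to have already run):

```python
import numpy as np


def select_regions(max_conf, conf_thresh=0.2, min_boxes=10, max_boxes=100):
    """Return indices of the proposals kept as the K image regions.

    max_conf[i] is proposal i's highest class score after per-class NMS
    (0 if NMS suppressed it for every class).
    """
    max_conf = np.asarray(max_conf)
    keep = np.where(max_conf >= conf_thresh)[0]
    order = np.argsort(max_conf)[::-1]  # proposals sorted by confidence, highest first
    if len(keep) < min_boxes:
        keep = order[:min_boxes]  # too few confident boxes: take the top min_boxes anyway
    elif len(keep) > max_boxes:
        keep = order[:max_boxes]  # too many: keep only the max_boxes most confident
    return keep
```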