rykov8 / ssd_keras

Port of Single Shot MultiBox Detector to Keras
MIT License

Can you explain the gt_pascal.pkl format? #15

Closed: abduallahmohamed closed this issue 7 years ago.

abduallahmohamed commented 7 years ago

Hi,

Can you explain the gt_pascal.pkl format, and how it is generated from the PASCAL format? :+1:

<?xml version="1.0" encoding="UTF-8" ?>
<annotations>
  <folder>/home/user/path-to-kitti-root/training/image/</folder>
  <filename>000000.png</filename>
  <size>
    <width>1224</width>
    <height>370</height>
    <depth>3</depth>
  </size>
  <object>
    <name>Pedestrian</name>
    <truncated>0</truncated>
    <occluded>0</occluded>
    <alpha>-0.20</alpha>
    <bndbox>
      <xmin>712.40</xmin>
      <ymin>143.00</ymin>
      <xmax>810.73</xmax>
      <ymax>307.92</ymax>
    </bndbox>
    <dimensions>
      <height>1.89</height>
      <width>0.48</width>
      <length>1.20</length>
    </dimensions>
    <location>
      <x>1.84</x>
      <y>1.47</y>
      <z>8.41</z>
    </location>
    <rotation_y>0.01</rotation_y>
    <property>-0.20,0.00,0</property>
  </object>
  <object>
  .
  .
  .
  </object>
  <object>
  .
  .
  .
  </object>
</annotations>
rykov8 commented 7 years ago

gt_pascal.pkl is quite similar to the PASCAL format; I use it just because I hate parsing XML. In this file, the ground truth for each image is a list (possibly an empty one), and each element of this list looks like [xmin, ymin, xmax, ymax, prob1, prob2, prob3], where xmin etc. are in relative coordinates. Here I assume that the first 4 numbers are coordinates and the rest are one-hot encoded classes (excluding background for ground truth). Actually, you just need to parse the XML annotation file for an image: for each <object> node, get its class and bounding box coordinates, divide each x and y coordinate by the image width and height respectively, one-hot encode the object's class, concatenate the coordinate vector and the one-hot class vector, and add the resulting converted bounding box to the image's list of objects. You can do this in the generator, without preprocessing the ground truth before training.
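The conversion described above can be sketched in Python as follows. This is a minimal sketch, not the author's code: it assumes annotations shaped like the XML in the question (a `<filename>`, a `<size>` with `<width>`/`<height>`, and `<object>` nodes with `<name>` and `<bndbox>`). The class list `CLASSES` and the helpers `parse_annotation` and `build_gt` are hypothetical names chosen for illustration; only the standard library's `xml.etree.ElementTree` is used.

```python
import xml.etree.ElementTree as ET

# Hypothetical foreground class list: replace with your own classes.
# Background is NOT listed here; per the thread, it is handled later.
CLASSES = ["Car", "Pedestrian", "Cyclist"]


def parse_annotation(xml_text, classes=CLASSES):
    """Convert one PASCAL-style XML annotation into (filename, rows).

    Each row is [xmin, ymin, xmax, ymax, one-hot classes...], with the
    coordinates divided by the image width/height (relative coordinates).
    """
    root = ET.fromstring(xml_text)
    filename = root.findtext("filename")
    width = float(root.findtext("size/width"))
    height = float(root.findtext("size/height"))
    rows = []
    for obj in root.iter("object"):
        name = obj.findtext("name")
        if name not in classes:
            continue  # skip objects whose class you are not training on
        box = obj.find("bndbox")
        coords = [
            float(box.findtext("xmin")) / width,
            float(box.findtext("ymin")) / height,
            float(box.findtext("xmax")) / width,
            float(box.findtext("ymax")) / height,
        ]
        one_hot = [0.0] * len(classes)
        one_hot[classes.index(name)] = 1.0
        rows.append(coords + one_hot)
    return filename, rows


def build_gt(xml_texts, classes=CLASSES):
    """Map each image file name to its list of converted boxes."""
    gt = {}
    for xml_text in xml_texts:
        filename, rows = parse_annotation(xml_text, classes)
        gt[filename] = rows
    return gt
```

The values printed later in the thread look like NumPy arrays, so you may want to wrap each list with `numpy.asarray(rows)` before saving the dict with `pickle.dump(gt, f)` on a file opened in `"wb"` mode.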

abduallahmohamed commented 7 years ago

Could I have code for what you described? I've been stuck on this step for 2 days.

rykov8 commented 7 years ago

I don't have code for parsing PASCAL XML. If you provide a snippet that does the parsing for your case, I can add a few lines to it to show how to convert PASCAL ground truth into the expected format (but it is quite a straightforward procedure).

abduallahmohamed commented 7 years ago

OK, one more question: when I printed an item from the dictionary with print(gt['frame05183.png']), the result was [[ 0.54921875 0.36527778 0.78828125 0.61805556 1. 0. 0. ]], but your code says the number of classes is 4:

# some constants
NUM_CLASSES = 4

So, based on your explanation, I have [xmin, ymin, xmax, ymax, prob1, prob2, prob3], but there are 4 classes, so shouldn't it be [xmin, ymin, xmax, ymax, prob1, prob2, prob3, prob4]?

Am I correct?

rykov8 commented 7 years ago

Here I assume that the first 4 numbers are coordinates and the rest are one-hot encoded classes (excluding background for ground truth).

So in the ground truth you don't need to worry about a special background class; it is added automatically when bounding boxes are assigned to priors. However, to construct the net correctly, I pass the number of classes + 1 as NUM_CLASSES (e.g. for PASCAL07 it is 20 + 1). Yes, it looks kind of strange, but with this auxiliary background class we are doomed to sacrifice some simplicity.
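The arithmetic behind the answer can be checked on the row printed above for 'frame05183.png'. This is a small worked sketch: it only counts the one-hot entries and adds one for background. The last step, which prepends a background slot at position 0, is an illustration of the idea and is not taken from the repo's box-assignment code, so check that code for the exact layout it uses.

```python
# The ground-truth row printed in the thread for 'frame05183.png'.
row = [0.54921875, 0.36527778, 0.78828125, 0.61805556, 1.0, 0.0, 0.0]

num_coords = 4                          # xmin, ymin, xmax, ymax
num_foreground = len(row) - num_coords  # length of the one-hot part: 3 classes
NUM_CLASSES = num_foreground + 1        # +1 for the implicit background class

# Index of this object's class within the foreground one-hot vector.
class_index = row[num_coords:].index(1.0)

# Illustration only: prepending a background slot turns the 3-way one-hot
# vector into a 4-way vector whose length matches NUM_CLASSES.
with_background = [0.0] + row[num_coords:]

print(NUM_CLASSES, class_index, with_background)
```

So a 7-element row (4 coordinates + 3 one-hot entries) is consistent with NUM_CLASSES = 4.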

abduallahmohamed commented 7 years ago

Many thanks!!

Walid-Ahmed commented 7 years ago

@abduallahadel Can you please advise how to view the image files mentioned in gt_pascal.pkl? This is what I did so far:

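import pickle

# Pickle files are binary, so open in "rb" mode (required on Python 3).
with open("gt_pascal.pkl", "rb") as f:
    data = pickle.load(f)

# dict.get is a method: call it with parentheses, not square brackets.
print(data.get('frame05183.png'))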

Where is the file 'frame05183.png' actually stored?

Iflier commented 7 years ago

The author has already answered this: it is under NDA.

Walid-Ahmed commented 7 years ago

@Iflier Can you please explain what you mean by NDA?

nixingyang commented 7 years ago

@rykov8 Regarding this reply: "In this file you can see, that for each image gt is a list (probably an empty one)". Is it possible to have an empty list for some training samples because those samples do not contain any valid objects? For me, the code crashed because of those empty lists.
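The thread does not say how the author handles empty entries. One possible workaround, sketched here as an assumption rather than a confirmed fix, is to drop images whose ground truth is empty before building the training generator. `drop_empty` is a hypothetical helper; it relies only on `len()`, so it works whether each value is a Python list or a NumPy array.

```python
def drop_empty(gt):
    """Return a copy of gt without images whose ground-truth list is empty."""
    return {name: boxes for name, boxes in gt.items() if len(boxes) > 0}


# Example: the image with no objects is removed, the other is kept.
example = {"a.png": [], "b.png": [[0.1, 0.2, 0.3, 0.4, 1.0]]}
print(drop_empty(example))
```

Note that this discards those images entirely, so they no longer contribute background-only examples to training.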

Iflier commented 7 years ago

@Walid-Ahmed default. It may be of some help.

Iflier commented 7 years ago

@Walid-Ahmed Hi, NDA means non-disclosure agreement. I think the frames are probably in ../../frames and simply can't be made public.

Iflier commented 7 years ago

@Walid-Ahmed If you want to train on your own dataset, you should write your own .pkl file. The .pkl file contains the name of each image (not the absolute path of the image), followed by the coordinates of the object(s) in the image, e.g. (x/img_width, y/img_height, (x+w)/img_width, (y+h)/img_height).
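The coordinate formula above can be written as a small helper. This is a sketch assuming the box is given as a top-left corner (x, y) plus a width w and height h, all in pixels; `xywh_to_relative` is a hypothetical name. Per the author's earlier reply, the one-hot class vector is appended after these four numbers.

```python
def xywh_to_relative(x, y, w, h, img_width, img_height):
    """Convert a pixel box given as top-left (x, y) plus width/height
    into relative [xmin, ymin, xmax, ymax]."""
    return [
        x / img_width,
        y / img_height,
        (x + w) / img_width,
        (y + h) / img_height,
    ]


# A 100x50 box at (50, 25) in a 200x100 image.
print(xywh_to_relative(50, 25, 100, 50, 200, 100))
```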

DanMossa commented 7 years ago

So an example of a .pkl file would be

totalImages/52472.jpg,255,165,38,34,emptyRock

where 255 and 165 are x1 and y1, 38 is x2 - x1, and 34 is y2 - y1?

Is that it? How do I convert a txt file to a pkl file?
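Strictly speaking, the line above is a CSV-style text record rather than a .pkl file. A .pkl file is a pickled Python object, and in this repo's case it appears to be a dict mapping image names to box rows. Below is a minimal sketch of converting such text lines, assuming the `path,x1,y1,w,h,class` layout described in the question. A text line does not record the image size, so `image_sizes` is a hypothetical dict you must fill yourself (for example with Pillow, whose `Image.open(path).size` returns `(width, height)`). `txt_to_gt`, `save_gt`, and the class names are illustrative.

```python
import csv
import os
import pickle


def txt_to_gt(lines, image_sizes, classes):
    """Build {image_name: [[xmin, ymin, xmax, ymax, one-hot...], ...]}.

    lines: iterable of 'path,x1,y1,w,h,class' strings (assumed format).
    image_sizes: hypothetical dict mapping path -> (width, height) in pixels.
    classes: list of foreground class names (no background class).
    """
    gt = {}
    for row in csv.reader(lines):
        if not row:
            continue  # skip blank lines
        path, x1, y1, w, h, name = row
        width, height = image_sizes[path]
        x1, y1, w, h = float(x1), float(y1), float(w), float(h)
        coords = [x1 / width, y1 / height, (x1 + w) / width, (y1 + h) / height]
        one_hot = [0.0] * len(classes)
        one_hot[classes.index(name)] = 1.0
        # Key by file name only (not the full path), as described in the thread.
        gt.setdefault(os.path.basename(path), []).append(coords + one_hot)
    return gt


def save_gt(gt, out_path):
    """Pickle the ground-truth dict; the file must be opened in binary mode."""
    with open(out_path, "wb") as f:
        pickle.dump(gt, f)
```

Usage might look like `save_gt(txt_to_gt(open("labels.txt"), sizes, ["emptyRock"]), "gt_mydata.pkl")`, where the file names are placeholders.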

Mahtsentu commented 6 years ago

@rykov8: PLEASE help me understand the code in your SSD-Keras repo line by line; I can pay you.

sqiprasanna commented 4 years ago

@rykov8 Can you help me with the code to generate gt_pascal.pkl file for my own dataset?