gurkirt / realtime-action-detection

This repository hosts the code for the real-time action detection paper.

New dataset #11

Closed: jack99541008 closed this issue 6 years ago

jack99541008 commented 6 years ago

Hello @gurkirt, thanks for your code. I have a few questions about it.
  1. What is stored in pyannot.pkl for the ucf24 dataset? I saw the dictionaries but don't really understand what they mean.
  2. How can I use my own dataset instead of ucf24 with your code? My dataset consists of a few folders with many pictures in each folder.

zhanghaoinf commented 6 years ago

import pickle
with open('pyannot.pkl', 'rb') as f:
    data = pickle.load(f)

data: a dict keyed by the relative video path (videopath). Example key: 'BasketballDunk/v_BasketballDunk_g01_c03'

data[videopath]['label']  # category index, 0 to 23 for UCF24 (int)
data[videopath]['numf']  # number of frames in the video (numpy.uint32)
data[videopath]['annotations'][Tube_INDEX]['sf']  # tube start frame; 0 denotes the first frame (int)
data[videopath]['annotations'][Tube_INDEX]['ef']  # tube end frame (my understanding: 'ef' is not included), at most 'numf' (int)
data[videopath]['annotations'][Tube_INDEX]['label']  # tube label (int)
data[videopath]['annotations'][Tube_INDEX]['boxes']  # tube boxes; len(boxes) should equal ef - sf

Tube_INDEX is the index of a tube instance within a video.
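A minimal sketch of walking this structure, assuming 'annotations' is a list of tube dicts (the thread only shows integer Tube_INDEX access) and using a made-up toy dict in place of the real pyannot.pkl. The helper names iter_tubes and check_tubes are hypothetical; check_tubes flags tubes whose box count differs from ef - sf, following the exclusive-end reading above.

```python
def iter_tubes(data):
    """Yield (videopath, tube_index, tube) for every tube in every video."""
    for videopath, video in data.items():
        # Assumption: 'annotations' is a list, so enumerate yields Tube_INDEX.
        for tube_index, tube in enumerate(video['annotations']):
            yield videopath, tube_index, tube


def check_tubes(data):
    """Return (videopath, tube_index) pairs that break the expected invariants:
    0 <= sf < ef <= numf and len(boxes) == ef - sf ('ef' treated as exclusive)."""
    problems = []
    for videopath, tube_index, tube in iter_tubes(data):
        sf, ef = tube['sf'], tube['ef']
        numf = int(data[videopath]['numf'])  # 'numf' may be a numpy.uint32
        if not (0 <= sf < ef <= numf) or len(tube['boxes']) != ef - sf:
            problems.append((videopath, tube_index))
    return problems


# Made-up toy data with the same layout (not real UCF24 values).
toy = {
    'BasketballDunk/v_BasketballDunk_g01_c03': {
        'label': 0,
        'numf': 5,
        'annotations': [
            {'sf': 0, 'ef': 5, 'label': 0, 'boxes': [[10, 20, 30, 40]] * 5},
            {'sf': 2, 'ef': 4, 'label': 0, 'boxes': [[10, 20, 30, 40]] * 3},  # one box too many
        ],
    },
}

print(check_tubes(toy))  # [('BasketballDunk/v_BasketballDunk_g01_c03', 1)]
```

Running a check like this over the real file is a cheap way to confirm the exclusive-'ef' interpretation before relying on it.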

xmin = box[0]
xmax = box[0] + box[2]
ymin = box[1]
ymax = box[1] + box[3]

xmin and xmax are both in the range [1, 320] (width = 320).
ymin and ymax are both in the range [1, 240] (height = 240).
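The conversion and ranges above can be written as small helpers. This is a sketch assuming each box is stored as [x, y, width, height] in 1-based pixel coordinates of a 320x240 frame. The function names are hypothetical, and scaling to fractions of the frame size in to_relative is an assumed convention (common for SSD-style detectors), not something the thread confirms for this repo.

```python
W, H = 320, 240  # UCF24 frame size quoted in the thread


def xywh_to_xyxy(box):
    """Convert a stored [x, y, w, h] box to [xmin, ymin, xmax, ymax]."""
    x, y, w, h = box
    return [x, y, x + w, y + h]


def clip_xyxy(box, width=W, height=H):
    """Clamp corners to the 1-based pixel ranges [1, width] and [1, height]."""
    xmin, ymin, xmax, ymax = box
    return [min(max(xmin, 1), width), min(max(ymin, 1), height),
            min(max(xmax, 1), width), min(max(ymax, 1), height)]


def to_relative(box, width=W, height=H):
    """Assumed convention: express pixel coordinates as fractions of the frame size."""
    xmin, ymin, xmax, ymax = box
    return [xmin / width, ymin / height, xmax / width, ymax / height]


print(xywh_to_xyxy([100, 50, 40, 60]))        # [100, 50, 140, 110]
print(clip_xyxy([300, 200, 340, 260]))        # [300, 200, 320, 240]
print(to_relative([160, 120, 320, 240]))      # [0.5, 0.5, 1.0, 1.0]
```

Clipping before scaling keeps boxes that slightly overrun the frame from producing coordinates above 1.0.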

I am not sure whether the *** part is correct. @gurkirt

gurkirt commented 6 years ago

@jack99541008

  1. It doesn't have to be in this format; you can use any format you like. All you need to understand is how PyTorch's Dataset class works: implement __getitem__ for your new dataset, and the rest is up to you.
  2. You might want to look at the folder dataset classes in PyTorch (torchvision's ImageFolder / DatasetFolder).

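For the second question, here is a minimal sketch of a map-style dataset over a layout like root/<folder>/<image>, where each folder is assumed to hold the frames of one video. The class name FrameFolderDataset, the loader argument and the returned tuple are hypothetical; a real detection dataset would also return each frame's boxes and labels from whatever annotation format you choose. A map-style dataset implements __getitem__ and, for use with the default DataLoader sampler, __len__. The import falls back to object so the sketch also runs without PyTorch installed.

```python
import os
import tempfile

try:
    from torch.utils.data import Dataset  # PyTorch's base class for map-style datasets
except ImportError:
    Dataset = object  # fallback so the sketch runs without PyTorch

IMAGE_EXTS = ('.jpg', '.jpeg', '.png')


def return_path(path):
    """Default loader: hand back the file path instead of decoding the image."""
    return path


class FrameFolderDataset(Dataset):
    """Hypothetical dataset for root/<video_folder>/<frame_image> layouts.

    Each item is one frame. Frames are sorted by file name, so zero-padded
    names (00001.jpg, 00002.jpg, ...) keep temporal order.
    """

    def __init__(self, root, loader=return_path):
        self.loader = loader
        self.videos = sorted(
            d for d in os.listdir(root) if os.path.isdir(os.path.join(root, d)))
        self.samples = []  # (frame_path, video_index, frame_index)
        for video_index, video in enumerate(self.videos):
            video_dir = os.path.join(root, video)
            frames = sorted(
                f for f in os.listdir(video_dir) if f.lower().endswith(IMAGE_EXTS))
            for frame_index, name in enumerate(frames):
                self.samples.append(
                    (os.path.join(video_dir, name), video_index, frame_index))

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, index):
        path, video_index, frame_index = self.samples[index]
        # A detection dataset would also return this frame's boxes and labels,
        # looked up from your own annotation files.
        return self.loader(path), video_index, frame_index


# Usage on a throwaway tree: two "videos" with 3 and 2 empty frame files.
with tempfile.TemporaryDirectory() as root:
    for video, n_frames in (('video_a', 3), ('video_b', 2)):
        os.makedirs(os.path.join(root, video))
        for i in range(n_frames):
            open(os.path.join(root, video, f'{i:05d}.jpg'), 'w').close()
    ds = FrameFolderDataset(root)
    toy_len = len(ds)  # 5
    toy_item = ds[3]   # (<root>/video_b/00000.jpg, 1, 0)
```

In practice you would pass a loader that actually decodes images (for example PIL's Image.open) and wrap the dataset in torch.utils.data.DataLoader for batching.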
gurkirt commented 6 years ago

@zhanghaoinf I don't fully understand your question. What do you mean by ***? But yes, xmin and xmax are both in the range [1, 320] (width = 320) .....

zhanghaoinf commented 6 years ago

Hi @gurkirt, I found that when a tube spans the whole video, 'sf' is set to 0 and 'ef' is set to 'numf'.
If 'ef' were included, the frame indices would be [0, 1, 2, ..., numf], which would index a non-existent frame. So I guess 'ef' is not included in a tube, and the frame indices are [0, 1, 2, ..., ef-1].

gurkirt commented 6 years ago

Yes; in Python, index = range(numf) does that for you (it yields 0 to numf-1).
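With 'ef' exclusive, a tube covers range(sf, ef), and frame f maps to boxes[f - sf]. A minimal sketch under that reading; tube_frames and box_at are hypothetical helper names, and the toy tube is made up.

```python
def tube_frames(tube):
    """Frame indices covered by a tube, treating 'ef' as exclusive."""
    return range(tube['sf'], tube['ef'])


def box_at(tube, frame):
    """Box of this tube at 0-based frame index `frame`, or None if the tube is absent there."""
    if tube['sf'] <= frame < tube['ef']:
        return tube['boxes'][frame - tube['sf']]
    return None


# A made-up tube spanning a whole 4-frame video: sf = 0, ef = numf.
numf = 4
whole = {'sf': 0, 'ef': numf, 'boxes': [[i, i, 10, 10] for i in range(numf)]}

print(list(tube_frames(whole)))  # [0, 1, 2, 3], same as list(range(numf))
print(box_at(whole, 3))          # [3, 3, 10, 10]
print(box_at(whole, 4))          # None: frame 4 does not exist
```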