wangxiang1230 / OadTR

Code for our ICCV 2021 Paper "OadTR: Online Action Detection with Transformers".
MIT License

HDD camera dataset #24

Closed pgupta119 closed 1 year ago

pgupta119 commented 1 year ago

Hi Team, I would like to use both camera and sensor data in the transformer. Could you please suggest how to feed both camera and sensor data into the transformer? I would also like to compare against the TRN results on the HDD metric (mAP).

Thanks a lot

wangxiang1230 commented 1 year ago


Hi, sorry for the late reply. For HDD, I suggest you follow the practice used on the THUMOS14 dataset: concatenate the RGB features and the flow features directly, then input them into the network. That is, extract features for the camera data and the sensor data separately, concatenate the two feature sets, and then feed them into the online action detection network.
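The suggested fusion can be sketched as a per-timestep concatenation along the feature dimension. The feature sizes below (1024-d camera, 8-d sensor) are illustrative assumptions, not values from the repo:

```python
import numpy as np

# Hypothetical per-frame features, aligned to the same L timesteps:
# camera_feats (L, 1024) from a CNN backbone, sensor_feats (L, 8) from CAN-bus signals.
rng = np.random.default_rng(0)
camera_feats = rng.standard_normal((100, 1024)).astype(np.float32)
sensor_feats = rng.standard_normal((100, 8)).astype(np.float32)

# Concatenate along the feature axis, mirroring the RGB+flow concatenation
# used for THUMOS14; the fused sequence is what goes into the transformer.
fused = np.concatenate([camera_feats, sensor_feats], axis=1)
print(fused.shape)  # (100, 1032)
```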

pgupta119 commented 1 year ago


Thanks @wangxiang1230 for the suggestions. I would like to use a pre-trained DenseNet-121 model to extract the features.

pgupta119 commented 1 year ago

Hi @wangxiang1230,

I am checking hdd_all_anno.pickle, and my understanding is that each session has an array of shape (L, 12), where L is the total number of images per session and 12 is the number of classes. If I am correct, every session has its 1 at the same index (each row begins with a 1). Is my understanding right? If so, I also checked the original dataset from Honda, where the target data gives the class of each image. For example, for the first 10 images the target array is:

    [0, 1, 3, 4, 0, 0, 0, 0, 2, 1]   # 0, 1, 2, 3, 4 represent the classes

whereas when I check hdd_all_anno.pickle, the rows look like this (each row has length 12):

    [1, 0, 0, 0, 0, ...]
    [1, 0, 0, 0, 0, ...]
    [1, 0, 0, 0, 0, ...]
    ...

Could you please explain this?

wangxiang1230 commented 1 year ago

Hi, each session has an array of shape (L, 12), where L is the total number of sensor-data steps. The sensor data and the images are extracted at different rates; I seem to remember the images are extracted at 3 FPS while the sensor data is not. Therefore, if you use the image data, you need to convert the length of the labels to match the number of frames.
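The length conversion can be done by resampling the sensor-rate labels onto the image frames, for example with nearest-neighbour indexing. This is a sketch of one simple assumption (both streams span the same wall-clock duration), not the repo's own preprocessing script:

```python
import numpy as np

def resample_labels(labels: np.ndarray, num_frames: int) -> np.ndarray:
    """Map per-sensor-step one-hot labels (L_sensor, 12) onto num_frames
    image frames by nearest-neighbour indexing."""
    L = labels.shape[0]
    # Index of the sensor step aligned with each image frame, assuming the
    # two streams cover the same time span.
    idx = np.minimum((np.arange(num_frames) * L) // num_frames, L - 1)
    return labels[idx]

# 300 sensor steps of all-Background labels, resampled to 90 video frames.
sensor_labels = np.eye(12)[np.zeros(300, dtype=int)]   # (300, 12)
frame_labels = resample_labels(sensor_labels, 90)
print(frame_labels.shape)  # (90, 12)
```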

pgupta119 commented 1 year ago

Hi, I have checked https://zenodo.org/record/5513130#.YzIMF-zP1hF and tried to understand the 'hdd_all_anno.pickle' file. The array for each session (sensor data) looks like:

    [[1. 0. 0. ... 0. 0. 0.]
     [1. 0. 0. ... 0. 0. 0.]
     [1. 0. 0. ... 0. 0. 0.]
     ...
     [1. 0. 0. ... 0. 0. 0.]]   # shape (L, 12), L is the number of rows

So my question is: in every session the first index is 1 and the others are zero, which implies every row belongs to class one. I could not find any other class in the (L, 12) arrays, even across all 137 sessions. Am I misunderstanding something? (I expected the 1 to appear at other indices in some rows, so that the other classes are represented.)

For example:

    [[0. 0. 0. ... 1. 0. 0.]
     [0. 0. 0. ... 1. 0. 0.]
     [0. 0. 1. ... 0. 0. 0.]
     ...
     [0. 0. 0. ... 0. 0. 1.]
     [1. 0. 0. ... 0. 0. 0.]
     [0. 0. 0. ... 1. 0. 0.]]

wangxiang1230 commented 1 year ago


Frame-level goal-oriented driver behaviors are provided. They are parsed from the ".eaf" files, which can be found in "release_2019_07_08.tar.gz". The following are the corresponding 12 categories.

  1. Background
  2. intersection passing
  3. left turn
  4. right turn
  5. left lane change
  6. right lane change
  7. left lane branch
  8. right lane branch
  9. crosswalk passing
  10. railroad passing
  11. merge
  12. U-turn

So, [1. 0. 0. ... 0. 0. 0.] means "1. Background"
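The mapping from a one-hot annotation row back to its category name is just an argmax over the 12-way row, using the category list above:

```python
import numpy as np

# The 12 HDD goal-oriented behavior categories, in annotation order.
CATEGORIES = [
    "Background", "intersection passing", "left turn", "right turn",
    "left lane change", "right lane change", "left lane branch",
    "right lane branch", "crosswalk passing", "railroad passing",
    "merge", "U-turn",
]

def decode(row: np.ndarray) -> str:
    """Turn a one-hot (12,) annotation row into its category name."""
    return CATEGORIES[int(np.argmax(row))]

row = np.zeros(12)
row[0] = 1.0
print(decode(row))  # Background
```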

pgupta119 commented 1 year ago

Thanks @wangxiang1230. Could you please share the annotation script for the HDD dataset?

pgupta119 commented 1 year ago

I did it. Thanks