tyiannak / multimodal_movie_analysis

A Python Library for Multimodal Analysis of Movies and Content-based Movie Recommendation

Object detection as features #6 #22

Closed pakoromilas closed 3 years ago

pakoromilas commented 3 years ago

Object detection:

Code Refactoring:

tyiannak commented 3 years ago

I'm getting this error:

raise RuntimeError('Attempting to deserialize object on a CUDA '

RuntimeError: Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False. If you are running on a CPU-only machine, please use torch.load with map_location=torch.device('cpu') to map your storages to the CPU.
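
For reference, the fix that the message suggests can be sketched as follows. This is only a minimal illustration and not the change made in this PR: `pick_map_location` and `load_checkpoint` are hypothetical helper names, and the path is whichever checkpoint file the detection code loads. `torch.load`, its `map_location` argument, and `torch.cuda.is_available()` are the PyTorch calls named in the error itself.

```python
def pick_map_location(cuda_available: bool) -> str:
    """Return the device string to pass as torch.load's map_location."""
    return "cuda" if cuda_available else "cpu"


def load_checkpoint(path: str):
    """Load a checkpoint saved on a GPU machine, remapping to CPU if needed.

    Hypothetical helper: torch is imported lazily so that this module can
    still be imported on machines where PyTorch is not installed.
    """
    import torch

    device = pick_map_location(torch.cuda.is_available())
    # map_location remaps storages that were saved from a CUDA device.
    return torch.load(path, map_location=device)
```

With this, the same code path works on both GPU and CPU-only machines, at the cost of running detection on the CPU when no GPU is present.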

tyiannak commented 3 years ago
  1. Why is the frame area fixed (frame_area = 300 * 300 in detection_utils)?
  2. Please also add the object-related features to both the final feature matrix and the feature statistics (so we need to keep both the frame-level object features and the final statistics).
  3. Please add the feature names to the output of process_video (as a fourth returned variable). This should be a list of strings of the same length as the feature arrays.
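
Request 3 could look roughly like the sketch below. It rests on assumptions and is not the actual process_video: the function name `build_stats_with_names`, the example base names, and the `_mean` / `_max` suffixes are all hypothetical. The only point it illustrates is that the names are built alongside the feature values, so the two lists always have the same length.

```python
from typing import Dict, List, Sequence, Tuple


def build_stats_with_names(
    per_frame: Dict[str, Sequence[float]],
) -> Tuple[List[float], List[str]]:
    """Compute the mean and max of each per-frame feature, with matching names."""
    values: List[float] = []
    names: List[str] = []
    for base_name, series in per_frame.items():
        if not series:
            raise ValueError(f"no frames for feature {base_name!r}")
        # Append each statistic together with its name to keep them aligned.
        values.append(sum(series) / len(series))
        names.append(f"{base_name}_mean")
        values.append(max(series))
        names.append(f"{base_name}_max")
    # Invariant requested in the review: one name per feature value.
    assert len(values) == len(names)
    return values, names
```

A caller would then return `names` as the extra (fourth) value next to the existing outputs.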
pakoromilas commented 3 years ago
  1. The Nvidia SSD model works only on frames of this size. Every time I use it, I transform the frame to this size and do the necessary calculations.

  2. I'm a bit confused here. The object features are represented through statistics across frames, so it is reasonable to include them in the feature_stats vector. The feature matrix, on the other hand, holds features for every frame. To include object features in the feature matrix, I would probably have to introduce a new calculation for every frame. For example: at the fifth frame there are 2 persons with an average confidence of 0.9 and a box area ratio of 0.6. Is this the right way to add these features to the feature matrix?

tyiannak commented 3 years ago

  1. OK.
  2. Yes, let's add a new per-frame calculation as you described it. For the time being, let's keep it aggregated per frame (i.e., if there are two faces, you aggregate their confidences and areas as the per-frame average; the count is still a number, of course).
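
The per-frame aggregation agreed on above can be sketched as follows. This is an illustration under assumptions, not the code merged in this PR: the `Detection` structure, the function name `frame_object_features`, and the choice to return zeros when a label is absent from the frame are all hypothetical. Only the 300 * 300 frame area and the three per-frame quantities (count, average confidence, average box area ratio) come from the discussion.

```python
from dataclasses import dataclass
from typing import List, Tuple

FRAME_AREA = 300 * 300  # frame_area from the discussion (SSD input size)


@dataclass
class Detection:
    """One detected object in a frame (hypothetical structure)."""
    label: str
    confidence: float
    box: Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max), pixels


def frame_object_features(
    detections: List[Detection], label: str, frame_area: float = FRAME_AREA
) -> Tuple[int, float, float]:
    """Return (count, mean confidence, mean box-area ratio) for one label."""
    hits = [d for d in detections if d.label == label]
    if not hits:
        # Assumption: a frame without this object contributes zeros.
        return 0, 0.0, 0.0
    mean_conf = sum(d.confidence for d in hits) / len(hits)
    ratios = [
        (d.box[2] - d.box[0]) * (d.box[3] - d.box[1]) / frame_area for d in hits
    ]
    return len(hits), mean_conf, sum(ratios) / len(ratios)
```

Calling this once per frame and per object class yields one row of the feature matrix; the existing cross-frame statistics can then be computed from those rows.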
pakoromilas commented 3 years ago

I made the proposed changes. Please check if everything works fine.

tyiannak commented 3 years ago

Great, approving and merging. @lobracost