Open charlesollion opened 4 years ago
@charlesollion I'm working on the ETL pipeline with Clément and was wondering whether I understand the AI response correctly. For each trash, it gives:
{'frame_to_box': {'2': [0.43, 0.44, 0.49, 0.5]},
'id': 0,
'label': 'fragments'}
It also gives the fps and the number of frames of the video. I thought 'frame_to_box' gave the index of the frame (here '2') and the location of the trash (the list with box boundaries). So it seems we can indeed retrieve the timestamp of the trash, right? I suggested the following function: https://github.com/surfriderfoundationeurope/etl/blob/dev/code-review/etl/utils/ai.py (if you have time to have a look, correct me if I'm wrong 🙏)
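If I read the response format right, the conversion is just frame index divided by fps; a minimal sketch (the function name `frame_to_timestamp` is my own, not from the repo):

```python
def frame_to_timestamp(frame_index: int, fps: float) -> float:
    """Convert a frame index into elapsed seconds from the start of the video."""
    return frame_index / fps

# Example: the box above was detected on frame '2'; for a 24 fps video
# that would put the trash roughly 0.08 s into the video.
frame_to_timestamp(2, fps=24.0)
```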
Have a nice week-end !
Hello,
Thank you for your message. Indeed it is possible to retrieve the timestamp from the frame number and the fps of the video. Your code looks good, but there seems to be one confusion between the boxes and the detected trash: you may (and it often happens) have a single detected trash that is found in multiple consecutive frames (1 trash = several boxes), for instance:
{'detected_trash': [
    {'frame_to_box': {'2': [0.43, 0.44, 0.49, 0.5], '3': [0.42, 0.44, 0.48, 0.5]},
     'id': 0,
     'label': 'fragments'}
]}
I suggest attaching the timestamp to each trash, and not to each box. The timestamp of the trash would then be the average of the timestamps of the frames with detected boxes (in the example above, (timestamp of frame 2 + timestamp of frame 3) / 2).
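That averaging could be sketched as follows (the helper name `trash_timestamp` is hypothetical, and I assume frame indices arrive as string keys, as in the example payload):

```python
def trash_timestamp(frame_to_box: dict, fps: float) -> float:
    """Average the timestamps of all frames where the trash was detected."""
    # Keys of frame_to_box are frame indices serialized as strings ('2', '3', ...)
    frames = [int(f) for f in frame_to_box]
    return sum(f / fps for f in frames) / len(frames)

trash = {'frame_to_box': {'2': [0.43, 0.44, 0.49, 0.5],
                          '3': [0.42, 0.44, 0.48, 0.5]},
         'id': 0,
         'label': 'fragments'}

# Average of the frame-2 and frame-3 timestamps, i.e. 2.5 / fps
trash_timestamp(trash['frame_to_box'], fps=24.0)
```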
Does that make sense to you? Other than that, I think your code is clean and well suited to live in the ETL. I'll close this issue as soon as we have the same vision!
Very clear, thank you ++! I'll copy-paste your answer, open an issue in the ETL repo and implement that averaging over timestamps to be able to match a GPS coordinate. I was also planning to keep the AI box detection (to be able to check afterwards, to feed the labeling platform... or actually, simply to take advantage of all the info the AI returns!). So I could indeed keep the average timestamp to match the GPS, and for the box coordinates, I'd keep the ones from the frame in the 'middle' of the frame_to_box dict.
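Picking the 'middle' frame's box could look like this sketch (the helper name `middle_box` is mine; for an even number of frames I arbitrarily take the later of the two central ones):

```python
def middle_box(frame_to_box: dict) -> list:
    """Return the box of the middle frame of a detection.

    frame_to_box maps frame indices (as strings) to [x1, y1, x2, y2] boxes.
    """
    frames = sorted(frame_to_box, key=int)  # sort numerically, not lexically
    return frame_to_box[frames[len(frames) // 2]]

middle_box({'2': [0.43, 0.44, 0.49, 0.5],
            '3': [0.42, 0.44, 0.48, 0.5],
            '4': [0.41, 0.44, 0.47, 0.5]})
# box of frame '3', the middle of the three detections
```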
I'd suggest, for clarity purposes (but it's really a detail!), that you send frame_to_box as a list of dicts with keys frame and box, like:
{'detected_trash': [
    {'frame_to_box': [{'frame': 2, 'box': [0.43, 0.44, 0.49, 0.5]},
                      {'frame': 3, 'box': [0.42, 0.44, 0.48, 0.5]}],
     'id': 0,
     'label': 'fragments'}
]}
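The migration from the current dict shape to this suggested list shape would be small; a sketch, assuming string frame keys as in the current payload (the helper name `to_frame_box_list` is hypothetical):

```python
def to_frame_box_list(frame_to_box: dict) -> list:
    """Convert {'2': box, '3': box} into [{'frame': 2, 'box': box}, ...]."""
    return [{'frame': int(f), 'box': box}
            for f, box in sorted(frame_to_box.items(), key=lambda kv: int(kv[0]))]

to_frame_box_list({'2': [0.43, 0.44, 0.49, 0.5],
                   '3': [0.42, 0.44, 0.48, 0.5]})
```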
You can close this issue; one of these implementations should do the job: https://github.com/surfriderfoundationeurope/etl/commit/62c1c758e6e9350b81d63f206d7032c25d79db3e Thank you for your help!
For each trash detected from a video, a new field should be added, containing the number of seconds elapsed since the beginning of the video.