jeffmur / fhe-video-similarity

Encrypt & Share Videos

Pre-processing video frames #1

Closed jeffmur closed 5 months ago

jeffmur commented 6 months ago

Similarity Score Representation

Compute a Normalized Byte Array from video frames (from ppvs)

OR

Find alternative method(s) to preprocess video frames for SSO evaluation

  1. Motion Estimation - https://docs.opencv.org/4.x/d4/dee/tutorial_optical_flow.html
BrentLagesse commented 6 months ago

Motion estimation could be interesting to explore. The reason we used the byte array is that in my original work that this is based on (https://faculty.washington.edu/lagesse/publications/SSO.pdf), we didn't have access to the video itself.

For computing the normalized score, it is just Array[i] / Sum(Array), so if the byte counts were

[5, 10, 20, 5, 10] the normalized version would be [0.1, 0.2, 0.4, 0.1, 0.2]

The reason we do this is that if one camera records at 480i resolution and another at 4K, their byte counts will be drastically different, so this normalizes each of them. Algorithms like Pearson's Correlation Coefficient do this automatically when you use them, so we just do it by hand since we can't use the existing libraries under FHE.
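
A minimal sketch of that normalization in Dart, assuming the byte counts are already collected in a `List<int>`; the function name and structure are illustrative, not the project's actual API:

```dart
/// Divide each bucket's byte count by the sum of all buckets (illustrative only).
List<double> normalize(List<int> byteCounts) {
  final total = byteCounts.fold<int>(0, (sum, b) => sum + b);
  if (total == 0) return List.filled(byteCounts.length, 0.0);
  return byteCounts.map((b) => b / total).toList();
}

void main() {
  // Reproduces the example above; the normalized values sum to 1.0.
  print(normalize([5, 10, 20, 5, 10])); // [0.1, 0.2, 0.4, 0.1, 0.2]
}
```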

jeffmur commented 5 months ago

For the byte-count array, is it the count of bytes in each frame (of 1 second duration)?

Normalizing each frame:

For a 60 fps video: [ Frame 1, Frame 2, ... , Frame 60 ]
For a 30 fps video: [ Frame 1, Frame 2, ... , Frame 30 ]

For each frame, we could count the number of bytes contained within the frame.

So, when normalizing each frame within its duration, how do we compare arrays of unequal length?

BrentLagesse commented 5 months ago

It's not per frame, it's per second (or whatever bucket size you want to use), so for a 60 fps video you would grab the sizes of the first 60 frames and sum their bytes to get the total bytes for that bucket.
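
As a rough illustration of that bucketing, assuming the per-frame byte sizes and the frame rate are already known (`frameSizes` and `fps` are hypothetical inputs, not names from the repo):

```dart
/// Sum per-frame byte sizes into one bucket per second (illustrative sketch).
List<int> bucketBytesPerSecond(List<int> frameSizes, int fps) {
  final buckets = <int>[];
  for (var i = 0; i < frameSizes.length; i += fps) {
    final end = (i + fps) < frameSizes.length ? i + fps : frameSizes.length;
    var sum = 0;
    for (var j = i; j < end; j++) {
      sum += frameSizes[j];
    }
    buckets.add(sum);
  }
  return buckets;
}

void main() {
  // Two "seconds" of a 3 fps clip: [1 + 2 + 3, 4 + 5 + 6] -> [6, 15]
  print(bucketBytesPerSecond([1, 2, 3, 4, 5, 6], 3));
}
```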

For your example, the values in all the normalized buckets should sum to 1.0, because each bucket is divided by the sum of all the buckets in the array.

For your last question, this is the difference between a thesis and a project -- for a thesis, we get to assume that we can align the videos' timing, because our purpose is to show that our new algorithm works. For the project, you have to figure out a good way to align the videos. I suggested that you make the assumption that the clocks on the phones are generally synced (which is usually true, as modern cell phones do clock syncs) and then grab the Time Created to manually align the videos, then compare the X amount of video that overlaps between the two (where X is the duration of the smaller video, unless the smaller one extends outside the duration of the bigger one, in which case it would be just the overlapping part). This might help for Dart -- https://stackoverflow.com/questions/61083506/dart-how-can-i-get-the-creationtime-of-a-file.
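
A sketch of that overlap idea in Dart, assuming each video's start time and per-second buckets are already known (however the start time is obtained); the function name and signature are made up for illustration, not the actual implementation:

```dart
/// Keep only the seconds where the two recordings overlap (illustrative sketch).
List<List<int>> overlappingBuckets(DateTime startA, List<int> bucketsA,
    DateTime startB, List<int> bucketsB) {
  final endA = startA.add(Duration(seconds: bucketsA.length));
  final endB = startB.add(Duration(seconds: bucketsB.length));

  final overlapStart = startA.isAfter(startB) ? startA : startB;
  final overlapEnd = endA.isBefore(endB) ? endA : endB;
  if (!overlapEnd.isAfter(overlapStart)) return [[], []]; // no overlap at all

  final seconds = overlapEnd.difference(overlapStart).inSeconds;
  final offsetA = overlapStart.difference(startA).inSeconds;
  final offsetB = overlapStart.difference(startB).inSeconds;

  return [
    bucketsA.sublist(offsetA, offsetA + seconds),
    bucketsB.sublist(offsetB, offsetB + seconds),
  ];
}

void main() {
  final startA = DateTime(2024, 4, 30, 14, 0, 0); // video A starts at 14:00:00
  final startB = DateTime(2024, 4, 30, 14, 0, 2); // video B starts 2 s later
  // A has 6 one-second buckets, B has 5; only 4 seconds overlap.
  print(overlappingBuckets(startA, [1, 2, 3, 4, 5, 6], startB, [7, 8, 9, 10, 11]));
  // -> [[3, 4, 5, 6], [7, 8, 9, 10]]
}
```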

jeffmur commented 5 months ago

Thank you. This answered my question.

A bucket can contain however many (N) frames are needed to cover its duration. This means we will need to compare variable-sized arrays. Because we're dealing with timestamped arrays, we should be able to retrieve the "largest" overlapping slice between the comparators and compute their similarity.

The approach I've implemented is to request the time at which the video was taken as input. Example below.

  1. This allows us to more easily test the implementation logic (largest slice approach)
  2. Dart cannot reliably read the created date/time across platforms; after some testing, it proved to be inconsistent between Linux and Android.

[Screenshot from 2024-04-30 14-15-34]