len of npy is different from len of video

wanduoz commented 1 year ago

Hello, I downloaded 50 salads npy files and label txt files. The length of feature matched with label. I also downloaded official 50 salads dataset(https://cvip.computing.dundee.ac.uk/datasets/foodpreparation/50salads/). I checked the frame number of each video using belowing code. It turned out that the length of feature you provided was 8 frames less than video frame number, for all videos. Did you directly abandoned the first 8 frames or the last 8 frames?

path = r'.\data\50salad\rgb'
path2 = r'.\50salads\features'

def video_info(file):
    video_capture = cv2.VideoCapture(file)

    if not video_capture.isOpened():
        exit()

    frame_count = int(video_capture.get(cv2.CAP_PROP_FRAME_COUNT))

    frame_width = int(video_capture.get(cv2.CAP_PROP_FRAME_WIDTH))
    frame_height = int(video_capture.get(cv2.CAP_PROP_FRAME_HEIGHT))

    video_capture.release()
    return frame_count, frame_width, frame_height

file_list = os.listdir(path2)
for file in file_list:
    npy = np.load(os.path.join(path2, file+'.npy'))
    video = video_info(os.path.join(path, file+'.avi'))
    print(f"{file}, npy shape{npy.shape}, video info{video}")

output of above code are as following

rgb-01-1, npy shape(2048, 11679), video info(11687, 640, 480)
rgb-01-2, npy shape(2048, 12585), video info(12593, 640, 480)
rgb-02-1, npy shape(2048, 12415), video info(12423, 640, 480)
rgb-02-2, npy shape(2048, 10521), video info(10529, 640, 480)
rgb-03-1, npy shape(2048, 11373), video info(11381, 640, 480)
rgb-03-2, npy shape(2048, 11584), video info(11592, 640, 480)
rgb-04-1, npy shape(2048, 13168), video info(13176, 640, 480)
rgb-04-2, npy shape(2048, 12371), video info(12379, 640, 480)
rgb-05-1, npy shape(2048, 11115), video info(11123, 640, 480)
rgb-05-2, npy shape(2048, 12092), video info(12100, 640, 480)
rgb-06-1, npy shape(2048, 9694), video info(9702, 640, 480)
rgb-06-2, npy shape(2048, 8229), video info(8237, 640, 480)
rgb-07-1, npy shape(2048, 17793), video info(17801, 640, 480)
rgb-07-2, npy shape(2048, 15091), video info(15099, 640, 480)
rgb-09-1, npy shape(2048, 11547), video info(11555, 640, 480)
rgb-09-2, npy shape(2048, 14290), video info(14298, 640, 480)
rgb-10-1, npy shape(2048, 12329), video info(12337, 640, 480)
rgb-10-2, npy shape(2048, 9094), video info(9102, 640, 480)
rgb-11-1, npy shape(2048, 9435), video info(9443, 640, 480)
rgb-11-2, npy shape(2048, 8453), video info(8461, 640, 480)
rgb-13-1, npy shape(2048, 13880), video info(13888, 640, 480)
rgb-13-2, npy shape(2048, 13092), video info(13100, 640, 480)
rgb-14-1, npy shape(2048, 8599), video info(8607, 640, 480)
rgb-14-2, npy shape(2048, 8225), video info(8233, 640, 480)
rgb-15-1, npy shape(2048, 11489), video info(11497, 640, 480)
rgb-15-2, npy shape(2048, 13961), video info(13969, 640, 480)
rgb-16-1, npy shape(2048, 12865), video info(12873, 640, 480)
rgb-16-2, npy shape(2048, 10421), video info(10429, 640, 480)
rgb-17-1, npy shape(2048, 11146), video info(11154, 640, 480)
rgb-17-2, npy shape(2048, 12943), video info(12951, 640, 480)
rgb-18-1, npy shape(2048, 12077), video info(12085, 640, 480)
rgb-18-2, npy shape(2048, 7555), video info(7563, 640, 480)
rgb-19-1, npy shape(2048, 12817), video info(12825, 640, 480)
rgb-19-2, npy shape(2048, 11658), video info(11666, 640, 480)
rgb-20-1, npy shape(2048, 8488), video info(8496, 640, 480)
rgb-20-2, npy shape(2048, 8291), video info(8299, 640, 480)
rgb-21-1, npy shape(2048, 12912), video info(12920, 640, 480)
rgb-21-2, npy shape(2048, 12032), video info(12040, 640, 480)
rgb-22-1, npy shape(2048, 18143), video info(18151, 640, 480)
rgb-22-2, npy shape(2048, 12456), video info(12464, 640, 480)
rgb-23-1, npy shape(2048, 13274), video info(13282, 640, 480)
rgb-23-2, npy shape(2048, 14631), video info(14639, 640, 480)
rgb-24-1, npy shape(2048, 12211), video info(12219, 640, 480)
rgb-24-2, npy shape(2048, 7804), video info(7812, 640, 480)
rgb-25-1, npy shape(2048, 11159), video info(11167, 640, 480)
rgb-25-2, npy shape(2048, 8364), video info(8372, 640, 480)
rgb-26-1, npy shape(2048, 9126), video info(9134, 640, 480)
rgb-26-2, npy shape(2048, 9219), video info(9227, 640, 480)
rgb-27-1, npy shape(2048, 11859), video info(11867, 640, 480)
rgb-27-2, npy shape(2048, 12040), video info(12048, 640, 480)

julialromero commented 11 months ago

Hi @wanduoz, I found this same issue. Did you figure out how the 8 frames were dropped?

wanduoz commented 11 months ago

Hi @wanduoz, I found this same issue. Did you figure out how the 8 frames were dropped?

Hi @julialromero , I didn't know how to deal with this mismatching (some features are longer than corresponding videos while others are the opposite). I just cut or padded the label files so that features and labels had same length. I tried to process the official label files downloaded from here. But I didn't how to conver the timestamp label into per frame label.

julialromero commented 11 months ago

@wanduoz I managed to figure it out! I compared the original labels from downloading the 50Salads dataset to the labels they provided. There were strange mismatches between the two. For many of the sessions exactly 8 frames were dropped, and for other sessions more frames were dropped (when the session data extended far past when the salad actions were completed). I believe that they modified the original annotation and also all of the frames that were dropped were removed from the end.

I am trying to sync the accelerometer data with these visual features, so I am effectively just dropping frames from the end of the accelerometer data (and resampling to match the sampling rate) to get the same length, and I am using the visual features and labels that they provided.

They did some preprocessing on the annotations, so their labels are different from the original 50Salads annotations. I didn't see this documented anywhere, but noticed it when I compared the two: Originally the "null" class (the "other" class for actions that didn't fit into the predefined action class) was 17. They changed these labels at the beginning and end of the session into "action_start" null class (17) and "action_end" null class (18). As for all of the null sections that were not on the ends of the session, they combined these instances with the preceding non-null class. So, every time that there was a gap between the action classes, these gap frames were reassigned to the preceding class.

See this figure that I drew (attached) for a side-by-side comparison between the labels from the Original 50Salads dataset and the labels provided with these visual features (for Session 01-1). comparison_with_marker .

Hopefully that helps!

wanduoz commented 11 months ago

@julialromero Thank you so much. I still have serveral questions.

1."so I am effectively just dropping frames from the end of the accelerometer data (and resampling to match the sampling rate) to get the same length". Did you mean you downloaded 50 salads accelerometers data files? I opened 01-1-accelerometer.csv and I found it only recorded accelerometer data of kitchen objects. There were 27575 rows in this file. And there were 11687 frames in 01-1 label file.

How did you obtained the original labels? Did you traverse each row in label file and determine whether the timestamp value fell within the interval timestamp? For example in timestamps-01-1 file, the timestamp of row one is 32101164. It could not fell in any timestamp interval in 01-1-activityAnnotation file. So you gave a class 17 label(null class).
How did you combine the label? According to 50 Salads website, it had both high-lv activity and low-lv activity label. However, the label of tcn project only contained a part of aforementioned label.

4.Can you provide your code so that I can gradually debug the code step by step?

By the way, I was stuck with 50 salads label data. I am more interested in obtaining features in order to apply tcn in other scenario.

julialromero commented 11 months ago

@wanduoz

Yes I downloaded the raw 50 Salads accelerometer data and yes some of these accelerometer files are much longer than the labels. That is because the data collection from the sensors extended past the labelled actions and it seems like they did not drop this extra data from the raw dataset.
and 3. I used this code to preprocess the accelerometer data and the original annotations. (I commented out these 3 lines)
For code, I basically used that linked code to preprocess the accelerometry + annotations in order to compare/align the these with the provided visual features + labels and investigate where the frames were dropped. Now that I found that frames are all dropped from the end, I am just using the provided visual features and labels alongside the processed accelerometer.

Does this answer your questions?

wanduoz commented 11 months ago

@julialromero Thank you so much. I never read tcn code before( I directly read mstcn code because of deep learning framework...). I'll spend some time with this code.

clearlove7-star commented 3 months ago

@julialromero Excuse me, may I ask how you used the accelerometer data for prediction after processing it, or what method you used to extract the accelerometer features.

yabufarha / ms-tcn

len of npy is different from len of video #44