Could the authors please share a detailed notebook on the data preparation process?
After carefully re-running the code, I noticed that the provided data already consists of extracted video frames, so the input to the pipeline is individual images rather than the video itself. The steps for converting the video into frames are missing, and there is no information on where to find the original video. It would be very helpful if the authors could provide more details on both. Many thanks.