Open liziming5353 opened 7 months ago
We just select several frames and feed them into the LLM decoder, without the merge algorithm
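For intuition, uniform frame selection can be sketched like this (a minimal sketch; the function name and center-of-bin sampling scheme are my own illustration, not MovieChat's actual code):

```python
def sample_frame_indices(total_frames: int, n_samples: int) -> list[int]:
    """Pick n_samples frame indices spread evenly across a video.

    Hypothetical helper for illustration; the project's real sampling
    logic may differ.
    """
    if n_samples >= total_frames:
        return list(range(total_frames))
    step = total_frames / n_samples
    # take the center of each of n_samples equal-width bins
    return [int(step / 2 + i * step) for i in range(n_samples)]

# e.g. select 8 frames from a 100-frame video
print(sample_frame_indices(100, 8))
```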
Got it. What's the difference between video_path and fragment_video_path? In my understanding, video_path is the path to the video to be processed. But in upload_video_without_audio function in chat_model.py, fragment_video_path is used as a parameter of load_video function.
fragment_video_path stores the video clips read by the sliding window
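As an illustration, a sliding window over frame indices might look like this (a sketch under my own assumptions about window size and stride; the names are hypothetical, not the library's API):

```python
def sliding_window_clips(n_frames: int, window: int, stride: int) -> list[tuple[int, int]]:
    """Return (start, end) frame ranges for each clip read by the window.

    Hypothetical illustration of how fragment clips could be cut from a
    longer video; not the project's actual implementation.
    """
    if n_frames <= window:
        return [(0, n_frames)]
    return [(s, s + window) for s in range(0, n_frames - window + 1, stride)]

# e.g. a 10-frame video, window of 4 frames, stride of 2
print(sliding_window_clips(10, 4, 2))
```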
So do I need to prepare the video clips in advance, or will they be generated automatically?
No need, they will be generated automatically
Where is it generated? I can't find it. The first place fragment_video_path is used seems to be as a parameter of load_video in the upload_video_without_audio function.
you can run it and print the path to see:)
I have run it. I set fragment_video_path to "~/video_frames_moviechat", which is an empty folder. An error occurred:
because fragment_video_path needs to be the path of an mp4 file, not a directory :)
So fragment_video_path and video_path are the same video?
no, fragment_video_path is a temporary mp4 file
But I only have one video to be processed, and you said that fragment_video_path will be generated automatically. So I am confused... Could you give me a sample?
It seems to be a bug in the PyPI code. In the GitHub code, the capture_video function writes the tmp video file and returns its path. But in the PyPI code, capture_video does not write the tmp video file yet still returns the path, so the error above occurred.
How is the model without the MM module implemented in the ablation experiment? Is it directly applying the merge algorithm to the entire video?