This is a hint that some of your parameters have not been adjusted properly.
Did you figure this out? My pycrop folder is also empty, and the offset.txt file also doesn't get created.
Hi @sunotsue, as @hrzisme suggested, setting the right parameters results in the pycrop folder having output. If I remember correctly, it was a parameter that controls the length of the cropped video clips. Try setting it higher (5 s+, or whichever works for you).
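For what it's worth, the length-related parameter in run_pipeline.py appears to be --min_track (quoted in full in a later comment), and it is specified in frames rather than seconds. A minimal sketch of passing it explicitly; the file path, reference name, and value of 125 are illustrative assumptions only:

```python
# Sketch only: invokes syncnet_python's run_pipeline.py with an explicit
# --min_track. The flag names match the repo's argparse definitions quoted
# below; the paths and the value 125 (125 frames = 5 s at the default
# 25 fps) are hypothetical.
import subprocess

subprocess.run([
    "python", "run_pipeline.py",
    "--videofile", "data/example.mp4",   # example input path (assumption)
    "--reference", "example",            # name of the output subfolder
    "--min_track", "125",                # face-track length in frames, not seconds
], check=True)
```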
How to solve it?
> my pycrop folder is also empty and the offset.txt file also doesn't get created
```python
parser.add_argument('--data_dir',       type=str,   default='data/work', help='Output directory')
parser.add_argument('--videofile',      type=str,   default='',          help='Input video file')
parser.add_argument('--reference',      type=str,   default='',          help='Video reference')
parser.add_argument('--facedet_scale',  type=float, default=0.25,        help='Scale factor for face detection')
parser.add_argument('--crop_scale',     type=float, default=0.40,        help='Scale bounding box')
parser.add_argument('--min_track',      type=int,   default=100,         help='Minimum facetrack duration')
parser.add_argument('--frame_rate',     type=int,   default=25,          help='Frame rate')
parser.add_argument('--num_failed_det', type=int,   default=25,          help='Number of missed detections allowed before tracking is stopped')
parser.add_argument('--min_face_size',  type=int,   default=100,         help='Minimum face size in pixels')
```

I don't see a parameter that directly controls the length of the cropped video clips. My video is no longer than 5 seconds and the issue still occurred, so it may not be related to the length of the video. How can this be solved?
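For context, a back-of-the-envelope sketch (assuming, per the help strings above, that --min_track counts frames and --frame_rate is frames per second) shows why a short video can leave pycrop empty even though no parameter names the clip length directly:

```python
# Defaults taken from the argparse definitions quoted above.
min_track = 100       # minimum face-track duration, in frames
frame_rate = 25       # frames per second
num_failed_det = 25   # missed detections tolerated before a track is cut

print(min_track / frame_rate)       # 4.0 -> shortest face track kept is 4 s
print(num_failed_det / frame_rate)  # 1.0 -> a >1 s detection gap splits a track

# So a 5-second video must contain a near-continuous 4-second face track;
# if detection drops out and the track splits into sub-4 s fragments,
# every fragment is discarded and pycrop ends up empty.
```

On this reading, lowering --min_track below your video's face-track length is the first thing to try.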
Hi, @joonson
Thanks for open-sourcing this amazing work. I'm able to test the pre-trained SyncNet model on a single-speaker, single-shot video. However, when there are two or more speakers across multiple scenes and run_pipeline.py is used, the frames are extracted into the REFERENCE folder, but pycrop is empty. The empty pycrop folder is probably why the SyncNet model loads but produces no output when run_syncnet.py is run. I came across an issue in this GitHub repo regarding multiple-speaker detection, and it was clarified there that the model does work for multi-speaker frames. But when I run run_pipeline.py on my video, it is unable to detect multiple speakers and track them across multiple scenes (I can tell because pycrop is empty). Can you please share some insight on what I might do to fix this? First of all, is it even possible to predict the AV offset with SyncNet in such a scenario, where the videos are movie-like? Thank you.
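For anyone debugging the same symptom, a minimal sketch for checking whether run_pipeline.py produced any crops before invoking run_syncnet.py; it assumes the default --data_dir of data/work and a pycrop/<reference>/*.avi output layout, so adjust both if your setup differs:

```python
# Sketch only: assumes run_pipeline.py writes cropped face tracks to
# <data_dir>/pycrop/<reference>/*.avi (adjust if you changed the defaults).
import glob
import os

data_dir = "data/work"   # default --data_dir
reference = "example"    # whatever was passed as --reference

crops = glob.glob(os.path.join(data_dir, "pycrop", reference, "*.avi"))
if not crops:
    print("pycrop is empty: no face track survived filtering, so "
          "run_syncnet.py has nothing to score and no offsets are produced.")
else:
    print(f"{len(crops)} cropped track(s) ready for run_syncnet.py")
```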