'No videos found in the specified archive' error in Evaluating metrics

skymanaditya1 commented 2 years ago

I get the following IOError -- 'No videos found in the specified archive' when the metric evaluation step is called. The 0th tick runs just fine, after which the error is thrown.

The error is raised at the result_dict = metric_main.calc_metric( line in training_loop.py. From what I have seen, VideoFramesFolderDataset fails to initialize properly, which throws the error 'No videos found in the specified archive'.

Here is the stack trace --

Evaluating metrics for how2sign_faces_styleganv_resized_stylegan-v_random3_max32_how2sign_exp_styleganv_resized-e2f9580 ...
{"results": {"fvd2048_16f": 5903.660982385764}, "metric": "fvd2048_16f", "total_time": 51.93783378601074, "total_time_str": "52s", "num_gpus": 4, "snapshot_pkl": "network-snapshot-000000.pkl", "timestamp": 1652628736.9207375}
Traceback (most recent call last):
  File "/ssd_scratch/cvit/ravi/stylegan-v/experiments/how2sign_faces_styleganv_resized_stylegan-v_random3_max32_how2sign_exp_styleganv_resized-e2f9580/src/train.py", line 451, in <module>
    main() # pylint: disable=no-value-for-parameter
  File "/ssd_scratch/cvit/ravi/stylegan-v/experiments/how2sign_faces_styleganv_resized_stylegan-v_random3_max32_how2sign_exp_styleganv_resized-e2f9580/src/train.py", line 446, in main
    torch.multiprocessing.spawn(fn=subprocess_fn, args=(args, temp_dir), nprocs=args.num_gpus)
  File "/ssd_scratch/cvit/ravi/stylegan-v/env/lib/python3.9/site-packages/torch/multiprocessing/spawn.py", line 199, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
  File "/ssd_scratch/cvit/ravi/stylegan-v/env/lib/python3.9/site-packages/torch/multiprocessing/spawn.py", line 157, in start_processes
    while not context.join():
  File "/ssd_scratch/cvit/ravi/stylegan-v/env/lib/python3.9/site-packages/torch/multiprocessing/spawn.py", line 118, in join
    raise Exception(msg)
Exception:

-- Process 2 terminated with the following error:
Traceback (most recent call last):
  File "/ssd_scratch/cvit/ravi/stylegan-v/env/lib/python3.9/site-packages/torch/multiprocessing/spawn.py", line 19, in _wrap
    fn(i, *args)
  File "/ssd_scratch/cvit/ravi/stylegan-v/experiments/how2sign_faces_styleganv_resized_stylegan-v_random3_max32_how2sign_exp_styleganv_resized-e2f9580/src/train.py", line 375, in subprocess_fn
    training_loop.training_loop(rank=rank, **args)
  File "/ssd_scratch/cvit/ravi/stylegan-v/experiments/how2sign_faces_styleganv_resized_stylegan-v_random3_max32_how2sign_exp_styleganv_resized-e2f9580/src/training/training_loop.py", line 508, in training_loop
    result_dict = metric_main.calc_metric(
  File "/home2/ravi_mishra/nps/stylegan-v/src/metrics/metric_main.py", line 49, in calc_metric
    all_runs_results = [_metric_dict[metric](opts) for _ in range(num_runs)]
  File "/home2/ravi_mishra/nps/stylegan-v/src/metrics/metric_main.py", line 49, in <listcomp>
    all_runs_results = [_metric_dict[metric](opts) for _ in range(num_runs)]
  File "/home2/ravi_mishra/nps/stylegan-v/src/metrics/metric_main.py", line 123, in fvd2048_128f
    fvd = frechet_video_distance.compute_fvd(opts, max_real=2048, num_gen=2048, num_frames=128)
  File "/home2/ravi_mishra/nps/stylegan-v/src/metrics/frechet_video_distance.py", line 31, in compute_fvd
    mu_real, sigma_real = metric_utils.compute_feature_stats_for_dataset(
  File "/ssd_scratch/cvit/ravi/stylegan-v/env/lib/python3.9/site-packages/torch/autograd/grad_mode.py", line 26, in decorate_context
    return func(*args, **kwargs)
  File "/home2/ravi_mishra/nps/stylegan-v/src/metrics/metric_utils.py", line 195, in compute_feature_stats_for_dataset
    dataset = dnnlib.util.construct_class_by_name(**dataset_kwargs)
  File "/home2/ravi_mishra/nps/stylegan-v/src/dnnlib/util.py", line 292, in construct_class_by_name
    return call_func_by_name(*args, func_name=class_name, **kwargs)
  File "/home2/ravi_mishra/nps/stylegan-v/src/dnnlib/util.py", line 287, in call_func_by_name
    return func_obj(*args, **kwargs)
  File "/ssd_scratch/cvit/ravi/stylegan-v/experiments/how2sign_faces_styleganv_resized_stylegan-v_random3_max32_how2sign_exp_styleganv_resized-e2f9580/src/training/dataset.py", line 329, in __init__
    raise IOError('No videos found in the specified archive')
OSError: No videos found in the specified archive

For now, I have commented out the offending lines in training_loop.py, and training has started. These are the lines I commented out --

# if (snapshot_data is not None) and (len(metrics) > 0):
#     if rank == 0:
#         print(f'Evaluating metrics for {experiment_name} ...')
#     for metric in metrics:
#         result_dict = metric_main.calc_metric(
#             metric=metric,
#             G=snapshot_data['G_ema'],
#             dataset_kwargs=training_set_kwargs,
#             num_gpus=num_gpus,
#             rank=rank,
#             device=device)
#         if rank == 0:
#             metric_main.report_metric(result_dict, run_dir=run_dir, snapshot_pkl=snapshot_pkl)
#         stats_metrics.update(result_dict.results)
# del snapshot_data # conserve memory

Is this expected or am I doing something wrong?

universome commented 2 years ago

Hi! This is not expected and is very strange. Could you tell me what your dataset structure looks like? Are you running from a zip archive or a directory? And could you tell me which git hash you are currently using? (Today, I've updated it to e2f9580.)

universome commented 2 years ago

Ah, ok, I think I understand the issue. It looks like all your videos are too short (fewer than 128 frames), so the dataset class discards them when computing fvd2048_128f, which is FVD computed on 128-frame-long videos. What you should do is stop using the fvd2048_128f and fvd2048_128f_subsample8f metrics. You can do this by removing them from the list here.
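One way to check this before training is to count the frames in each video and drop any metric that needs longer clips than the dataset has. The sketch below is only an illustration and is not part of the stylegan-v code base. It assumes the dataset is a directory with one sub-directory of image frames per video, and that the clip length is encoded in the metric name as the first _<N>f token (for example, 128 in fvd2048_128f). The helper names count_frames_per_video, required_frames and usable_metrics are hypothetical.

```python
import re
from pathlib import Path

# Assumption: frames are stored as individual image files with these extensions.
IMAGE_EXTS = {".jpg", ".jpeg", ".png"}


def count_frames_per_video(root):
    """Map each video sub-directory name under `root` to its number of frames."""
    root = Path(root)
    return {
        d.name: sum(1 for f in d.iterdir() if f.suffix.lower() in IMAGE_EXTS)
        for d in sorted(root.iterdir())
        if d.is_dir()
    }


def required_frames(metric_name):
    """Return the clip length encoded in a metric name, or None if there is none.

    Assumption: the first `_<N>f` token is the clip length, so
    `fvd2048_128f_subsample8f` is treated as needing 128 frames.
    """
    match = re.search(r"_(\d+)f", metric_name)
    return int(match.group(1)) if match else None


def usable_metrics(metrics, frame_counts):
    """Keep metrics for which at least one video has enough frames."""
    longest = max(frame_counts.values(), default=0)
    kept = []
    for name in metrics:
        need = required_frames(name)
        if need is None or need <= longest:
            kept.append(name)
    return kept
```

Calling usable_metrics(metric_names, count_frames_per_video(dataset_dir)) returns only the metrics for which at least one video is long enough, which is the condition that avoids an empty dataset. It does not check whether enough long videos remain for a meaningful FVD estimate.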

skymanaditya1 commented 2 years ago

Oh yes, that was so silly of me! I had been suspecting that I didn't have enough frames in my videos (which I wrote on the other thread as well). Thank you for the quick reply and resolution. Marking the issue closed.

universome / stylegan-v

'No videos found in the specified archive' error in Evaluating metrics #14