zsyzzsoft / co-mod-gan

[ICLR 2021, Spotlight] Large Scale Image Completion via Co-Modulated Generative Adversarial Networks

Getting IndexError while Running run_metrics.py with Places2 #43

Open hamzapehlivan opened 2 years ago

hamzapehlivan commented 2 years ago


Thanks for sharing this great work!

I have a question about running run_metrics.py with the Places2 dataset.

As indicated in the Datasets section, I downloaded the Places2 validation set and converted it into TFRecords with the --shuffle --compressed flags.

However, when I try to run `run_metrics.py` (with metrics=ids36k5), I get the following:

dnnlib: Running run_metrics.run() on localhost...
Evaluating metrics "ids36k5" for "models/co-mod-gan-places2-050000.pkl"...
Setting up TensorFlow plugin "fused_bias_act.cu": Preprocessing... Loading... Done.
Setting up TensorFlow plugin "upfirdn_2d.cu": Preprocessing... Loading... Done.
truncation=None
Traceback (most recent call last):
  File "run_metrics.py", line 80, in <module>
    main()
  File "run_metrics.py", line 75, in main
    dnnlib.submit_run(sc, 'run_metrics.run', kwargs)
  File "/workspace/dnnlib/submission/submit.py", line 343, in submit_run
    return farm.submit(submit_config, host_run_dir)
  File "/workspace/dnnlib/submission/internal/local.py", line 22, in submit
    return run_wrapper(submit_config)
  File "/workspace/dnnlib/submission/submit.py", line 280, in run_wrapper
    run_func_obj(submit_config.run_func_kwargs)
  File "/workspace/run_metrics.py", line 30, in run
    num_gpus=num_gpus, num_repeats=num_repeats, resume_with_new_nets=resume_with_new_nets, truncations=truncations)
  File "/workspace/metrics/metric_base.py", line 188, in run
    metric.run(*args, **kwargs)
  File "/workspace/metrics/metric_base.py", line 82, in run
    self._evaluate(Gs, Gs_kwargs=Gs_kwargs, num_gpus=num_gpus)
  File "/workspace/metrics/inception_discriminative_score.py", line 35, in _evaluate
    self._configure(self.minibatch_per_gpu, hole_range=self.hole_range)
  File "/workspace/metrics/metric_base.py", line 168, in _configure
    return self._get_dataset_obj().configure(minibatch_size, hole_range=hole_range)
  File "/workspace/metrics/metric_base.py", line 153, in _get_dataset_obj
    self._dataset_obj = dataset.load_dataset(data_dir=self._data_dir, **self._dataset_args)
  File "/workspace/training/dataset.py", line 250, in load_dataset
    dataset = dnnlib.util.get_obj_by_name(class_name)(**kwargs)
  File "/workspace/training/dataset.py", line 87, in __init__
    self.resolution = resolution if resolution is not None else max_shape[1]
IndexError: list index (1) out of range

While debugging, I found that tfr_shapes=[[74989]] at line 82 of training/dataset.py. Here is the "features" dictionary, with the bytes_list omitted:

{'num_val_images': int64_list { value: 36500 } , 'shape': int64_list { value: 74989 } , 'compressed': int64_list { value: 1 } }

I am using the provided Docker image.

zsyzzsoft commented 2 years ago

Oh, you need to specify '--resolution=512'
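For context, the failing line in the traceback reads the resolution from the second entry of the stored shape whenever no resolution is passed. The sketch below reproduces only that fallback. The helper name `infer_resolution` and the way the reference shape is selected (largest element count) are assumptions made for illustration, not the repository's code. It suggests why a one-element shape such as [74989] raises an IndexError, and why passing the resolution explicitly avoids the lookup.

```python
from functools import reduce
from operator import mul


def infer_resolution(tfr_shapes, resolution=None):
    """Hypothetical helper mimicking the fallback on line 87 of training/dataset.py."""
    # Assumption: the shape with the most elements is used as the reference shape.
    max_shape = max(tfr_shapes, key=lambda shape: reduce(mul, shape, 1))
    # Same expression as in the traceback: index 1 is only read when
    # no explicit resolution is supplied.
    return resolution if resolution is not None else max_shape[1]


# Shape recorded for the compressed Places2 TFRecords in this thread.
compressed_shapes = [[74989]]

try:
    infer_resolution(compressed_shapes)
except IndexError as err:
    print("without --resolution:", err)

print("with --resolution=512:", infer_resolution(compressed_shapes, resolution=512))
```

With an uncompressed record whose shape is something like [3, 512, 512], the same fallback would find an entry at index 1, which is presumably why the flag is only needed for compressed data.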

hamzapehlivan commented 2 years ago

Thanks for the reply, it works now!

However, I was not able to replicate the results of the paper. The paper reports the following for the Places2 dataset:

P-IDS: 13.3 ±0.1, U-IDS: 27.4 ±0.1, FID: 7.9 ±0.0

What I got: P-IDS: 11.6, U-IDS: 25.9, FID: 8.0788

Running it several times led to the same slightly different results.
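To make the size of the gap concrete, here is a small sketch that compares the reproduced Places2 numbers with the paper's values quoted above and checks whether each difference falls within the reported ± spread. The dictionaries only restate figures from this thread; the `compare` helper is illustrative.

```python
# Places2 values quoted in this thread: (paper mean, paper +/- spread).
paper = {"P-IDS": (13.3, 0.1), "U-IDS": (27.4, 0.1), "FID": (7.9, 0.0)}
# Values obtained when re-running the evaluation.
reproduced = {"P-IDS": 11.6, "U-IDS": 25.9, "FID": 8.0788}


def compare(paper, reproduced):
    """Return {metric: (difference, within_reported_spread)}."""
    report = {}
    for name, (mean, spread) in paper.items():
        # Round to suppress floating-point noise in the subtraction.
        diff = round(reproduced[name] - mean, 4)
        report[name] = (diff, abs(diff) <= spread)
    return report


for name, (diff, ok) in compare(paper, reproduced).items():
    status = "within" if ok else "outside"
    print(f"{name}: {diff:+.4f} ({status} the reported spread)")
```

All three differences are larger than the reported spread, so run-to-run noise alone seems unlikely to explain them.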

I have the same issue with the FFHQ dataset, too.

One more point: I got results very similar to the paper's when I restricted the masked ratio to between 0 and 0.2.

Any idea why this happens?

zsyzzsoft commented 2 years ago

Quite strange. What is your result for FFHQ?

hamzapehlivan commented 2 years ago

For the metric ids10k-h0, I am getting:

P-IDS: 30.0, U-IDS: 44.5, FID: 0.5483

For the metric ids10k, I am getting:

P-IDS: 14.8, U-IDS: 28.2, FID: 3.8678

zsyzzsoft commented 2 years ago

What are your NumPy and TF versions?

hamzapehlivan commented 2 years ago

My TF version is "1.15.0", and my NumPy version is "1.17.3".

I am using the provided Dockerfile. However, metric evaluation was not working with it, so I changed it a bit:

FROM tensorflow/tensorflow:1.15.0-gpu-py3
RUN pip install scipy==1.3.3 \
    requests==2.22.0 \
    Pillow==6.2.1 \
    tqdm \
    imageio \
    scikit-learn

If it helps, here is my log.txt:

dnnlib: Running run_metrics.run() on localhost...
Evaluating metrics "ids10k" for "models/co-mod-gan-ffhq-9-025000.pkl"...
Setting up TensorFlow plugin "fused_bias_act.cu": Preprocessing... Loading... Done.
Setting up TensorFlow plugin "upfirdn_2d.cu": Preprocessing... Loading... Done.
truncation=None
100%|##########| 1250/1250 [08:05<00:00, 2.57it/s]
co-mod-gan-ffhq-9-025000 time 9m 42s ids10k-FID 3.8398 ids10k-U 0.2777 ids10k-P 0.1442
dnnlib: Finished run_metrics.run() in 9m 56s.
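The log reports the U and P scores as fractions, while the numbers quoted elsewhere in this thread are percentages. The sketch below pulls the scores out of the result line and converts them. It assumes that `ids10k-U` and `ids10k-P` correspond to U-IDS and P-IDS; the helper names are illustrative.

```python
import re

# Result line copied from the log above.
LOG_LINE = ("co-mod-gan-ffhq-9-025000 time 9m 42s "
            "ids10k-FID 3.8398 ids10k-U 0.2777 ids10k-P 0.1442")


def parse_metrics(line, metric="ids10k"):
    """Extract '<metric>-<NAME> <value>' pairs from a result line."""
    pattern = re.compile(re.escape(metric) + r"-(\w+)\s+([0-9.]+)")
    return {name: float(value) for name, value in pattern.findall(line)}


def as_percent(fraction):
    """Convert a fraction in [0, 1] to a percentage with one decimal."""
    return round(100.0 * fraction, 1)


scores = parse_metrics(LOG_LINE)
print("FID:", scores["FID"])
print("U-IDS (%):", as_percent(scores["U"]))
print("P-IDS (%):", as_percent(scores["P"]))
```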

zsyzzsoft commented 2 years ago

I still have no idea what causes this discrepancy. One hypothesis is that the random behavior of the mask generator has changed, which may slightly bias the difficulty of the task.
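One way to probe this hypothesis would be to measure the average hole ratio of the generated masks under a fixed seed in both environments. The sketch below shows only the measurement. `random_rect_mask` is a hypothetical stand-in, not the repository's mask generator, and would need to be replaced by the real one.

```python
import random


def random_rect_mask(rng, size=32):
    """Hypothetical stand-in generator: 1 marks a hole pixel, 0 a known pixel."""
    h = rng.randint(1, size)
    w = rng.randint(1, size)
    top = rng.randint(0, size - h)
    left = rng.randint(0, size - w)
    return [[1 if top <= y < top + h and left <= x < left + w else 0
             for x in range(size)]
            for y in range(size)]


def hole_ratio(mask):
    """Fraction of pixels marked as holes in a 2D 0/1 mask."""
    total = sum(len(row) for row in mask)
    return sum(map(sum, mask)) / total


def mean_hole_ratio(seed, num_masks=200, size=32):
    """Average hole ratio over num_masks masks drawn from a seeded RNG."""
    rng = random.Random(seed)
    ratios = (hole_ratio(random_rect_mask(rng, size)) for _ in range(num_masks))
    return sum(ratios) / num_masks


print("mean hole ratio:", mean_hole_ratio(seed=0))
```

If the same seed gave a noticeably different mean ratio in two environments, that would be consistent with the task difficulty having shifted.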