mboudiaf / pytorch-meta-dataset

An unofficial, 100% PyTorch implementation of the META-DATASET benchmark for few-shot classification

Sampling from episodic loader gives error - "Key image doesn't exist (select from [])!" #19

Open patricks-lab opened 1 year ago

patricks-lab commented 1 year ago

Sampling from the episodic loader usually works fine until I hit the following error:

```
Traceback (most recent call last):
  File "/home/patrick/pytorch-meta-dataset/pytorch_meta_dataset/pipeline.py", line 145, in get_next
    sample_dic = next(self.class_datasets[class_id])
TypeError: 'TFRecordDataset' object is not an iterator

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/patrick/pytorch-meta-dataset/pytorch_meta_dataset/pipeline.py", line 219, in get_next
    dataset = next(self.dataset_list[source_id])
  File "/home/patrick/pytorch-meta-dataset/pytorch_meta_dataset/pipeline.py", line 121, in __iter__
    sample_dic = self.get_next(class_id)
  File "/home/patrick/pytorch-meta-dataset/pytorch_meta_dataset/pipeline.py", line 148, in get_next
    sample_dic = next(self.class_datasets[class_id])
  File "/home/patrick/pytorch-meta-dataset/pytorch_meta_dataset/utils.py", line 23, in cycle_
    yield next(iterator)
  File "/home/patrick/pytorch-meta-dataset/pytorch_meta_dataset/tfrecord/reader.py", line 222, in example_loader
    feature_dic = extract_feature_dict(example.features, description, typename_mapping)
  File "/home/patrick/pytorch-meta-dataset/pytorch_meta_dataset/tfrecord/reader.py", line 162, in extract_feature_dict
    raise KeyError(f"Key {key} doesn't exist (select from {all_keys})!")
KeyError: "Key image doesn't exist (select from [])!"

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/patrick/pytorch-meta-dataset/pytorch_meta_dataset/pipeline.py", line 201, in __iter__
    next_e = self.get_next(rand_source)
  File "/home/patrick/pytorch-meta-dataset/pytorch_meta_dataset/pipeline.py", line 222, in get_next
    dataset = next(self.dataset_list[source_id])
StopIteration
```
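The last link of this chain can be reproduced without the library. This is a minimal sketch that assumes, based only on the traceback (the same next(self.dataset_list[source_id]) call appears at lines 219 and 222), that the outer get_next catches the exception and calls next() again on the same generator; source_episodes and get_next below are hypothetical stand-ins, not repository code. Once a generator has raised an exception it is finished, so the retry raises StopIteration, chained to the original KeyError.

```python
def source_episodes():
    """Hypothetical stand-in for one data source's episode generator."""
    yield "episode-0"
    # Simulates the TFRecord reader failing on a record whose Example has no features
    raise KeyError("Key image doesn't exist (select from [])!")


def get_next(stream):
    """Assumed retry pattern: on failure, call next() once more on the same generator."""
    try:
        return next(stream)
    except KeyError:
        # The generator terminated when it raised, so this call raises StopIteration,
        # chained to the KeyError ("During handling of the above exception ...")
        return next(stream)


stream = source_episodes()
print(get_next(stream))  # episode-0
try:
    get_next(stream)
except StopIteration as err:
    print(type(err.__context__).__name__)  # KeyError
```

In other words, the final StopIteration is probably just a symptom; the KeyError raised inside the reader is the real failure.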

For reference, I am using an older version of your repo (https://github.com/mboudiaf/pytorch-meta-dataset/tree/c6d6922003380342ab2e3509425d96307aa925c5) and sampling from the episodic loader with:

```python
from torch.utils.data import DataLoader
from pytorch_meta_dataset import pipeline

episodic_dataset = pipeline.make_episode_pipeline(dataset_spec_list=all_dataset_specs,
                                                  split=split,
                                                  data_config=data_config,
                                                  episode_descr_config=episod_config)
episodic_loader = DataLoader(dataset=episodic_dataset,
                             batch_size=meta_batch_size,
                             num_workers=data_config.num_workers,
                             worker_init_fn=seeded_worker_fn)
# Sample a batch of size [B, N*K, C, H, W] from the episodic loader via next(iter(episodic_loader)),
# where B = meta_batch_size, N*K = n_ways*k_shots, C = channels, H = image height, W = image width
```

Do you know what might be causing the KeyError: "Key image doesn't exist (select from [])!" and the StopIteration that follows it? For the run above, I am using 5-way 15-shot tasks for training and 5-way 5-shot tasks for testing/validation, with meta_batch_size 2 for training and 4 for testing/validation.

EDIT: sometimes sampling from the episodic loader also gets stuck in an infinite loop.

Thanks a lot in advance!

patricks-lab commented 1 year ago

I have since found that the error seems to originate in utils.py (https://github.com/mboudiaf/pytorch-meta-dataset/blob/master/src/datasets/utils.py): sometimes my program hangs in an infinite loop inside cycle_ and eventually crashes with the error above.

Here is what seems to happen:

First, in https://github.com/mboudiaf/pytorch-meta-dataset/blob/master/src/datasets/pipeline.py, get_next tries to fetch a sample from a class with sample_dic = next(self.class_datasets[class_id]). This raises TypeError: 'TFRecordDataset' object is not an iterator, so the except clause wraps the class dataset in cycle_() and calls next() again:

```python
def get_next(self, class_id):
    try:
        sample_dic = next(self.class_datasets[class_id])
    except (StopIteration, TypeError) as e:
        self.class_datasets[class_id] = cycle_(self.class_datasets[class_id])
        sample_dic = next(self.class_datasets[class_id])
    return sample_dic
```
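The first TypeError in the traceback therefore looks like the expected first-use path rather than the bug itself: TFRecordDataset is iterable but not an iterator, so the first next() fails and get_next replaces it with a cycle_ generator. A minimal sketch of that behaviour, using a hypothetical FakeTFRecordDataset in place of the real class and a module-level dict in place of self.class_datasets:

```python
class FakeTFRecordDataset:
    """Hypothetical stand-in for TFRecordDataset: defines __iter__ but not __next__."""

    def __init__(self, records):
        self.records = records

    def __iter__(self):
        return iter(self.records)


def cycle_(iterable):
    # Same logic as the cycle_ helper quoted in this issue
    iterator = iter(iterable)
    while True:
        try:
            yield next(iterator)
        except StopIteration:
            iterator = iter(iterable)


class_datasets = {0: FakeTFRecordDataset(["a", "b"])}


def get_next(class_id):
    try:
        return next(class_datasets[class_id])
    except (StopIteration, TypeError):
        # First call: the dataset object is iterable but not an iterator, so next()
        # raises TypeError and the dataset is replaced by an endless cycle_ generator
        class_datasets[class_id] = cycle_(class_datasets[class_id])
        return next(class_datasets[class_id])


print([get_next(0) for _ in range(5)])  # ['a', 'b', 'a', 'b', 'a']
```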

cycle_ in utils.py then yields the next sample from the iterator and, once the iterator is exhausted, recreates it from the iterable. The code looks like this:

```python
def cycle_(iterable):
    # Creating custom cycle since itertools.cycle attempts to save all outputs in order to
    # re-cycle through them, creating amazing memory leak
    iterator = iter(iterable)
    while True:
        try:
            yield next(iterator)
        except StopIteration:
            iterator = iter(iterable)
```
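If a class iterable is empty, iter(iterable) returns an iterator that raises StopIteration immediately, so cycle_ keeps recreating it without ever yielding, which would match the hang. A sketch that makes this loop observable: the hypothetical EmptyClassRecords iterable counts how often cycle_ re-iterates it and raises after a limit, only so that the demo terminates.

```python
def cycle_(iterable):
    # The cycle_ helper quoted above, copied for the demonstration
    iterator = iter(iterable)
    while True:
        try:
            yield next(iterator)
        except StopIteration:
            iterator = iter(iterable)


class EmptyClassRecords:
    """Hypothetical empty iterable that counts how many times it is iterated.

    After `limit` calls to __iter__ it raises RuntimeError, so the demo cannot hang.
    """

    def __init__(self, limit):
        self.limit = limit
        self.resets = 0

    def __iter__(self):
        self.resets += 1
        if self.resets > self.limit:
            raise RuntimeError(
                f"cycle_ re-created the iterator {self.resets - 1} times without yielding"
            )
        return iter([])


records = EmptyClassRecords(limit=10_000)
try:
    next(cycle_(records))
except RuntimeError as err:
    print(err)  # cycle_ re-created the iterator 10000 times without yielding
```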

I think the problem is that, at some point, the class iterable is empty (something like iter([])), so cycle_ keeps recreating an iterator that immediately raises StopIteration and never yields anything. I'm wondering if this is due to a malformed/corrupted TFRecord, since I did specify a fixed 5-way, 10-shot task. The KeyError points the same way: "select from []" lists the keys actually present in the record that was read, and an empty list means that record contained no features at all, which suggests the class we're trying to iterate from might be entirely empty.
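One way to turn the silent hang into an immediate, explicit error is a variant of cycle_ that tracks whether the current pass yielded anything. cycle_checked is a hypothetical name, and this is a sketch rather than a change that exists in the repository; like the original, it avoids itertools.cycle so that samples are not cached.

```python
import itertools


def cycle_checked(iterable):
    """Variant of cycle_ that fails loudly instead of spinning when a pass yields nothing."""
    iterator = iter(iterable)
    yielded_this_pass = False
    while True:
        try:
            item = next(iterator)
        except StopIteration:
            if not yielded_this_pass:
                # A full pass produced no items: the class iterable is empty
                raise RuntimeError(
                    "iterable yielded no items; its TFRecord file may be empty or corrupted"
                )
            iterator = iter(iterable)
            yielded_this_pass = False
            continue
        yielded_this_pass = True
        yield item


print(list(itertools.islice(cycle_checked([1, 2]), 5)))  # [1, 2, 1, 2, 1]
```

An iterable that can only be traversed once (for example a plain generator) also trips this check on its second pass, which is arguably the right behaviour, since the original cycle_ would spin on it as well.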

So do I need to reinstall the Meta-Dataset files, or is there an issue with the code I'm running?
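Before re-downloading everything, one could check whether the files themselves are damaged by walking each TFRecord file frame by frame with only the standard library. This sketch relies on the TFRecord framing (uint64 length, masked CRC-32C of the length, payload, masked CRC-32C of the payload); scan_tfrecord, frame_record and the problem labels are hypothetical names. It checks framing and checksums and flags zero-length payloads, which parse as an Example with no features; it does not decode the Example protos, so a non-empty record that simply lacks the image feature would still pass.

```python
import os
import struct
import tempfile

# CRC-32C (Castagnoli) lookup table; TFRecord checksums use this CRC variant
_CRC32C_TABLE = []
for _i in range(256):
    _c = _i
    for _ in range(8):
        _c = (_c >> 1) ^ 0x82F63B78 if _c & 1 else _c >> 1
    _CRC32C_TABLE.append(_c)


def crc32c(data: bytes) -> int:
    crc = 0xFFFFFFFF
    for byte in data:
        crc = _CRC32C_TABLE[(crc ^ byte) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF


def masked_crc(data: bytes) -> int:
    # TFRecord stores a rotated and offset CRC rather than the raw value
    crc = crc32c(data)
    return (((crc >> 15) | (crc << 17)) + 0xA282EAD8) & 0xFFFFFFFF


def frame_record(payload: bytes) -> bytes:
    """Build one TFRecord frame (used here only to create test data)."""
    header = struct.pack("<Q", len(payload))
    return (header + struct.pack("<I", masked_crc(header))
            + payload + struct.pack("<I", masked_crc(payload)))


def scan_tfrecord(path, check_data_crc=True):
    """Walk a TFRecord file frame by frame; return (record_count, problems)."""
    problems = []
    count = 0
    with open(path, "rb") as f:
        while True:
            offset = f.tell()
            header = f.read(12)  # uint64 length + uint32 masked CRC of the length
            if not header:
                break  # clean end of file
            if len(header) < 12:
                problems.append((offset, "truncated header"))
                break
            (length,) = struct.unpack("<Q", header[:8])
            if struct.unpack("<I", header[8:])[0] != masked_crc(header[:8]):
                problems.append((offset, "bad length CRC"))
                break  # the length is untrustworthy, so later frames cannot be located
            payload = f.read(length)
            footer = f.read(4)
            if len(payload) < length or len(footer) < 4:
                problems.append((offset, "truncated record"))
                break
            if check_data_crc and struct.unpack("<I", footer)[0] != masked_crc(payload):
                problems.append((offset, "bad data CRC"))
            if length == 0:
                # An empty payload parses as an Example with no features at all
                problems.append((offset, "empty record"))
            count += 1
    return count, problems


with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.tfrecords")
    with open(demo_path, "wb") as f:
        f.write(frame_record(b"not-really-an-Example") + frame_record(b""))
    print(scan_tfrecord(demo_path))
```

Running scan_tfrecord over each class file of the affected split (with check_data_crc=False if the pure-Python CRC is too slow) would show whether any file is truncated, corrupted, or contains empty records.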

brando90 commented 1 year ago

@mboudiaf I'm also interested in this